[PDF] Interval Analysis of Worst-case Stationary Moments for Stochastic Chemical Reactions with Uncertain Parameters

Abstract

The dynamics of cellular chemical reactions are variable due to stochastic noise from intrinsic and extrinsic sources. The intrinsic noise is the intracellular fluctuations of molecular copy numbers caused by the probabilistic encounter of molecules and is modeled by the chemical master equation. The extrinsic noise, on the other hand, represents the intercellular variation of the kinetic parameters due to the variation of global factors affecting gene expression. The objective of this paper is to propose a computational framework to analyze the combined effect of the intrinsic and the extrinsic noise modeled by the chemical master equation with uncertain parameters. More specifically, we formulate a convex optimization problem to compute the intervals of the stationary solution of uncertain moment equations whose parameters are given only partially in the form of the statistics of their distributions. The optimization program is derived without approximating the governing equation in contrast with many existing approaches. Thus, we can obtain guaranteed intervals of the worst possible values of the moments for all parameter distributions satisfying the given statistics, which is useful for model-based rational engineering of biomolecular circuits in synthetic biology, where the robustness of synthetic reactions is important. We demonstrate the proposed optimization approach using two examples of stochastic chemical reactions and show that the solution of the optimization problem gives practically useful upper and lower bounds of the statistics of the stationary copy number distributions.

Full PDF

aa r X i v : . [ ee ss . S Y ] F e b Interval Analysis of Worst-case Stationary Moments for Stochastic ChemicalReactions with Uncertain Parameters

Yuta Sakurai a) and Yutaka Hori b) Department of Applied Physics and Physico-Informatics,Keio University3-14-1 Hiyoshi, Kohoku-ku, Yokohama, Kanagawa 223-8522,Japan

The dynamics of cellular chemical reactions are variable due to stochastic noise fromintrinsic and extrinsic sources. The intrinsic noise is the intracellular ﬂuctuations ofmolecular copy numbers caused by the probabilistic encounter of molecules and ismodeled by the chemical master equation. The extrinsic noise, on the other hand,represents the intercellular variation of the kinetic parameters due to the variationof global factors aﬀecting gene expression. The objective of this paper is to proposea computational framework to analyze the combined eﬀect of the intrinsic and theextrinsic noise modeled by the chemical master equation with uncertain parameters.More speciﬁcally, we formulate a convex optimization problem to compute the inter-vals of the stationary solution of uncertain moment equations whose parameters aregiven only partially in the form of the statistics of their distributions. The optimiza-tion program is derived without approximating the governing equation in contrastwith many existing approaches. Thus, we can obtain guaranteed intervals of theworst possible values of the moments for all parameter distributions satisfying thegiven statistics, which is useful for model-based rational engineering of biomolecularcircuits in synthetic biology, where the robustness of synthetic reactions is impor-tant. We demonstrate the proposed optimization approach using two examples ofstochastic chemical reactions and show that the solution of the optimization problemgives practically useful upper and lower bounds of the statistics of the stationarycopy number distributions. a) Electronic mail: [email protected] b) Electronic mail: [email protected] . INTRODUCTION The dynamics of chemical reactions in cells are variable even when the cell populationis genetically identical. The stochastic response of the individual cells allows the cell pop-ulation to maintain diverse phenotypes in a homogeneous culture environement . Themechanism of the stochastic response is explained using the framework of two types of noisecalled intrinsic and extrinsic noise . The intrinsic noise is the intracellular ﬂuctuations ofmolecular copy numbers caused by the probabilistic encounter of molecular species such asmRNA and proteins in a single cell, whose typical copy numbers are on the order of 10 to10 . The extrinsic noise, on the other hand, arises from the intercellular variation of theglobal factors aﬀecting gene expression such as RNA polymerase and ribosome copy num-bers possibly due to the cell cycle and plasmid copy numbers, for example. Thus, it resultsin the cell-to-cell variation of the kinetic parameters of the stochastic chemical reactions.The dynamics of the intrinsic noise is modeled by a continuous-time discrete state Markovprocess on a semi-inﬁnite integer lattice associated with the copy numbers of molecularspecies. The governing equation of the Markov process is called the chemical master equation(CME) , which is an inﬁnite dimensional linear ordinary diﬀerential equation. In manycases, however, the exact analytic solution of the CME is hard to obtain because of theinﬁnite dimensional equation. Thus, analyses of stochastic chemical reactions are carriedout based either on sample-path generation using the stochastic simulation algorithm or onapproximate models.To date, many diﬀerent approaches were proposed to approximately compute the statis-tics and/or the distribution of the copy numbers. The chemical Langevin equation and thelinear noise approximation are widely used approaches to approximately capture the meanand the variance of the copy number distribution. The moment closure approximation is yetanother major approach, which predicts the statistics of the copy number distributions basedon an approximated moment equation derived from the CME and some assumptions on theunderlying copy number distribution . The ﬁnite state projection (FSP), on the otherhand, enables a direct computation of the copy number distribution based on an approxi-mated CME . Unlike the aforementioned approaches, the FSP has the advantage thatthere is a mathematically rigorous way to quantify the error bound of the approximation,and thus, the accuracy of analysis results can be quantitatively discussed. More recently, a2oment computation method with guaranteed error bounds was proposed separately by sev-eral authors . This method uses truncated moment equations to quantify the momentsof the copy number distributions, but unlike the moment closure approximation , thevalues of the truncated moments are rigorously estimated based on matrix inequality con-ditions. In particular, the moment computation problem with these conditions boils downto a class of mathematical optimization called the generalized moment problem . Thus,using well-established solvers of mathematical optimization, the statistics of molecular copynumbers can be systematically and eﬃciently predicted with mathematically guaranteedintervals.Despite these advancements, one limitation of these general frameworks is that theyfocus only on the analysis of intrinsic noise while experimental observations suggest thatthe stochastic cellular response is the result of the combined eﬀects of intrinsic and extrinsicnoise . Thus, an important next step is to generalize these frameworks to enable thesimultaneous analysis of intrinsic and extrinsic noise. Toward this goal, a sample-pathgeneration algorithm was proposed by extending the stochastic simulation algorithm .The linear noise approximation and the method of moment closure were also extendedto analyze the interplay between the two sources of noise. Other works proposed eﬃcientsampling schemes of the kinetic parameters for simulations . However, there are fewgeneral frameworks that can guarantee the accuracy of approximation except the work by˘Ceska et al. , where the formal method was employed to guarantee the accuracy of theanalysis.Motivated by this situation, we here consider chemical reactions subject to both intrinsicand extrinsic noise and propose a theoretical framework for directly computing the guaran-teed intervals of the statistics of molecular copy number distributions without generatingsample paths. To be more speciﬁc, our goal is to obtain the upper and the lower bounds ofthe stationary moments of the copy number distributions when the rate parameters, whichappear in the propensity functions of the CME are heterogeneous within the cell popula-tion. Since exact identiﬁcation of the joint parameter distribution is hard in practice, wehere assume that only part of the statistics of parameter distribution such as the meanof the marginal distributions and/or the covariance of the parameters are available. Thisimplies that the full (joint) distribution of the parameters cannot be uniquely determined,and thus, the stationary moments of the copy number distribution can be obtained only3 arameter m o l e c u l a r c op y nu m be r uncertain parameter distribution dist FIG. 1. The concept of the worst-case interval of stationary moments for uncertain parameterdistributions as the worst-case intervals for all possible parameter distributions satisfying the a priori statistics (FIG. 1). To this end, we derive the stationary moment equation with uncertainparameters and develop an optimization problem for bounding the intervals of the copynumber moments. In particular, building upon the recently proposed moment computationmethod by the authors and several others , we formulate a semideﬁnite program, whichbelongs to a class of convex optimization whose globally optimal solution can be eﬃcientlyfound with established algorithms. The proposed approach is then demonstrated with tworeaction systems with diﬀerent assumptions on the underlying parameter distributions.The organization of this paper is as follows. In section II B, we formally deﬁne the problemand derive the moment equation with uncertain parameters. In section III, we derive theoptimization to bound the intervals of the copy number statistics. Section IV is devoted tothe demonstration of the proposed optimization approach using two illustrative examples.Finally, we summarize the results and discuss the remaining issues and opportunities inSection V.The following notations are used. N is the set of natural numbers including zero, N := { , , , . . . } , Z is the set of integers, and R ≥ is the set of non-negative real numbers R ≥ := { x ∈ R | x ≥ } . A superscript is used to represent the dimension of the vector space, e.g., N n . supp f ( x ) is the support of a (probability density) function f ( x ).4 I. MODEL OF STOCHASTIC CHEMICAL REACTIONS

In this section, we ﬁrst introduce a chemical master equation, a mathematical modelof stochastic chemical reactions, and formally deﬁne the problem of moment analysis withuncertain reaction parameters. The stationary moment equation is then derived to use forthe development of our optimization problem.

A. Chemical Master Equation and Problem Formulation

Consider a chemical reaction system that consists of n species of molecules, M , M , . . . , M n ,and r reactions. We denote the copy number of the molecular species M j by x j and deﬁne x := [ x , x . . . , x n ] T ∈ N n . The stoichiometry of the i -th reaction is deﬁned by s i ∈ Z n ,meaning that the molecular copy numbers change from x to x + s i by the reaction i . Thereactions occur in a stochastic manner due to the low copy nature of the molecular species,and thus, the dynamics of the copy number x is considered as a stochastic process. Morespeciﬁcally, the probability of the occurrence of the reaction i in an inﬁnitesimal time dt isgiven by w i ( x , k i ) dt , where w i ( x , k i ) is the propensity function. We assume that all reactionsare elementary, meaning that the propensity function can be deﬁned depending on the typeof the reactions as follows: w i ( x , k i ) :=  k i ( ∅ k i −→ product) k i x p ( M p k i −→ product) k i x p x q ( M p + M q k i −→ product) k i x p ( x p −

1) ( M p + M p k i −→ product) , (1)where k i ∈ R ≥ is a positive rate constant associated with the i -th reaction.The copy number x is a random variable, and thus, we deﬁne a conditional probability oftime t , initial value x , and the rate constants k := [ k , k , . . . , k r ] T ∈ R r ≥ , which we denoteby P ( x | k , t, x ). The evolution of P ( x | k , t, x ) is then governed by the chemical masterequation (CME). ddt P ( x | k , t, x ) = r X i =1 { w i ( x − s i , k i ) P ( x − s i | k , t, x ) − w i ( x , k i ) P ( x | k , t, x ) } . (2)5he CME is also known as Kolmogorov’s forward equation for a discrete state Markov chain,where the state of the (semi-inﬁnite) chain is the copy number of molecular species.The CME (2) characterizes the dynamics of the intrinsic variability caused by the stochas-tic reaction events within a cell. On the other hand, the cell population is also subject toextrinsic noise, which results in the variation of the kinetic parameters k . This variationcan be characterized by the parameter distribution P ( k ) across the cell population.In what follows, we consider analyzing the stationary moments of the copy number dis-tribution P ( x ) := lim t →∞ P ( x | t ) when the parameter distribution P ( k ) is partially given inthe form of its moments. Speciﬁcally, our goal is to propose a mathematical optimizationprogram for computing the bounds of the stationary moments and their associated statis-tical values such as the mean and the variance of the distribution based on the a priori information of the parameter distribution P ( k ). More formally, the problem is stated asfollows. Problem.

Consider the chemical master equation (2). Suppose the Markov chain modeledby eq. (2) is ergodic, and the stationary moment of the copy number distribution P ( x )is ﬁnite. For a given set of parameter distributions P , compute the upper and the lowerbounds of the stationary moments of P ( x ).It is reasonable, in practice, to assume that the actual distribution of the parameters P ( k ) is unknown but only some statistics such as the mean and the covariance are known.Thus, we here consider the case where the set P is characterized by some of the momentsof parameter distributions. We also assume that the Markov chain is ergodic, i.e., the chainis irreducible and positive recurrent, in which case there is a unique stationary distributionsatisfying P ( x | k ) = lim t →∞ P ( x | k , t, x ). Assumption 1

The Markov chain associated with the CME (2) is ergodic, and the station-ary moment of the copy number distribution P ( x | k ) is ﬁnite for all k ∈ supp P ( k ), where P ( k ) is a parameter distribution that belongs to a given set of distributions P .It should be noted that the stationary distribution of the copy numbers P ( x ) might notbe unique for the set of the parameter distributions P . In other words, we can obtain onlyan interval of statistics of the copy number distribution. The upper and the lower bound ofthe statistics then gives the worst-case statistics of the stochastic chemical system (2) whenthe underlying parameter distribution P ( k ) ∈ P is uncertain (FIG. 1).6 . Stationary moment equation with uncertain parameters To analyze the stationary moments of the copy number distribution, we introduce themoments of the stationary density distribution P ( x , k , x ). Speciﬁcally, we deﬁne E [ x α k β ] := X x ∈ N n Z K×X x α k β P ( x , k , x ) , (3)where K := supp P ( k ) and X := supp P ( x ). The scalars x α and k β are deﬁned by x α := n Y i =1 x α j i = x α x α . . . x α n n , (4) k β := r Y i =1 k β i i = k β k β . . . k β r r . (5)with α := [ α , α , . . . , α n ] T ∈ N n , β := [ β , β , . . . , β r ] T ∈ N r . (6)It should be noted that the moments of the copy number distribution P ( x ) and the parameterdistribution P ( k ) are the special cases where β = and α = , respectively.The stationary moment equation that E [ x α k β ] must satisfy is obtained by substituting0 into the left-hand side of the CME (2), multiplying both sides by x α k β and taking sumand integral for x , k and x . Speciﬁcally, we have0 = r X i =1 X γ ∈ Γ i a α i, γ E [ x γ k β + e i ] , (7)for α ∈ N n and β ∈ N r , where e i is the vector whose i -th entry is 1, and 0 otherwise, andthe constant a α i, γ is the coeﬃcient of x γ in the polynomial { ( x + s i ) α − x α } w i ( x , k i ) /k i ( i =1 , , . . . , r ). The range of the summation Γ i ( i = 1 , , . . . , r ) isΓ i := ( [ γ , γ , . . . , γ n ] T ∈ N n | n X j γ j = n X j =1 α j + deg( w i ( x , k i )) − ) , (8)which depends on the degree of the polynomial w i ( x , k i ) in x , i.e., deg( w i ( x , k i )).It should be noticed that the equation (7) is linear in moments but is inﬁnite dimen-sion. This is because, for each α and β , the right-hand side has higher-order moments E [ x α k β + e i ] ( i = 1 , , . . . , r ), and when w i ( x , k i ) deﬁned in eq. (1) is quadratic for some i = 1 , , . . . , r , there are higher-order moments E [ x γ k β ] satisfying P ni =1 γ i > P ni =1 α i .7o solve the stationary moment equation (7), we consider truncating equations for highorder moments. Let ρ and σ denote the maximum order of the moments used for ouranalysis, i.e., ρ := max k α k , σ := max k β k , (9)where α and β are deﬁned in eq. (6), and k · k is the ℓ norm of the vector space. Thetruncated equation can then be represented by = A µ + B ν + C ξ (10)where µ , ν and ξ are the vectors of E [ x α ], E [ x α k β ] and E [ k β ], respectively, i.e., µ := [ E [ x α ] , E [ x α ] , . . . ] T , (11) ν := [ E [ x α k β ] , E [ x α k β ] , . . . , E [ x α k β ] , E [ x α k β ] , . . . ] T ( α i = , β i = ) , (12) ξ := [ E [ k β ] , E [ k β ] , . . . ] T . (13)Note that x α i and k β i in eqs. (11)–(13) are scalars deﬁned in eq. (4) and eq. (5), respectively.The equation (10) implies that the stationary moments exist in the null space of anassociated linear equation = A ¯ µ + B ¯ ν + C ¯ ξ , (14)where ¯ µ , ¯ ν and ¯ ξ are (independent) variables. When the moments of the parameter distri-bution, ξ , are a priori , they can be substituted in ¯ ξ to reduce the variables. However, theequation is still highly underdetermined, and the solution cannot be uniquely determinedby simply solving the linear equation (14).In the next section, we formulate an optimization problem that seeks guaranteed upperand lower bounds of the stationary moments based on eq. (14) by introducing additionalnecessary conditions to constrain the solution space of the linear equation (14). The resultingoptimization problem provides practically useful bounds of E [ x α ], enabling rigorous andrational design and analysis of stochastic chemical reactions.8 II. MATHEMATICAL OPTIMIZATION FOR BOUNDING STATIONARYMOMENTS OF UNCERTAIN CHEMICAL REACTIONSA. Overview of the optimization scheme

Before presenting a key proposition that leads to the optimization problem, we formallydeﬁne the class of uncertain parameter distributions P ( k ). We consider a set of uncertainparameter distributions, P , whose (uncertain) moments are speciﬁed by linear inequalities of ξ by h i ( ξ ) ≥ i = 1 , , . . . , m ), and the support of the random variable k , i.e., supp P ( k ),is a subset of a semialgebraic set { k | c i ( k ) ≥ i = 1 , , . . . , ℓ ) } . That is, P := (cid:8) P ( k ) | h i ( ξ ) ≥ i = 1 , , . . . , m ) , supp P ( k ) ⊆ { c i ( k ) ≥ } ℓ i =1 (cid:9) . (15)We also deﬁne a semialgebraic set { x | d i ( x ) ≥ i = 1 , , . . . , ℓ ) } satisfyingsupp P ( x ) ⊆ { d i ( x ) ≥ } ℓ i =1 (=: X ) . (16)The set X is, for example, the positive orthant { x | x i ≥ i = 1 , , . . . , n ) } since the copynumber is non-negative.Then, we have the following proposition. Proposition 1.

Consider the stochastic reaction system governed by the CME (2) and theassociated truncated moment equation (10). Suppose Assumption 1 holds, and X is given.Then, for any given parameter distributions P ( k ) ∈ P , the moments of the joint distribution P ( x , k ), i.e., µ , ν and ξ in eq. (10), satisfy A µ + B ν + C ξ = 0 , (17) h i ( ξ ) ≥ i = 1 , , . . . , m ) (18)( H :=) H ( g g T0 ) (cid:23) O, (19)( H i :=)  H ( c i ( k ) g i g T i ) (cid:23) O ( i = 1 , , . . . , ℓ ) , H ( d i − ℓ ( x ) g i g T i ) (cid:23) O ( i = ℓ + 1 , ℓ + 2 , . . . , ℓ + ℓ ) , (20)where g i is a vector of monomial basis of polynomials R [ x , k ] of an arbitrary degree, and H ( G ) := X x ∈ N n Z K×X G ( x , k ) P ( x , k , x ) (21)9ith G ( x , k ) being a polynomial matrix of x and k .The proof of this proposition is shown in Appendix A. The entries of the matrices H i ( i =1 , , . . . , ℓ + ℓ ) are the moments of the joint distribution P ( x , k ), and thus, the entriescontain the variables of the linear equation (17), i.e., µ , ν and ξ . The semideﬁnite matrix H ( g g T0 ) is known as the moment matrix . In what follows, we refer to the semideﬁniteconditions (19) and (20) as moment conditions.This proposition leads to additional necessary conditions that the variables ¯ µ , ¯ ν and ¯ ξ inthe linear equation (14) must satisfy. Thus, by adding these constraints, we can narrow thesolutions of the underdetermined linear equation (14). It should be noted that Proposition1 is an extension of the Proposition in the authors’ previous work , where similar algebraicconditions were presented for the case with deterministic (ﬁxed) parameters.We use the conditions in Proposition 1 to seek the upper and the lower bounds of momentvalues. Speciﬁcally, we replace the moments µ , ν and ξ in the conditions of Proposition 1with independent variables ¯ µ , ¯ ν and ¯ ξ and explore the maximum or the minimum value ofa target variable subject to the constraints (17)–(20). For example, the interval of the meancopy number E [ x i ] can be computed by maximizing and minimizing the corresponding entryof ¯ µ subject to eqs. (17)–(20). Since eqs. (17)–(20) are necessary conditions, the maximumand the minimum value becomes a guaranteed upper and lower bound of the moment,respectively. More generally, this maximization/minimization process can be summarizedas an optimization problem as follows. Optimization Problem. min ¯ µ , ¯ ν , ¯ ξ f T ¯ µ subject to A ¯ µ + B ¯ ν + C ¯ ξ = 0 , (22) h i ( ¯ ξ ) ≥ i = 1 , , . . . , m ) , H i (cid:23) O ( i = 0 , , , . . . , ℓ + ℓ ) . where f is a vector of user-deﬁned constants for deﬁning the moments of interest.This optimization problem explores the minimum value of the linear combination of theuncentered moments µ over the set speciﬁed by the constraints. The maximization problemcan be formulated by simply ﬂipping the sign of the constant f . The optimization prob-lem belongs to a class of convex optimization called semideﬁnite program (SDP), or more10peciﬁcally, it is a class of the generalized moment problem . Thus, existing optimizationsolvers are available to eﬃciently compute the intervals of the moments.The optimization program has two tuning parameters, ρ and σ , which control the numberof variables as deﬁned in eq. (9). These parameters are used to control the tradeoﬀ betweenthe tightness of the bounds and the computational cost. In general, the solution of theoptimization problem becomes gives tighter bounds with the increase of ρ and σ , while thenumber of variables increases also increases. This feature is demonstrated in Section IV withillustrative examples. Moreover, The size of the matrices H i ( ℓ = 0 , , . . . , ℓ + ℓ ) dependson the choice of the maximum degree of the polynomial basis g i deﬁned in Proposition 1,which is speciﬁcally written by g i := X p i ⊗ K q i , (23)where X p and K q represent a vector of monomial basis of polynomials R [ x ] of degree p and R [ k ] of degree q , respectively. In what follows, we use p i =  ⌈ ρ/ ⌉ ( i = 0 , , . . . , l ) ⌊ ρ/ ⌋ ( i = l + 1 , l + 2 , . . . , l + l ) , (24) q i =  ⌈ σ/ ⌉ ( i = 0 , l + 1 , l + 2 , . . . , l + l ) ⌊ σ/ ⌋ ( i = 1 , , . . . , l ) (25)to constrain all the variables in the linear equation. B. Uncertainty of parameter distributions

The proposed optimization program provides a mathematical certiﬁcate that the momentof a copy number distribution is within the computed bounds even when the parameterdistribution is given only as a set P . This feature is particularly useful when one needsto guarantee the robustness of synthetic biomolecular systems in synthetic biology. Thefollowing proposition answers the Problem posed in Section II A. Proposition 2.

Consider the stochastic reaction system governed by the CME (2) and a setof parameter distributions P . Suppose Assumption 1 holds, and X in eq. (16) is given. Let ϕ ∗ min denote the minimum value of the stationary moment f T µ among all possible parameter11 ABLE I. Typical representative distributions and their momentsDistribution Raw moments E [ k β i i ] NotationPoisson e − λ P ∞ j =0 λ j j ! j β i λ is the mean of the distributionGamma θ β i Q β i j =1 ( η + j − η and θ are shape and scale factor distributions P ( k ) ∈ P , i.e., ϕ ∗ min := min P f T µ . Then, the solution of the minimizationproblem (22), ϕ min , satisﬁes ϕ min ≤ ϕ ∗ min .In the simplest case where the entire parameter distribution P ( k ) is given without uncer-tainty, we can simply substitute its moments E [ k β ] to ¯ ξ in eqs. (22) to solve the optimizationproblem. The moments of the parameter distribution, ξ , can be systematically determinedif the parameters k i ( i = 1 , , . . . , r ) are independently distributed, and the underlying dis-tribution P ( k ) = Q ri =1 P ( k i ) is given by well-known parametric distributions such as Poissonand gamma distributions.For gamma distributions, for example, we substitute E [ k β i i ] = θ β i β i Y j =1 ( η + j −

1) ( β i = 1 , , . . . , r ) (26)into the corresponding entries of ¯ ξ , where η and θ are the shape and the scale factor of thegamma distribution, respectively. The other cross-moments of P ( k ) should be the product ofeach moment, E [ Q ri =1 k β i i ] = Q ri =1 E [ k β i i ] since the parameters are independently distributed.For Poisson distributions, the raw moments are E [ k β i i ] = e − λ ∞ X j =0 λ j j ! j β i , (27)where λ is the mean of the distribution. These relations are summarized in Table I.In a more complex case, where the underlying parameter distribution is not parametricnor completely known, the parameter distribution can be constrained by the inequalities bythe inequalities h i ( ξ ) ≥ ξ .For example, an uncertain mean value of a parameter k can be speciﬁed by E [ k ] − E [ k ] ≥ − E [ k ] + E [ k ] ≥

0, where E [ k ] ≥ E [ k ] ≥ ABLE II. Reaction kinetics in the reaction system (28)reaction index i propensity w i stoichiometry s i k D [1 , T k x [ − , T k x [0 , T k x [0 , − T IV. APPLICATION EXAMPLES

In this section, we demonstrate the proposed optimization approach by using two exam-ples of stochastic chemical reactions and show that the solution of the optimization problemgives practically useful upper and lower bounds of the statistics of the stationary copy num-ber distribution P ( x ). In Section IV A, we ﬁrst consider a model of gene expression, forwhich the propensity functions w i ( x , k i ) are aﬃne in the copy number x . The goal of theﬁrst example is to illustrate the proposed optimization program using a concrete exampleand verify the output of the optimization by comparing it with the analytic solutions of themoment equation, which we can obtain for this special example. In particular, we considerthe case where the parameter distributions are uncertain, and only part of the moments areavailable, and thus, the statistics of the molecular copy numbers can only be obtained asan interval. We show that the upper and the lower bounds of the interval can be tightlycomputed using the proposed optimization framework. Then, in Section IV B, we apply thesame procedure to a more general class of reaction systems, for which the analytic solutionof the moment equation is no longer available. A. A model of gene expression

1. Illustration of the stationary moment equation

Consider the following reaction system that consists of two species of molecules, mRNAand protein. ∅ w −→ mRNA , mRNA w −→ ∅ , mRNA w −→ mRNA + protein , protein w −→ ∅ , (28)13 A) (B)

FIG. 2. Upper and lower bounds of statistics of protein in the reaction system (28). (A) Meancopy number. (B) Standard deviation (SD) of copy number. where ∅ represents either degradation or molecules outside of the reaction system. We denotethe copy numbers of mRNA and protein by x and x , respectively, and deﬁne x = [ x , x ] T .The propensity function w i ( x , k i ) and the stoichiometry s i ( i = 1 , , ,

4) are shown inTABLE II.For the illustration purpose, we ﬁrst consider the case where the underlying distributionof the transcription rate k is a gamma distribution G ( k ; η, θ ), and the other parameters, k , k and k , are identical between cells, i.e., P ( k ) = G ( k ; η, θ ) δ ( k − k ∗ ) δ ( k − k ∗ ) δ ( k − k ∗ ) , (29)where η and θ are the shape and the scale parameters of the gamma distribution, respectively,and δ ( · ) represents the delta function. Our goal is to compute the mean and the varianceof the copy numbers of the protein.To this goal, we ﬁrst derive the moment equation (10). The truncated equation with theorder of moments up to ρ = 1 and σ = 0 is given by =  − k ∗ k ∗ − k ∗   E [ x ] E [ x ]  +  D  E [ k ] . (30)Suppose the shape and the scale factors of the gamma distribution G ( k ; η, θ ) are η = 2and θ = 0 .

4, respectively, and the other reaction parameters are k ∗ = 0 . , k ∗ = 0 . , k ∗ = 0 . D = 5 . (31)14 ound of correlation FIG. 3. Bounds of maximum and minimum values of mean copy number of molecule B in thereaction system (28).

The mean of the parameter k , which appears in the second term of eq. (30) is then obtainedas E [ k ] = θη = 0 . E [ x ]obtained by the optimization (22) coincide with each other even without the moment con-ditions H i (cid:23)

0. FIGs. 2 (A) and (B) show the computed upper and lower bounds of themean and the standard deviation of the protein obtained for diﬀerent values of σ . Thesevalues are indeed consistent with the analytic solutions, E [ x ] = k ∗ k ∗ E [ x ] = k ∗ Ck ∗ k ∗ E [ k ] , (32)and q E [ x ] − ( E [ x ]) = s(cid:18) k ∗ Ck ∗ k ∗ (cid:19) { E [ k ] − ( E [ k ]) } + k ∗ Ck ∗ k ∗ (cid:18) k ∗ k ∗ + k ∗ (cid:19) E [ k ] . (33)15 . Analysis for an uncertain parameter distribution Next, we consider a more complex and realistic case where the parameter distributionhas uncertainty and cannot be uniquely determined. For this case, the statistics of the copynumber distribution are obtained as an interval.Consider the set of reactions (28) where the transcription rate k and the translation rate k follow a joint distribution P ( k , k ). The degradation rates are assumed to be identicalbetween cells, which are given by k = k ∗ and k = k ∗ . We assume that the marginaldistributions, P ( k ) and P ( k ) are gamma distributions, whose shape and scale factors aregiven by ( η , θ ) and ( η , θ ), respectively, and k and k , are possibly correlated, which isoften the case if a common cellular resource is aﬀected by extrinsic noise. Speciﬁcally, (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) E [ k k ] − E [ k ] E [ k ] p { E [ k ] − ( E [ k ]) }{ E [ k ] − ( E [ k ]) } (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ | r | ( | r | ≤ , (34)where r is Pearson’s correlation coeﬃcient. It should be noted that the joint distribution ofthe parameters P ( k , k ) cannot be uniquely determined when r = 0 since the cross-moments E [ k β k β ] ( β , β >

0) are not unique while the moments of each parameter E [ k β ] and E [ k β ]can be determined by eq. (26).To compute the intervals of the mean protein copy number E [ x ], we introduce the con-straint corresponding to eq. (34) as h ( E [ k k ]) := − E [ k k ] + E [ k ] E [ k ] + | r | q E [ k ] − ( E [ k ]) q E [ k ] − ( E [ k ]) (35) h ( E [ k k ]) := E [ k k ] − E [ k ] E [ k ] − | r | q E [ k ] − ( E [ k ]) q E [ k ] − ( E [ k ]) (36)in addition to the stationary moment equation and the moment conditions. Note that thevariable of h i ( · ) is E [ k k ], and the other moments can be computed from the a priori information of the marginal distributions. The moment matrix H = g g T0 is deﬁned with g = [1 , E [ x ] , E [ x ] , E [ k ] , E [ k ] , E [ x k ] , E [ x k ] , E [ x k ] , E [ x k ]] T (37)when ρ = σ = 1, and c ( k ) := k , c ( k ) := k , d ( x ) = x , d ( x ) = x (38)Finally, we replace all of the moment variables E [ x i k j ] in these constraints with indepen-dent variables ¯ µ , ¯ ν and ¯ ξ to obtain the optimization problem (22). Then, the optimization16 ABLE III. Reaction kinetics in the reaction system (40)Reaction number i Reaction rate w i Stoichiometry s i w = k D s = 12 w = k x s = − w = k x ( x − s = − problem is solved with the parameters k ∗ , k ∗ and C shown in (31), and the shape and thescale factors, η i and θ i , are set to ( η , θ ) = (2 , .

4) and ( η , θ ) = (3 , . ρ = 1 and σ = 1 are used.The upper and the lower bounds of the mean copy number E [ x ] for diﬀerent values ofthe correlation coeﬃcient, | r | = 0 . , . , . , . , and 1 .

00 are shown in FIG. 3. Thegap between the upper and the lower bounds increases almost linearly in the correlationcoeﬃcient | r | . This is because of the uncertainty in the underlying parameter distribution P ( k , k ).In this special example, the upper and the lower bounds can be analytically obtained.To be more speciﬁc, Cθ θ k ∗ k ∗ ( η η − | r |√ η η ) ≤ E [ x ] ≤ Cθ θ k ∗ k ∗ ( η η + | r |√ η η ) , (39)which is linear in | r | . The output of the optimization problem indeed coincides with thisinequality as shown in Fig. 3, implying that the bounds are tight even when the parameterdistribution is uncertain, i.e., the distribution is not uniquely determined. B. A model of dimerization process

1. Analysis for fully known parameter distributions

In this section, we consider a more complex case that includes bimolecular reactions.Speciﬁcally, we consider the dimerization process of a molecular species A, φ w −→ A, A w −→ φ, A w −→ A : A, (40)where the copy number of the molecular species A is denoted by x . The propensity functions w i ( x, k i ) and stoichiometries s i ( i = 1 , ,

3) for these reactions are speciﬁed in TABLE III.17

A) (B)

FIG. 4. Upper and lower bounds of statistics of molecule A in the reaction system (40). (A) Mean.(B) Standard deviation.

Unlike the previous example, the propensity function w ( x, k ) = k x ( x −

1) is not linear in x , in which case an analytic solution of the moment equation is no longer obtained. In whatfollows, we ﬁrst show that the bounds of the statistics can be tightly obtained even whenthe moment equation is not closed for the copy number x , and then show that these boundsbecome less tight when the equation is not closed for both of the parameters and the copynumbers, for which case the propensity function becomes a third order monomial of k and x . Suppose the distribution of the parameter k is a gamma distribution, and P ( k ) is givenby P ( k ) = G ( k ; η, θ ) δ ( k − k ∗ ) δ ( k − k ∗ ) , (41)and the parameters are η = 2, θ = 0 . k ∗ = 0 . k ∗ = 0 .

02 and D = 5. Here, we areinterested in computing the mean and the variance of the monomer copy number x .The truncated moment equation for ρ = 1 and σ = 1 is, for example, obtained as =  − k ∗ + 2 k ∗ − k ∗  E [ x ] E [ x ]  +  − k ∗ + 2 k ∗ − k ∗  E [ xk ] E [ x k ]  +  D D  E [ k ] E [ k ]  , (42)18here E [ k ] and E [ k ] are constants given by eq. (26).The equation (42) is underdetermined, and thus, the solution of the truncated momentequation (42) is not unique. Thus, we use the moment matrices that include all of themoments to restrict possible solutions of eq. (42). Speciﬁcally, H =  E [ x ] E [ k ] E [ xk ] E [ x ] E [ x ] E [ xk ] E [ x k ] E [ k ] E [ xk ] E [ k ] E [ xk ] E [ xk ] E [ x k ] E [ xk ] E [ x k ]  (cid:23) O, (43)and c ( k ) = k , d ( x ) = x. (44)Solving the optimization problem, we obtain the mean copy number E [ x ] and the stan-dard deviation q E [ x ] − ( E [ x ]) for diﬀerent values of σ as shown in FIG. 4 (A) and (B),respectively, where the order of the moments for the x is set as ρ = 5. These ﬁguresshow that the gap between the upper and the lower bounds of the statistics monotonicallyapproaches to each other. With σ = 9, the bounds are obtained as5 . ≤ E [ x ] ≤ . , . ≤ E [ x ] − ( E [ x ]) ≤ . . (45)These bounds are also veriﬁed by the 100,000 sample-path simulations generated by thestochastic simulation algorithm (see “simulation” in FIG. 4(A) and (B)).Not surprisingly, the tightness of the bounds depends on the number of underdeterminedvariables in the equation (42). The next example considers the case where the parameter k instead of k follows a gamma distribution, i.e., P ( k ) = G ( k ; θ, η ) δ ( k − k ∗ ) δ ( k − k ∗ ),where η = 1 and θ = 0 .

02, respectively. The parameters are k ∗ = 0 . , k ∗ = 0 . C = 5 . (46)In this case, both x and k in w ( x, k ) become variables, and thus, the right-hand side ofthe moment equation depends on higher-order moments in x and k , which results in more19 A) (B)

FIG. 5. Computed upper and lower bounds of (A) the mean and (B) the standard deviation of themolecular species A when the distribution of k is uncertain. variables than eq. (42). For example, =  Dk ∗ − k ∗

00 0 0  E [ x ] E [ x ]  +  − − k ∗ −  E [ xk ] E [ x k ] E [ xk ] E [ x k ]  +  Dk ∗  E [ k ] E [ k ]  , (47)where E [ k ] and E [ k ] are constants given by eq. (26).Figures 5 (A) and (B) illustrate the computed bounds of the mean E [ x ] and the standarddeviation q E [ x ] − ( E [ x ]) , where ρ = 3 and 7 is used. Similar to the previous examples,the upper and the lower bounds approach to each other, but, unlike FIG. 4, the gap remainsfor relatively large ρ and σ .

2. Analysis for uncertain parameter distributions

Finally, we consider a more practical case where the joint distribution of the parameters P ( k , k ) is uncertain. Speciﬁcally, we assume that the marginal distributions of P ( k ) and20 IG. 6. Bounds of maximum and minimum values of mean copy number E [ x ] for the reactionsystem (40). P ( k ) are gamma distributions, but they are correlated, i.e., h ( E [ k k ]) in eq. (35) and h ( E [ k k ]) in eq. (36) are non-negative as discussed in Section IV A 2.The shape and scale factors ( η i , θ i ) ( i = 1 ,

3) are assumed to be ( η , θ ) = (2 , .

4) and( η , θ ) = (4 , . ρ = 5 and σ = 9, and the other parameters are set to the same as eq. (46). Fordiﬀerent values of the correlation coeﬃcient r , we compute the upper and lower bounds ofmaximum and minimum values of mean copy number E [ x ]. The results are illustrated inFIG. 6.In FIG. 6, we also plot the bounds when P ( k ) and P ( k ) are independent, that is, E [ k k ] = E [ k ] E [ k ], for which case the joint distribution P ( k , k ) is uniquely determinedsince the marginal distributions are assumed to be gamma distributions. Since the jointparameter distribution cannot be uniquely determined except for the case of independentdistributions, it is natural that there is a gap between the upper and the lower bounds. Weobserve from FIG. 6 that the gap between the upper and the lower bounds monotonicallyincreases with the correlation coeﬃcient | r | .21o verify this result, we generate 100,000 sample paths by the stochastic simulationalgorithm (SSA) and plot the intervals of the mean copy number (see FIG. 6). It should benoted that the tightness of the bounds cannot be discussed from FIG. 6 due to the possibleinsuﬃciency of the sample paths. In general, we need to generate a signiﬁcantly large numberof sample paths for parameters drawn from the distributions satisfying the constraints (35)and (36) to obtain asymptotic bounds when the parameter distribution is uncertain. Theproposed approach, on the other hand, is capable of computing mathematically rigorousbounds of the moments. Thus, it is useful for rational engineering and analysis of stochasticchemical reaction systems. Remark.

All optimization problems in this section were solved with SeDuMi 1.3.2 onMATLAB 2019a. In the code, the variables were normalized by constants to avoid numericalinstability of the solver. V. CONCLUSION

We have proposed a computational framework to analyze the worst-case stationary mo-ments of the molecular copy number distributions in stochastic chemical reactions withparametric uncertainty. Speciﬁcally, a mathematical optimization method has been devel-oped to compute the intervals of the possible moment values of uncertain moment equationswhose parameters are given only partially using the statistics of the parameter distributions.A distinctive feature of the proposed method is that it has been derived without approxi-mating the governing equation of the stochastic chemical reactions, i.e. , the CME, unlikemany other approaches reviewed in Section I. In other words, the moments of interest areguaranteed to be within the bounds for all possible parameter distributions satisfying thegiven statistics. This feature is useful for model-based rational engineering of biomolecularcircuits in synthetic biology, where the robustness of synthetic reactions is important. Inparticular, the optimization program can easily incorporate the new information of the pa-rameter distributions upon the arrival of new data, allowing for the progressive improvementof the predicted bounds through the multiple cycles of analyses and experiments.Though this paper has presented the fundamental theoretical results, future eﬀorts arestill required to broaden the applicability of the proposed approach to more complex chemi-cal reactions with many molecular species. The remaining challenges toward this goal lie in22omputational cost and the numerical instability of the optimization solver, which are com-mon to the moment computation methods based on the generalized moment problem . Thehigh computational burden is due to the number of variables in the optimization problem,which rapidly increases as we increase ρ and σ . A potential solution to this issue would beto use the sparse structure of the dual optimization problem to reduce the variables , butfurther investigation is left for future work. The numerical instability, on the other hand,is due to the fact that the values of the low and the high moments are diﬀerent by ordersof magnitude. The normalization of the variables is eﬀective for this problem, but a moresystematic scheme is desirable in the future. REFERENCES P. S. Swain, M. B. Elowitz, and E. D. Siggia, Proceedings of National Academy of Sciencesof the United States of America , 12795 (2002). E. M. Ozbudak, M. Thattai, I. Kurtser, A. D. Grossman, and A. van Oudenaarden,Nature Genetics , 69 (2002). Y. Taniguchi, P. J. Choi, G.-W. Li, H. Chen, M. Babu, J. Hearn, A. Emili, and X. S. Xie,Science , 533 (2010). M. B. Elowitz, A. J. Levine, E. D. Siggia, and P. S. Swain, Science , 1183 (2002). D. A. McQuarrie, Journal of Applied Probability , 413 (1967). D. T. Gillespie, Physica A , 404 (1992). D. T. Gillespie, Journal of Computational Physics , 403 (1976). D. T. Gillespie, The Journal of Chemical Physics , 297 (2000). D. T. Gillespie, The Journal of Chemical Physics , 1716 (2001). N. G. van Kampen,

Stochastic processes in physics and chemistry , 3rd ed. (North Holland,2007). A. Singh and J. P. Hespanha, IEEE Transactions on Automatic Control , 414 (2011). E. Lakatos, A. Ale, P. D. W. Kirk, and M. P. H. Stumpf, The Journal of Chemical Physics , 094107 (2015). D. Schnoerr, G. Sanguinetti, and R. Grima, The Journal of Chemical Physics , 185101(2015). B. Munsky and M. Khammash, The Journal of Chemical Physics , 044104 (2006).23 A. Gupta, J. Mikelson, and M. Khammash, The Journal of Chemical Physics , 154101(2017). K. R. Ghusinga, C. A. Vargas-Garcia, A. Lamperski, and A. Singh, Physical Biology ,04LT01 (2017). J. Kuntz, P. Thomas, G.-B. Stan, and M. Barahona, The Journal of Chemical Physics , 034109 (2019). Y. Sakurai and Y. Hori, in

Proceedings of IEEE Conference on Decision and Control (2017)pp. 1206–1211. Y. Sakurai and Y. Hori, Journal of the Royal Society Interface , 20170709 (2018). G. R. Dowdy and P. I. Barton, The Journal of Chemical Physics , 084106 (2018). G. R. Dowdy and P. I. Barton, The Journal of Chemical Physics , 074103 (2018). Y. Sakurai and Y. Hori, IEEE Control Systems Letters , 290 (2019). J. B. Lasserre,

Moments, positive polynomials and their applications (Imperial CollegePress, 2009). E. K. O. J. M. Raser, Science , 1811 (2004). V. Shahrezaei, J. F. Ollivier, and P. S. Swain, Molecular Systems Biology , 196 (2008). M. Scott, T. Hwa, and B. Ingalls, Proceedings of National Academy of Sciences of theUnited States of America , 7402 (2007). C. Zechner, J. Ruess, P. Krenn, S. Pelet, M. Peter, J. Lygeros, and H. Koeppl, Proceedingsof National Academy of Sciences of the United States of America , 8340 (2012). M. Hallen, B. Li, Y. Tanouchi, C. Tan, and L. You, PLOS Computational Biology ,e1002209 (2011). T. Toni and B. Tidor, PLOS Computational Biology , e1002960 (2013). G. Caravagna, G. Mauri, and A. d’Onofrio, PLOS Computational Biology , e51174(2013). M. ˇCeska, D. ˇSafr´anek, S. Draˇzan, and L. Brim, PLOS One , e94553 (2014). H. J. Landau,

Moments in Mathematics (American Mathematical Society, 1987). J. F. Sturm, Optimization Methods and Software , 625 (1999). H. Waki, S. Kim, M. Kojima, and M. Muramatsu, SIAM Journal on Optimization ,218 (2006). 24 ppendix A: Proof of Proposition 1 Eq. (17) and eq. (18) follow directly from eq. (10) and eq. (15), respectively, and eq.(19) is obvious because of the product of the same vectors. Eq. (20) holds because c i ( k ) ≥ d i − ℓ ( x ) ≥ P ( x , k , x ), which follows from eq.(15) and eq. (16). (cid:3)(cid:3)