Diagnostics for assessing the linear noise and moment closure approximations
Colin S. Gillespie and Andrew Golightly

26th September 2018
Abstract

Solving the chemical master equation exactly is typically not possible, so instead we must rely on simulation based methods. Unfortunately, drawing exact realisations results in simulating every reaction that occurs. This will preclude the use of exact simulators for models of any realistic size, and so approximate algorithms become important. In this paper we describe a general framework for assessing the accuracy of the linear noise and two moment approximations. By constructing an efficient space filling design over the parameter region of interest, we present a number of useful diagnostic tools that aid modellers in assessing whether the approximation is suitable. In particular, we leverage the normality assumption of the linear noise and moment closure approximations.
1 Introduction

Due to advances in experimental techniques, it is now clear that cellular dynamics incorporate a vast array of heterogeneous components. Whilst each component may be relatively simple, combining component systems results in complex, temporal dynamics that are not amenable to simple intuitive understanding. The recognition of such biological sophistication has led to the conclusion that complex biological processes cannot be understood through the application of ever-more reductionist experimental programs. Instead, by formulating the system of interest into a mathematical framework, we can begin to combine disparate sources of knowledge. Furthermore, careful mathematical modelling of biological processes has other advantages. For example, Kowald and Kirkwood [1996] highlight possible interactions that would be difficult to observe experimentally. Therefore, a successful analysis of a biological system now requires a complementary wet and dry approach (see Ingalls [2008]).

When modelling biological networks, it is important to incorporate the intrinsic noise of the system. One standard approach is to utilise stochastic kinetic models described using a set of chemical reactions, their associated hazards and an assumption that the system evolves according to a continuous-time Markov jump process (MJP). The transition kernel governing the MJP can be found by constructing and solving Kolmogorov's forward equation, known in this context as the chemical master equation (CME) [Gillespie, 1992]. Unfortunately, the CME is rarely tractable for systems of interest, and the vast size of the underlying state space means that numerically computing the solution of the CME is not feasible (see Wilkinson [2012]). While it may not be possible to solve the chemical master equation, it is usually straightforward to obtain exact realisations of the MJP using standard simulation algorithms.
The most well known algorithm is the direct method developed by Gillespie [1976]. Simulating from the model is not only crucial when building a system, but is also essential for parameter inference, since the observed data likelihood is usually analytically intractable. Exact simulation based approaches to MJP inference typically use data augmentation [Boys et al., 2008] coupled with Markov chain Monte Carlo (MCMC) or particle MCMC [Golightly and Wilkinson, 2011, Owen et al., 2015]. In the simplest implementation of the latter, only forward simulations from the model are required, and the method can be regarded as likelihood-free. Other likelihood-free approaches include the use of approximate Bayesian computation (ABC) schemes [see for example, Beaumont et al., 2002, Sisson et al., 2007, Toni et al., 2009]. These inference schemes typically require many millions of forward simulations, and the resulting computational cost may preclude their use when the system size or reaction rates are large. Due to this computational hurdle, a number of approximate simulators have been proposed [for an overview, see Pahle, 2009]. Use of an approximate simulator in this way can be seen as performing exact (simulation-based) inference for the associated approximate model. Approximate models (and their associated simulators) that ignore the discrete nature of the stochastic kinetic model, but crucially, not stochasticity, include the diffusion approximation [Gillespie, 2000], the linear noise approximation (LNA) [Kurtz, 1970, Elf and Ehrenberg, 2003] and moment closure approaches [van Kampen, 2007, Gillespie, 2009]. Hybrid approaches which treat some species as discrete and others as continuous have been proposed by Salis and Kaznessis [2005] and Sherlock et al. [2014], among others. Moment closure and LNA based approaches are particularly attractive, due to their tractability.
For the former, the first two moments of the MJP are combined with an assumption of normality, whereas for the latter, the CME is approximated in a linear way, to give a process with normal transition densities. Unfortunately, it is not straightforward to check whether a given approximation technique yields acceptable results since, by definition, the approximate simulator is not exact. For example, if the model contains any second-order reactions, then the mean population estimate from the linear noise approximation will not be exact (see Golightly and Gillespie [2013] for example). However, the approximation may still be sufficient for model exploration or parameter inference.

Recently, Cao and Petzold [2006] and Jenkinson and Goutsias [2013a] performed a comparison of approximate and exact simulators at specific parameter values. Essentially, each proposed simulating N times from an exact and an approximate algorithm, calculating a distance metric and assessing accuracy by performing a hypothesis test. However, there are two major drawbacks with this test driven approach. First, for many approximate simulators we can analytically prove that the approximate and exact distributions differ, so as N increases we will always reject the null hypothesis. Second, in the parameter inference setting we are interested in the performance of the approximate simulator across a range of parameter values, not just at a particular value.

In the related field of computer experiments, complex models are emulated using a faster model, typically a Gaussian process (GP). Since prediction is made using an emulator, it is essential that the emulator accurately represents the system. Bastos and O'Hagan [2009] provide a number of useful diagnostic measures (in the context of Gaussian processes) for assessing simulator quality.
Within the context of stochastic kinetic models, both the moment closure and LNA approaches can be seen as GP emulators.

In this paper we present a set of general, principled methods for efficiently assessing the quality of the linear noise and moment closure approximations across a large parameter space, based on techniques found in the computer experiment literature. The diagnostic measures we present are simple to calculate and interpret, providing the practitioner with a useful tool for assessing simulator accuracy. The remainder of this paper is organised as follows. Section 2 briefly reviews stochastic kinetic models and exact simulation techniques before introducing the moment closure and linear noise approximations. In Section 3 we describe efficient methods for exploring the parameter space, and the diagnostic measures on which comparisons between simulators are to be based. The methods are illustrated using three examples of increasing complexity.

2 Stochastic kinetic models

Suppose we have a system of chemical reactions with u chemical species {X_1, ..., X_u} and v reactions {R_1, ..., R_v}, where reaction R_k, with rate parameter c_k, corresponds to

    s_{k1} X_1 + ... + s_{ku} X_u → s̄_{k1} X_1 + ... + s̄_{ku} X_u,

with s_{ki} and s̄_{ki} the number of molecules of type X_i before and after the reaction R_k, respectively. Let X_{j,t} be the random variable denoting the number of molecules of species X_j at time t, and let X_t be the u-vector X_t = (X_{1,t}, X_{2,t}, ..., X_{u,t})′. Further, let s = (s_{ij}) be a v × u matrix of the coefficients s_{ij}, with s̄ defined similarly. Then the u × v stoichiometry matrix S is defined by

    S = (s̄ − s)′.    (1)

We denote by x_{i,t} the number of molecules of species X_i at time t, and let x_t be the u-vector x_t = (x_{1,t}, x_{2,t}, ..., x_{u,t})′. The rate of reaction R_k is defined by the rate function h_k(x_t, c_k), where c_k is the reaction rate constant.
Hence, the hazard of a type k reaction occurring depends on the rate constant c_k, as well as the state of the system at time t. This system can be naturally modelled as a Markov jump process, that is, in a small time increment δt, the probability of reaction R_k occurring in the time interval (t, t + δt] is h_k(x_t, c_k) δt [Gillespie, 1992]. When a reaction of type k does occur, the system state changes by s̄_k − s_k. A typical model assumption is that the reactions follow mass action kinetics. This results in a hazard function that takes the form of the rate constant c_k multiplied by a product of binomial coefficients expressing the number of ways in which the reaction can occur.

The transition kernel of the MJP can be found by constructing and solving Kolmogorov's forward equation, known in this context as the chemical master equation (CME). Denote by p(x_t) the probability of being in state x_t, and note that we suppress the dependence of p(x_t) on the initial state x_0 and the reaction constants c = (c_1, ..., c_v)′ for simplicity. The CME is given by

    d/dt p(x_t) = Σ_{k=1}^{v} [ p(x_t − s_k) h_k(x_t − s_k, c_k) − p(x_t) h_k(x_t, c_k) ],    (2)

where h_k(x_t, c_k) is the hazard function for reaction R_k and s_k is the kth column of the matrix S. Once p(x_t) is obtained, a complete characterisation of the system is available. Unfortunately, the CME is only tractable for a handful of cases [see e.g. Gardiner, 1985]. Consequently, for most systems of interest, an analysis via the CME will not be possible.

Although the chemical master equation is rarely analytically tractable, it is straightforward to draw exact realisations using a discrete event simulation method. The standard algorithm, developed by Gillespie [1976], for simulating from a stochastic system is the direct method (described in Algorithm 1). Essentially, at each algorithm iteration we select a reaction to occur and update the species levels and clock.
However, as the number of reactions or the size of the hazard functions increase, the computational cost increases. A number of improvements to this algorithm have been proposed. For example, McCollum et al. [2006] dynamically reorder the reactions from most to least likely, to significantly increase the speed of the algorithm. Alternatively, Cao et al. [2004] suggest a pilot simulation to optimise the reaction order. Gibson and Bruck [2000] exploit the model structure to avoid unnecessary updates. However, the underlying speed issues still remain for models of reasonable size, necessitating the use of approximate simulation strategies.
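The direct method of Algorithm 1 translates directly into code. The sketch below is ours, not the paper's: the model is supplied as a stoichiometry matrix and a hazard function, and the pure death example at the end is purely illustrative.

```python
import numpy as np

def direct_method(x0, S, hazards, c, t_max, rng=None):
    """Gillespie's direct method.

    x0      : initial molecule counts (length-u)
    S       : u x v stoichiometry matrix (net change per reaction)
    hazards : function (x, c) -> length-v vector of reaction hazards
    c       : rate constants
    t_max   : maximum simulation time
    """
    rng = rng or np.random.default_rng()
    t, x = 0.0, np.array(x0, dtype=float)
    times, states = [t], [x.copy()]
    while True:
        h = hazards(x, c)
        h0 = h.sum()
        if h0 <= 0:                       # no reaction can fire
            break
        t += rng.exponential(1.0 / h0)    # time to next event
        if t > t_max:
            break
        j = rng.choice(len(h), p=h / h0)  # index of the reaction that fires
        x += S[:, j]
        times.append(t)
        states.append(x.copy())
    return np.array(times), np.array(states)

# Illustrative usage: a pure death process X -> 0 with hazard c1 * x.
S = np.array([[-1.0]])
death = lambda x, c: np.array([c[0] * x[0]])
times, states = direct_method([10], S, death, [0.5], 100.0,
                              rng=np.random.default_rng(1))
```

Each call returns the full event history; discarding all but the state at a single time point gives the kind of exact realisation used by the diagnostics later in the paper.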
Algorithm 1: Direct method [Gillespie, 1976]

1. Set t = 0 and initialise the rate constants c_1, ..., c_v and the initial molecule numbers x_{1,0}, ..., x_{u,0}.
2. Propensities update: update each of the v hazard functions, h_k(x_t, c_k), based on the current state x_t.
3. Calculate the total hazard, h(x_t, c) = Σ_{k=1}^{v} h_k(x_t, c_k).
4. Simulate the time to the next event, τ ~ Exp(h(x_t, c)), and set t = t + τ.
5. Simulate the reaction index, j, with probabilities h_k(x_t, c_k)/h(x_t, c), k = 1, ..., v.
6. Update x_t according to reaction j.
7. If the simulation time is exceeded, stop; otherwise return to step 2.

In what follows, we consider two tractable approximations of p(x_t) that ignore discreteness but not stochasticity. Both approaches assume that the distribution of X_t at a particular time point t is normal, so that

    X_t ~ N(ψ_t(c), Σ_t(c)),    (3)

where we let the approximate mean and variance ψ_t(c) = Ê(X_t) and Σ_t(c) = V̂ar(X_t) depend explicitly on the rate constants c. Thus the approximate density at a particular time point is

    p̂(x_t) = (2π)^{−u/2} |Σ_t(c)|^{−1/2} exp[ −(1/2) (x_t − ψ_t(c))′ [Σ_t(c)]^{−1} (x_t − ψ_t(c)) ].

It remains to choose appropriate forms for ψ_t(c) and Σ_t(c). We consider two related approaches, namely, moment closure and the linear noise approximation (LNA). We give a brief, informal description of these techniques in the sequel, and refer the reader to van Kampen [2007] and Wilkinson [2012] for further discussion.

Here, we approximate the moment equations of the system as a set of ordinary differential equations (ODEs). These equations then provide estimates of the means and variances of the individual chemical species. To extract the moment equations using the moment closure assumption, we first define the moment generating function (indexed by θ) as

    M(θ; t) = Σ_{x_t} p(x_t) e^{θ′x_t}.    (4)

The moments E[x_t^n], where x_t^n = (x_{1,t}^{n_1}, ..., x_{u,t}^{n_u})′, of the joint probability distribution can be found by taking nth order derivatives of the moment generating function with respect to θ = (θ_1, ..., θ_u)′. The first moment is the mean, and the second moment can be used to obtain the variance. Multiplying the chemical master equation (2) by e^{θ′x_t} and summing over x_t gives

    ∂M(θ; t)/∂t = Σ_{x_t} e^{θ′x_t} Σ_{k=1}^{v} [ p(x_t − s_k) h_k(x_t − s_k, c_k) − p(x_t) h_k(x_t, c_k) ].    (5)

The time evolution of the mean concentration of species X_i can be obtained by taking the first derivative of equation (5) with respect to θ_i and then setting θ to zero. Differentiating equation (5) twice with respect to θ_i yields E[x_{i,t}^2], from which we can obtain the variance. Similarly, differentiating with respect to θ_i and θ_j gives E[x_{i,t} x_{j,t}]. Following this process, we can obtain an ordinary differential equation for any moment of interest. However, when we have non-linear dynamics, the equation for the ith moment generally depends on the (i + 1)th moment equation, i.e.
the ODE for the mean contains a term depending on the second order moment. To circumvent this problem, we need to close the system, for example, by assuming an underlying Gaussian distribution. The mean and variance in (3), which we denote by ψ_t^m(c) and Σ_t^m(c) in this context, are then easily obtained. Grima [2012] [see also Singh and Hespanha, 2007, Smadbeck and Kaznessis, 2013] shows that increased accuracy of lower-order moment estimation can be obtained by using a higher-order closure scheme. However, even though we can estimate higher order moments, it is not clear how these estimates can be routinely utilised. Hence a popular closure choice is to assume normality, resulting in coupled equations for only the mean and variance. This particular closure is also known as the two moment approximation (2MA).

The linear noise approximation can be formed by first constructing the chemical Langevin equation (CLE). In an infinitesimal time interval (t, t + dt], the reaction hazards will remain constant almost surely. This allows us to treat the occurrence of reaction events as the occurrence of events from a Poisson process, with independent realisations for each reaction type. Writing dR_t for the v-vector of the number of reaction events of each type in the time increment, it then follows that E(dR_t) = h(X_t, c) dt and Var(dR_t) = diag{h(X_t, c)} dt. Using dX_t = S dR_t and matching E(dX_t) and Var(dX_t) with the drift and diffusion coefficients of an Itô stochastic differential equation (SDE) gives

    dX_t = S h(X_t, c) dt + √( S diag{h(X_t, c)} S′ ) dW_t,    (6)

where dW_t is the u-dimensional Brownian motion increment. Equation (6) is commonly referred to as the chemical Langevin equation (CLE). The LNA can now be derived from the CLE as follows. We replace the hazard function in equation (6) with the rescaled form Ω f(X_t/Ω, c), where Ω is the cell volume.
This results in

    dX_t = Ω S f(X_t/Ω, c) dt + √( Ω S diag{f(X_t/Ω, c)} S′ ) dW_t.    (7)

Following van Kampen [2007], we write the solution X_t of the CLE as a deterministic process z_t plus a residual stochastic process,

    X_t = Ω z_t + √Ω M_t.    (8)

Then, a Taylor expansion of the rate function around z_t gives

    f(z_t + M_t/√Ω, c) = f(z_t, c) + (1/√Ω) F_t M_t + O(Ω^{−1}),    (9)

where F_t is the v × u Jacobian matrix with (i, j)th element ∂f_i(z_t, c)/∂z_{j,t}, and z_{j,t} is the jth component of z_t. Note that we suppress the dependence of F_t on z_t and c for simplicity. Substituting (8) and (9) into equation (7) and collecting terms of O(1) and O(1/√Ω) gives the ODE satisfied by z_t and the SDE satisfied by M_t respectively, as

    dz_t = S f(z_t, c) dt,    (10)
    dM_t = S F_t M_t dt + √( S diag{f(z_t, c)} S′ ) dW_t.    (11)

Equations (8), (10) and (11) give the linear noise approximation of the CLE and, in turn, an approximation of the Markov jump process model. For fixed or Gaussian initial conditions, the stochastic differential equation in (11) can be solved explicitly to give (M_t | c) ~ N(m_t, V_t), where m_t and V_t satisfy the coupled deterministic system of ordinary differential equations

    dm_t/dt = S F_t m_t,    (12)
    dV_t/dt = V_t F_t′ S′ + S diag{f(z_t, c)} S′ + S F_t V_t.    (13)

Hence, the approximating distribution of X_t is as in (3), with

    ψ_t^l(c) = Ω z_t + √Ω m_t,    Σ_t^l(c) = Ω V_t.    (14)

In situations where the ODE satisfied by z_t is initialised with z_0 = x_0, so that m_0 = 0, we see that m_t = 0 for all t and ψ_t^l(c) = Ω z_t. Note further that Ω plays no role in the evolution equations (10) and (13). Therefore, in the examples in Section 4, we assume a unit volume (Ω = 1) for simplicity.

3 Diagnostics

When model building, we usually want to investigate many different parameter combinations.
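Such exploration is cheap under the LNA, since each parameter evaluation only requires solving the coupled ODEs (10) and (13). A minimal sketch of this, using an assumed immigration-death system (∅ → X with hazard c_1, X → ∅ with hazard c_2 x) chosen purely for illustration:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative immigration-death system (not from the paper):
#   R1: 0 -> X, hazard c1;  R2: X -> 0, hazard c2 * x
S = np.array([[1.0, -1.0]])                    # u x v stoichiometry matrix

def f(z, c):                                   # rate functions f(z, c)
    return np.array([c[0], c[1] * z[0]])

def F(z, c):                                   # v x u Jacobian of f w.r.t. z
    return np.array([[0.0], [c[1]]])

def lna_rhs(t, y, c):
    """Right-hand side of the coupled ODEs (10) and (13)."""
    u = S.shape[0]
    z, V = y[:u], y[u:].reshape(u, u)
    A = S @ F(z, c)                            # u x u drift matrix S F_t
    dz = S @ f(z, c)                           # equation (10)
    dV = V @ A.T + S @ np.diag(f(z, c)) @ S.T + A @ V   # equation (13)
    return np.concatenate([dz, dV.ravel()])

c = (10.0, 0.5)
y0 = np.array([0.0, 0.0])                      # z0 = x0 = 0, V0 = 0
sol = solve_ivp(lna_rhs, (0.0, 30.0), y0, args=(c,), rtol=1e-8, atol=1e-8)
z_T, V_T = sol.y[0, -1], sol.y[1, -1]
# For this model the stationary distribution is Poisson(c1/c2), so both the
# LNA mean and variance should approach c1/c2 = 20 for large t.
```

For this simple system the LNA is exact in the stationary regime, which provides a convenient sanity check on the implementation.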
Similarly, when inferring parameters, the data available are usually limited and prior information on plausible parameter values is sparse. Therefore, parameter inference and model exploration usually follow a combination of parameter scans and/or exploration of the parameter space using efficient inference algorithms. Since the parameter space to search will be large, it would be computationally infeasible to numerically assess the approximate simulator at all values. In particular, since an approximate algorithm is being utilised, this implies that simulating exact realisations may be computationally intensive. Thus the parameter space must be explored efficiently.

One approach to exploring the parameter space is random sampling, that is, sampling uniformly in the parameter space. However, McKay et al. [1979] showed that Latin hypercube sampling (LHS) gives a significant improvement over simple random sampling when exploring large spaces. Morris and Mitchell [1995] improved the original LHS design with the maximin design, in which the distance between points in the hypercube is maximised. Moreover, Latin hypercube sampling of the parameter space lends itself to an embarrassingly parallel mode of computation. Naturally, in scenarios that do not require a covering of the whole parameter space, other methods may be preferred. For example, if performing Bayesian inference via Markov chain Monte Carlo (MCMC) and focusing on regions of high posterior density, then we may choose to use the output of the MCMC scheme.

Figure 1: A two-dimensional Latin hypercube design, with n_d = 50 points.

Figure 1 illustrates a two-dimensional design over parameters (c_1, c_2), with n_d = 50 points. We denote the n_d points in the Latin hypercube as γ = (γ_1, ..., γ_{n_d}). Hence, each point γ_i is a length-v (column) vector of parameter values, that is, γ_i = (c_{1,i}, ..., c_{v,i})′.
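Space filling designs of this kind are available in standard libraries. For example, a distance-optimised Latin hypercube can be generated with scipy; the bounds below are illustrative, not taken from the paper:

```python
import numpy as np
from scipy.stats import qmc

n_d = 50
sampler = qmc.LatinHypercube(d=2, optimization="random-cd", seed=1)
unit = sampler.random(n=n_d)               # n_d points in [0, 1)^2
# Scale to an (assumed) search region for (log10 c1, log10 c2).
design = qmc.scale(unit, l_bounds=[-7.0, -2.0], u_bounds=[-3.0, 2.0])
gamma = 10.0 ** design                     # back to the natural scale
```

Each row of `gamma` is one design point γ_i; by construction, each column of `unit` contains exactly one point in each of the n_d equal-width bins, which is the uniform marginal coverage property discussed below.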
A feature of these space filling designs is that the marginal parameter distributions are uniform, thereby giving good coverage in each dimension. Our general strategy is to compare the moment closure/linear noise approximation to a single realisation simulated exactly (using Algorithm 1) from the Markov jump process, at each of the n_d points in the design. We refer to Algorithm 1 as the exact simulator. For each design point γ_i, let x*(γ_i) = (x*_1(γ_i), ..., x*_u(γ_i))′ denote a single realisation from the exact simulator at a particular time point, with dependence on time, and on the initial conditions used to produce the realisation, suppressed for ease of notation. In the following sections, we describe simple diagnostics that can be assessed by comparing the observed diagnostic at x*(γ_i) with the reference distribution of the diagnostic induced by the approximations described in Section 2.2.

One way of assessing the accuracy of a Gaussian based approximation is to calculate individual prediction errors. These are obtained by calculating the difference between the exact simulator and the mean of the linear noise (or two moment) approximation, that is,

    e_{i,j} = x*_j(γ_i) − ψ_j(γ_i)    (15)

for each point i = 1, ..., n_d and species j = 1, ..., u. Note that ψ_j(γ_i) denotes the jth component of the mean in (3), omitting the dependence on time t. Plainly, a more appropriate quantity to work with is the standardised prediction error

    e*_{i,j} = (x*_j(γ_i) − ψ_j(γ_i)) / √Σ_{jj}(γ_i).    (16)

If x*(γ_i) is replaced with a draw from either the two moment or linear noise approximation, then the standardised prediction errors can be seen as draws from a standard normal distribution. Hence, large standardised individual errors, with absolute values larger than, say, two, indicate a potential discrepancy between the exact and approximate simulators.
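Computing (15) and (16) across the whole design is a simple array operation once the exact realisations and the approximate means and variances are stored as arrays; a small sketch (the numbers are made up for illustration):

```python
import numpy as np

def standardised_errors(x_star, psi, Sigma_diag):
    """Standardised prediction errors (16).

    x_star     : n_d x u array of exact realisations x*_j(gamma_i)
    psi        : n_d x u array of approximate means psi_j(gamma_i)
    Sigma_diag : n_d x u array of approximate variances Sigma_jj(gamma_i)
    """
    return (np.asarray(x_star) - np.asarray(psi)) / np.sqrt(Sigma_diag)

# Flag design points with |e*| > 2 for further investigation.
e_star = standardised_errors([[12.0, 8.0]], [[10.0, 10.0]], [[4.0, 0.25]])
flagged = np.abs(e_star) > 2
```

Flagged entries point to (design point, species) pairs where the normal approximation may be breaking down.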
Of course, single, isolated large values are possible, and so further investigation can be performed by obtaining more simulator runs in the parameter vicinity. Since the reference distribution of the standardised prediction errors is normal, we can use other standard techniques for assessing the modelling assumptions that underpin both approximate simulators. For example, quantile-quantile (q-q) plots provide a natural graphical diagnostic for assessing normality, with a reasonable fit indicated by points close to a 45-degree line through the origin. We may expect the output of the exact simulator to be heavier tailed than a Gaussian, in which case points in the q-q plot will cluster around a line with a slope greater than one. Plotting errors against parameter values may also be useful in identifying regions of parameter space that exhibit large discrepancies.

We note that at each point on the Latin hypercube, it is possible to draw n_ex realisations from an exact simulator (giving a total of N = n_d × n_ex exact simulations), and use a formal hypothesis test in the spirit of Jenkinson and Goutsias [2013b]. However, there are a number of potential drawbacks with this approach. First, the computational cost may be prohibitively large. For a fixed computational budget of N simulations, either n_ex would be prohibitively small, which would adversely affect the power of the test, or we would have to reduce n_d and so fail to explore the parameter space. Second, for both the LNA and 2MA schemes, if the model contains a second order reaction, we can prove analytically that the mean and variance are not equal to the true values, so a hypothesis test is not needed. Furthermore, any non-significant test must be spurious. Third, the normality assumption is also clearly incorrect, since the state space is discrete. Therefore, we focus on an assessment of whether the approximation is "good enough" over a large parameter space.
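The q-q comparison can be partially automated: scipy's `probplot` fits the ordered errors against standard normal quantiles and returns the fitted slope, so slopes well above one suggest heavier-than-Gaussian tails. A sketch, using synthetic errors in place of real simulator output:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
normal_errors = rng.standard_normal(5000)
heavy_errors = rng.standard_t(df=3, size=5000)   # heavier tails than Gaussian

# probplot returns ((theoretical quantiles, ordered data),
#                   (slope, intercept, correlation)).
(_, _), (slope_normal, _, _) = stats.probplot(normal_errors, dist="norm")
(_, _), (slope_heavy, _, _) = stats.probplot(heavy_errors, dist="norm")
# slope_normal should be close to 1; slope_heavy noticeably larger.
```

In practice one would apply this to the standardised prediction errors e*_{i,j} of a given species, alongside the graphical q-q plot itself.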
Another straightforward method for assessing fit is to construct a 100α% confidence interval for x*_j(γ_i), using the mean ψ(·) and variance Σ(·) associated with the approximate simulator under assessment. We denote a particular confidence interval at design point i, for species j, as CI_{i,j}(α). The proportion of simulated values that land within the confidence region is given by

    D^CI_j = (1/n_d) Σ_{i=1}^{n_d} 1[ x*_j(γ_i) ∈ CI_{i,j}(α) ],    (17)

where 1[·] is the indicator function. We can assess fit since the value of D^CI_j should be approximately equal to α. Additionally, plotting the confidence regions against parameter values can highlight any particular systematic deviations.

Recently, Grima [2012] explored the link between the two moment and linear noise approximations. Essentially, the two approximations are very similar, except that the mean equations in the LNA do not depend on the covariances. This suggests that if the two approximations give appreciably different estimates for the first two moments, further investigation is required. We define the standardised difference between the two approximations as

    D^LM_j(γ_i) = (ψ^l_j(γ_i) − ψ^m_j(γ_i)) / √Σ^l_{jj}(γ_i).    (18)

Note again that, for notational convenience, the time subscript t has been omitted from the expression. Large values of D^LM_j should be carefully investigated. This diagnostic measure has the advantage of avoiding (possibly expensive) exact simulation. However, when the two approximations give similar results, it does not necessarily follow that both approximations are correct. For example, Schnoerr et al. [2014] highlighted an oscillating system where the LNA and 2MA schemes were in agreement, but were significantly different from the solution to the underlying chemical master equation.
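Both diagnostics reduce to simple array computations; a sketch (argument shapes follow the earlier prediction error example, and the synthetic check at the end is ours):

```python
import numpy as np
from scipy.stats import norm

def ci_coverage(x_star, psi, Sigma_diag, alpha=0.95):
    """Empirical version of (17): proportion of exact realisations inside the
    central 100*alpha% interval of the approximating normal, per species."""
    z = norm.ppf(0.5 + alpha / 2.0)
    inside = np.abs(np.asarray(x_star) - psi) <= z * np.sqrt(Sigma_diag)
    return inside.mean(axis=0)

def d_lm(psi_lna, psi_2ma, Sigma_lna_diag):
    """Standardised LNA/2MA mean difference (18); needs no exact simulation."""
    return (np.asarray(psi_lna) - psi_2ma) / np.sqrt(Sigma_lna_diag)

# If the approximation were exact, the coverage should be close to alpha:
rng = np.random.default_rng(0)
x_star = rng.normal(loc=5.0, scale=2.0, size=(10_000, 1))
cover = ci_coverage(x_star, psi=5.0, Sigma_diag=4.0, alpha=0.95)
```

A coverage value well below α, or a large |D^LM_j|, flags regions of parameter space worth investigating with further exact simulations.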
4 Examples

Here, we demonstrate the diagnostic tools in three examples. Diagnostics based on the linear noise approximation are constructed for two reaction networks that are known to exhibit interesting non-linear dynamics. In the final example, we consider the prokaryotic auto regulatory gene network analysed by Golightly and Wilkinson [2008] and Milner et al. [2013]. We focus on the moment closure approximation and construct diagnostics to assess approximate simulator fit both a priori and a posteriori. Interactive versions of all graphics can be found at https://bookdown.org/csgillespie/diagnostics/.

4.1 Schlögl system

The Schlögl model is a well known test system that exhibits bi-modal and non-linear characteristics at certain parameter combinations. The system contains four reactions,

    R_1: 2X_1 + X_2 → 3X_1    R_2: 3X_1 → 2X_1 + X_2
    R_3: X_3 → X_1            R_4: X_1 → X_3,

describing the evolution of three chemical species, X_1, X_2 and X_3, and assumes mass action kinetics. In this example, we concentrate on species X_1. Where the distribution of X_1 is bi-modal, the linear noise approximation would clearly be inappropriate. However, for large models it isn't necessarily clear if (or where) a system would have bi-modal regions. Hence, the purpose of this example is to illustrate how problematic regions may be detected. The parameters (c_1, c_2)′ were fixed, and the initial conditions held constant at x_0 = (250, 10^5, 2 × 10^5)′.

Suppose that interest lies in the accuracy of the linear noise approximation at time point t = 5, over rectangular regions for c_3 and c_4 explored on the log scale. Figure 2a shows the region in parameter space that leads to a bi-modal distribution of X_1. The plot was obtained by finely discretising the parameter space and simulating at each parameter value. For systems of realistic size and complexity, this approach will be computationally prohibitive.

We therefore generated a Latin hypercube with n_d = 10,000 points. The two dimensional design space is given in Figure 2a. The standardised prediction errors plotted against parameter c_3 are shown in Figure 2b. The locally smoothed mean value (shown in blue) is close to zero. However, there are several large errors; one such design point is examined in Figure 2d, where the LNA mean solution is also shown in red. It is clear that at this particular choice of parameter values, the Schlögl system has a bi-modal distribution and the LNA is inappropriate in this region of parameter space. Therefore, with relatively few design points, we are able to detect regions of parameter space that lead to significant discrepancies between the exact and approximate simulators. Naturally, care must be taken in the choice of n_d, and this will typically be dictated by the computational budget. We find that for this example, reducing n_d to 1000 results in only a single value in the Latin hypercube design with an absolute prediction error greater than 2.

As discussed in Section 3.1, rather than generate a single exact simulation at each of the n_d points, we could simulate N times, where N = n_d × n_ex, to give n_ex replicates at each of the design points, allowing comparison of the simulator output via a formal hypothesis test. This is similar to the example in Jenkinson and Goutsias [2013b], where the authors set n_ex = 350.

Figure 2: Schlögl system. (a) The two dimensional Latin hypercube design (on the log scale), with n_d = 10,000 points. (b, c) Standardised prediction errors, with normal reference regions indicated by grey lines. (d) Fifty stochastic simulations at a parameter value in the bi-modal region, with the LNA mean solution shown in red.

4.2 Lotka-Volterra predator prey system

The predator prey system developed by Lotka [1925] and Volterra [1926] describes the time evolution of two species, X_1 (prey) and X_2 (predator). This system has two species and three reactions,

    R_1: X_1 → 2X_1    R_2: X_1 + X_2 → 2X_2    R_3: X_2 → ∅.

Although relatively simple, this system exhibits interesting auto regulatory behaviour and has been used numerous times to test inference algorithms; see, for example, Boys et al. [2008], Opper and Sanguinetti [2008], White et al. [2013]. In particular, the linear noise and two moment approximations have been used for parameter inference [Milner et al., 2013, Golightly et al., 2015].

To assess the linear noise approximation, we generated n_d = 100 points from a two-dimensional Latin hypercube, over regions for c_1 and c_3 on the log scale. These regions correspond to an inference situation where we are using vague priors. We fixed c_2 and the initial conditions, and used a maximum simulation time of t = 30. Figure 3a shows the Latin hypercube design. The diagnostic of Section 3.3 was computed at each design point. At design points with large |D^LM_{i,1}|, the LNA and 2MA approximations disagree. Moreover, we see that these points coincide with a high probability of prey extinction by time 30 (see also Figure 3b, showing fifty realisations from the exact simulator at a typical discrepant parameter value). This result is perhaps unsurprising given the time-course behaviour of the Markov jump process representation of the Lotka-Volterra system. The system eventually reaches one of two states: if X_1 dies out, then the system will run to (0, 0) (reactions 1 and 2 will never again occur); if X_2 dies out, the system will head towards (∞, 0) (reactions 2 and 3 will never again occur). The LNA fails to capture this behaviour. For example, the LNA mean is a perfectly repeating oscillation, carrying on indefinitely. As expected, increasing t leads to a higher proportion of the parameter space with significantly large prediction errors (results not reported).

Figure 3: Lotka-Volterra predator prey system. All simulations used the same fixed initial conditions. (a) Latin hypercube design (with n_d = 100) on the log scale; at each point on the hypercube the D^LM diagnostic was calculated, and points with large |D^LM_{i,1}| are highlighted. (b) Fifty exact realisations at a typical discrepant parameter value.

4.3 Prokaryotic auto regulatory gene network

A more realistic example is the prokaryotic auto regulation system. This larger model contains six species and twelve reactions. In this network, a protein I coded for by a gene i represses its own transcription and also the transcription of another gene g, by binding to a regulatory region upstream of each gene. This is described by the reactions

    R_1: I + i → I·i,    R_2: I·i → I + i,
    R_3: I + g → I·g,    R_4: I·g → I + g.

The transcription of i and g and the translation of the mRNAs r_i and r_g are represented by

    R_5: i → i + r_i,    R_6: r_i → r_i + I,
    R_7: g → g + r_g,    R_8: r_g → r_g + G.

We also have mRNA degradation,

    R_9: r_i → ∅,    R_10: r_g → ∅,

and protein degradation,

    R_11: I → ∅,    R_12: G → ∅.

Each reaction R_i has a stochastic rate constant c_i. There are two conservation laws in the model,

    I·i + i = K_1,    I·g + g = K_2,

where K_1 and K_2 are conservation constants. If K_1 and K_2 are known, then we can simplify the model using the conservation laws to remove I·i and I·g. This simplification reduces the model to six species, x = (I, G, i, g, r_i, r_g)′. The reaction hazards for R_1 and R_2 are h_1(x, c_1) = c_1 I i and h_2(x, c_2) = c_2 (I·i) = c_2(K_1 − i), respectively. Hazards for R_3 and R_4 are calculated similarly. The remaining hazards are for first order reactions.

This model has been used to test parameter inference schemes by Golightly and Wilkinson [2008] and Milner et al. [2013]. In this example, we will explore the moment closure approach used by Milner et al. [2013]. The stochastic realisation that Milner et al. [2013] based their parameter inference on is given in Figure 4.
Many of the chemical species have population sizes less than twenty; however, species G has a population greater than 65. We use a data set of fifty observations of the simulated process at (unit) discrete time points (see Figure 4 for the trace of the realisation). The true parameter values for (c1, c2, ..., c12) that produced the data set were (0. , . , . , . , . , . , . , . , . , . , . , . ). The gene i has at most two copies and so only takes values 0, 1 or 2.

Only vague prior knowledge was assumed about parameter values, with Uniform U(− , 1) priors for each log(c_i) for i = 1, ..., 12 and U(− , −6) on log(c). The values of K1 and K2 were assumed known and set at two and ten respectively.

An n_d = 10,000 point twelve dimensional Latin hypercube was created on the log space over the parameter prior regions. At each point on the hypercube, a time point from the realisation in Figure 4 was selected to initialise the exact simulator and the moment closure approximation. Each simulator was then run for a single time point and the standardised prediction error was calculated.

Figure 5: Predictive error plots for I (top row) and i (bottom row), based on the prior distribution. A total of n_d = 10,000 points were sampled from a thirteen dimensional Latin hypercube (twelve parameters plus the time dimension). The blue lines on figures (a), (b), (d) and (e) represent the standard normal distribution. The dashed lines in figures (c) and (f) represent the 99%, 99.9% and 99.99% regions of the standard normal distribution.

Figure 5 (a)–(c) gives the diagnostic plots for species I. Although the population levels of I are relatively small (the population size varies between 3 and 13), the associated diagnostic plots still look reasonable. The diagnostic plots for species i are given in Figure 5 (d)–(f). This species only takes values 0, 1 and 2. As would be expected, the diagnostic plots show clear deviations from the normality assumptions. In particular, when c ≪ 0.1, we obtain a number of very large standardised prediction errors. As with the Schlögl system, it would be advisable to investigate these problematic points more carefully. We note that for the data set in Figure 4, the marginal posterior density for c has negligible mass in this region of parameter space (the true value of c is 0.82).

A further investigation of the appropriateness of the moment approximation can be made a posteriori. Since the parameters in the posterior distribution were in some cases highly correlated, we sampled n_d = 10,000 points from this posterior.

Figure 6: Predictive error plots for I (top row) and i (bottom row), based on the posterior distribution. A total of n_d = 10,000 points were sampled from the posterior distribution. The blue lines on figures (a), (b), (d) and (e) represent the standard normal distribution. The dashed lines in figures (c) and (f) represent the 99%, 99.9% and 99.99% regions of the standard normal distribution.

Again, the diagnostic plots for species I (Figure 6 (a)–(c)) suggest that the normality assumption and the accuracy of the means and variances of the moment closure approximation appear reasonable. The diagnostic plots for the low level species i have substantially improved (see Figure 6 (d)–(e)), although we observe extreme standardised errors in Figure 6 (f). Of course, since i can only take values 0, 1 and 2, the prediction errors are not normally distributed (see Figure 6 (d)). Although the moment closure approach fails to adequately match the Markov jump process in all regions of parameter space a priori, in regions of high posterior density it does appear to provide a satisfactory alternative.

Discussion

Analysing stochastic kinetic models of realistic size and complexity is a challenging problem. For example, whilst it is possible, in principle, to perform exact (simulation-based) inference for the Markov jump process (MJP) representation [Boys et al., 2008, Golightly and Wilkinson, 2011, Owen et al., 2015], existing approaches are computationally intensive and have ostensibly focused on toy examples with relatively few species and reactions. Replacing the exact MJP simulator with a cheap approximation and using this for model exploration/inference is an appealing alternative approach. Gaussian approximations that ignore discreteness but not stochasticity, such as the linear noise approximation (LNA) and moment closure approaches considered here, are particularly attractive due to their tractability. While this assumption can make inference easier, it is essential to assess the appropriateness of the Gaussian approximation.
It is apparent from the literature that such an assessment rarely takes place. In this paper we have presented a general, easy-to-use framework that allows modellers to determine whether a given Gaussian approximation is suitable for their model. Following the approach of Bastos and O'Hagan [2009], we have examined simple numerical diagnostics, constructed from appropriate functions of the exact simulator output. Comparing observed values of the diagnostic (for a particular parameter value) to the distribution induced by the approximation gives an indication of whether or not the approximation can adequately represent the MJP. By using efficient space filling designs to explore the parameter space, we can assess an approximate simulator across a large region. In particular, since each point in the Latin hypercube design can be simulated independently, we can use cloud computing to explore vast regions of the parameter space.

We applied our approach to three examples in which the underlying Markov jump process exhibits interesting non-linear dynamics. For the Schlögl system (Section 4.1), our approach was able to detect a region of bi-modality using relatively few design points. In the Lotka-Volterra example (Section 4.2), a comparison of the linear noise approximation and the moment closure approach was able to identify regions of the parameter space that lead to prey extinction. Finally, for the prokaryotic auto-regulatory gene network (Section 4.3), we considered the synthetic data set of Milner et al. [2013] and compared the moment closure approach with the MJP over parameter regions determined both a priori and a posteriori. We found that in regions of high posterior density, the approximation does appear to provide a satisfactory alternative to the MJP, despite the inherent discreteness of the observed data.
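As a concrete illustration of the framework, its two computational ingredients — a Latin hypercube design over the (log) parameter region and the standardised prediction error diagnostic — can be sketched in a few lines. This is a minimal Python sketch, not the authors' implementation (the paper used R with the lhs package); the function names and the simple stratified construction are our own assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def latin_hypercube(n, lower, upper, rng=rng):
    """n-point Latin hypercube on the box [lower, upper]^d.

    Each dimension is split into n equal strata, each stratum is sampled
    exactly once, and the strata are independently permuted per dimension.
    """
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    d = lower.size
    # Rows of strata indices 0..n-1, one row per dimension, each permuted.
    strata = rng.permuted(np.tile(np.arange(n), (d, 1)), axis=1).T
    u = (strata + rng.uniform(size=(n, d))) / n  # jitter within each stratum
    return lower + u * (upper - lower)

def standardised_error(x_exact, mu, var):
    """Diagnostic D = (x - mu)/sd.

    D should be approximately N(0, 1) when the Gaussian approximation
    with moments (mu, var) adequately matches the exact simulator.
    """
    return (x_exact - mu) / np.sqrt(var)
```

In use, each design point initialises both the exact simulator and the approximation; the resulting standardised errors are then compared against the N(0, 1) reference, for example via normal quantile plots such as those in Figures 5 and 6. Because the design points are independent, this loop parallelises trivially.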
Computing details
All simulations were performed on a machine with 16GB of RAM and an Intel quad-core CPU, running Ubuntu 12.04. Simulations for the Lotka-Volterra model and the Schlögl system were performed using R (version 3.3.1), via the issb package (version 0.13.3) (R Core Team and R Development Core Team [2013], Golightly and Gillespie [2013]). The Latin hypercube was generated using the lhs package (version 0.13) [Carnell, 2012]. The graphics were created using the ggplot2 R package (version 2.1.0) [Wickham, 2009]. The prokaryotic auto-regulatory gene network code used a combination of C (from the Milner et al. [2013] paper) and R code.
Acknowledgements:
We thank the three anonymous referees for constructive comments that have improved the paper.

References
L. S. Bastos and A. O'Hagan. Diagnostics for Gaussian process emulators. Technometrics, 51(4):425–438, 2009. doi: 10.1198/TECH.2009.08019.
M. A. Beaumont, W. Zhang, and D. J. Balding. Approximate Bayesian computation in population genetics. Genetics, 162(4):2025–2035, 2002.
R. J. Boys, D. J. Wilkinson, and T. B. L. Kirkwood. Bayesian inference for a discretely observed stochastic kinetic model. Statistics and Computing, 18(2):125–135, 2008. doi: 10.1007/s11222-007-9043-x.
Y. Cao and L. R. Petzold. Accuracy limitations and the measurement of errors in the stochastic simulation of chemically reacting systems. Journal of Computational Physics, 212(1):6–24, 2006. doi: 10.1016/j.jcp.2005.06.012.
Y. Cao, H. Li, and L. R. Petzold. Efficient formulation of the stochastic simulation algorithm for chemically reacting systems. The Journal of Chemical Physics, 121(9):4059–4067, 2004. doi: 10.1063/1.1778376.
R. Carnell. lhs: Latin Hypercube Samples, 2012. URL http://cran.r-project.org/package=lhs.
J. Elf and M. Ehrenberg. Fast evaluation of fluctuations in biochemical networks with the linear noise approximation. Genome Research, 13(11):2475–2484, 2003.
C. W. Gardiner. Handbook of Stochastic Methods for Physics, Chemistry, and the Natural Sciences, volume 13 of Springer Series in Synergetics. Springer-Verlag, Berlin Heidelberg New York, 2nd edition, 1985.
M. A. Gibson and J. Bruck. Efficient exact stochastic simulation of chemical systems with many species and many channels. The Journal of Physical Chemistry A, 104(9):1876–1889, 2000. doi: 10.1021/jp993732q.
C. S. Gillespie. Moment-closure approximations for mass-action models. IET Systems Biology, 3(1):52–58, 2009. doi: 10.1049/iet-syb:20070031.
D. T. Gillespie. A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. Journal of Computational Physics, 22(4):403–434, 1976. doi: 10.1016/0021-9991(76)90041-3.
D. T. Gillespie. A rigorous derivation of the chemical master equation. Physica A: Statistical Mechanics and its Applications, 188(1-3):404–425, 1992. doi: 10.1016/0378-4371(92)90283-V.
D. T. Gillespie. The chemical Langevin equation. The Journal of Chemical Physics, 113(1):297–306, 2000.
A. Golightly and C. S. Gillespie. Simulation of stochastic kinetic models. In M. V. Schneider, editor, Methods in Molecular Biology, volume 1021, pages 169–187. Humana Press, 2013. doi: 10.1007/978-1-62703-450-0.
A. Golightly and D. J. Wilkinson. Bayesian inference for nonlinear multivariate diffusion models observed with error. Computational Statistics & Data Analysis, 52(3):1674–1693, 2008. doi: 10.1016/j.csda.2007.05.019.
A. Golightly and D. J. Wilkinson. Bayesian parameter inference for stochastic biochemical network models using particle MCMC. Interface Focus, 1(6):807–820, 2011.
A. Golightly, D. A. Henderson, and C. Sherlock. Delayed acceptance particle MCMC for exact inference in stochastic kinetic models. Statistics and Computing, 2015. doi: 10.1007/s11222-014-9469-x.
R. Grima. A study of the accuracy of moment-closure approximations for stochastic chemical kinetics. The Journal of Chemical Physics, 136(15):154105, 2012. doi: 10.1063/1.3702848.
B. Ingalls. Sensitivity analysis: from model parameters to system behaviour. In O. Wolkenhauer, P. Wellstead, and C. Kwang-Hyun, editors, Essays in Biochemistry – Systems Biology, pages 177–193. Portland Press, London, 2008.
G. Jenkinson and J. Goutsias. Statistically testing the validity of analytical and computational approximations to the chemical master equation. The Journal of Chemical Physics, 138(20):204108, 2013a. doi: 10.1063/1.4807390.
G. Jenkinson and J. Goutsias. Statistical validation of parametric approximations to the master equation. In 2013 Asilomar Conference on Signals, Systems and Computers, pages 1721–1725. IEEE, 2013b. doi: 10.1109/ACSSC.2013.6810595.
A. Kowald and T. Kirkwood. A network theory of ageing: the interactions of defective mitochondria, aberrant proteins, free radicals and scavengers in the ageing process. Mutation Research/DNAging, 316(5-6):209–236, 1996. doi: 10.1016/S0921-8734(96)90005-3.
T. G. Kurtz. Solutions of ordinary differential equations as limits of pure jump Markov processes. Journal of Applied Probability, 7:49–58, 1970.
A. J. Lotka. Elements of Physical Biology. Williams and Wilkins, Baltimore, 1925.
J. M. McCollum, G. D. Peterson, C. D. Cox, M. L. Simpson, and N. F. Samatova. The sorting direct method for stochastic simulation of biochemical systems with varying reaction execution behavior. Computational Biology and Chemistry, 30(1):39–49, 2006. doi: 10.1016/j.compbiolchem.2005.10.007.
M. D. McKay, R. J. Beckman, and W. J. Conover. A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics, 21(2):239–245, 1979. doi: 10.1080/00401706.1979.10489755.
P. Milner, C. S. Gillespie, and D. J. Wilkinson. Moment closure based parameter inference of stochastic kinetic models. Statistics and Computing, 23(2):287–295, 2013. doi: 10.1007/s11222-011-9310-8.
M. D. Morris and T. J. Mitchell. Exploratory designs for computational experiments. Journal of Statistical Planning and Inference, 43(3):381–402, 1995. doi: 10.1016/0378-3758(94)00035-T.
M. Opper and G. Sanguinetti. Variational inference for Markov jump processes. In J. C. Platt, D. Koller, Y. Singer, and S. T. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 1105–1112. Curran Associates, Inc., 2008.
J. Owen, D. J. Wilkinson, and C. S. Gillespie. Likelihood free inference for Markov processes: a comparison. Statistical Applications in Genetics and Molecular Biology, 14(2):189–209, 2015.
J. Pahle. Biochemical simulations: stochastic, approximate stochastic and hybrid approaches. Briefings in Bioinformatics, 10(1):53–64, 2009. doi: 10.1093/bib/bbn050.
R Core Team and R Development Core Team. R: A Language and Environment for Statistical Computing, 2013. URL https://www.R-project.org/.
H. Salis and Y. Kaznessis. Accurate hybrid stochastic simulation of a system of coupled chemical or biochemical reactions. The Journal of Chemical Physics, 122(5):054103, 2005. doi: 10.1063/1.1835951.
D. Schnoerr, G. Sanguinetti, and R. Grima. Validity conditions for moment closure approximations in stochastic chemical kinetics. The Journal of Chemical Physics, 141(8):084103, 2014. doi: 10.1063/1.4892838.
C. Sherlock, A. Golightly, and C. S. Gillespie. Bayesian inference for hybrid discrete-continuous systems biology models. Inverse Problems, 30:114005, 2014.
A. Singh and J. P. Hespanha. A derivative matching approach to moment closure for the stochastic logistic model. Bulletin of Mathematical Biology, 69:1909–1925, 2007.
S. A. Sisson, Y. Fan, and M. M. Tanaka. Sequential Monte Carlo without likelihoods. Proceedings of the National Academy of Sciences, 104(6):1760–1765, 2007. doi: 10.1073/pnas.0607208104.
P. Smadbeck and Y. N. Kaznessis. A closure scheme for chemical master equations. Proceedings of the National Academy of Sciences, 110:14261–14265, 2013.
T. Toni, D. Welch, N. Strelkowa, A. Ipsen, and M. P. H. Stumpf. Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. Journal of The Royal Society Interface, 6(31):187–202, 2009. doi: 10.1098/rsif.2008.0172.
N. G. van Kampen. Stochastic Processes in Physics and Chemistry. North Holland, 3rd edition, 2007.
V. Volterra. Fluctuations in the abundance of a species considered mathematically. Nature, 118:558–560, 1926.
S. R. White, T. Kypraios, and S. P. Preston. Piecewise approximate Bayesian computation: fast inference for discretely observed Markov models using a factorised posterior distribution. Statistics and Computing, 2013. doi: 10.1007/s11222-013-9432-2.
H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer, New York, 2009. ISBN 978-0-387-98140-6.
D. J. Wilkinson.