[PDF] Estimation of parameter sensitivities for stochastic reaction networks using tau-leap simulations

Abstract

We consider the important problem of estimating parameter sensitivities for stochastic models of reaction networks that describe the dynamics as a continuous-time Markov process over a discrete lattice. These sensitivity values are useful for understanding network properties, validating their design and identifying the pivotal model parameters. Many methods for sensitivity estimation have been developed, but their computational feasibility suffers from the critical bottleneck of requiring time-consuming Monte Carlo simulations of the exact reaction dynamics. To circumvent this problem one needs to devise methods that speed up the computations while suffering acceptable and quantifiable loss of accuracy. We develop such a method by first deriving a novel integral representation of parameter sensitivity and then demonstrating that this integral may be approximated by any convergent tau-leap method. Our method is easy to implement, works with any tau-leap simulation scheme and its accuracy is proved to be similar to that of the underlying tau-leap scheme. We demonstrate the efficiency of our methods through numerical examples. We also compare our method with the tau-leap versions of certain finite-difference schemes that are commonly used for sensitivity estimations.

Full PDF

EEstimation of parameter sensitivities for stochastic reactionnetworks using tau-leap simulations

Ankit Gupta ∗ , Muruhan Rathinam † , and Mustafa Khammash ‡ Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, 4058 Basel, Switzerland. Department of Mathematics and Statistics, University of Maryland Baltimore County, 1000 Hilltop Circle,Baltimore, MD 21250, U.S.A.

July 15, 2018

Abstract

We consider the important problem of estimating parameter sensitivities for stochastic models ofreaction networks that describe the dynamics as a continuous-time Markov process over a discrete lat-tice. These sensitivity values are useful for understanding network properties, validating their designand identifying the pivotal model parameters. Many methods for sensitivity estimation have been devel-oped, but their computational feasibility suﬀers from the critical bottleneck of requiring time-consumingMonte Carlo simulations of the exact reaction dynamics. To circumvent this problem one needs to devisemethods that speed up the computations while suﬀering acceptable and quantiﬁable loss of accuracy.We develop such a method by ﬁrst deriving a novel integral representation of parameter sensitivity andthen demonstrating that this integral may be approximated by any convergent tau-leap method. Ourmethod is easy to implement, works with any tau-leap simulation scheme and its accuracy is proved to besimilar to that of the underlying tau-leap scheme. We demonstrate the eﬃciency of our methods throughnumerical examples. We also compare our method with the tau-leap versions of certain ﬁnite-diﬀerenceschemes that are commonly used for sensitivity estimations.

Keywords: parameter sensitivity; reaction networks; Markov process; tau-leap simulations

Mathematical Subject Classiﬁcation (2010):

The study of chemical reaction networks is an essential component of the emerging ﬁelds of Systems andSynthetic Biology [1, 44, 17]. Traditionally chemical reaction networks were modeled in the deterministicsetting, where the dynamics is represented by a set of ordinary diﬀerential equations (ODEs) or partialdiﬀerential equations (PDEs). In the study of intracellular chemical reactions, some chemical species arepresent in low copy numbers. Since the behavior of individual molecules is best described by a stochasticprocess, in the low molecular copy number regime, the copy numbers of the molecular species itself is bettermodeled by a stochastic process than by ODEs [19]. Only in the limit of large molecular copy numbers, oneexpects the deterministic models to be accurate [3]. While our work in this paper is focused on biochemicalreaction networks as primary examples, we emphasize that the mathematical framework of reaction networkscan also be used to describe a wide range of other phenomena in ﬁelds such as Epidemiology [28] and Ecology[8]. Suppose θ is a parameter (like ambient temperature, cell-volume, ATP concentration etc.) that inﬂuencesthe rate of ﬁring of reactions. Let ( X θ ( t )) t ≥ be the θ -dependent Markov process representing the reaction ∗ [email protected] † [email protected] ‡ [email protected] a r X i v : . [ m a t h . P R ] J a n ynamics, and suppose that for some real-valued function f and observation time T , our output of interestis f ( X θ ( T )). This output is a random variable and we are interested in determining the sensitivity of itsexpectation E ( f ( X θ ( T ))) w.r.t. inﬁnitesimal changes in the parameter θ . We deﬁne this sensitivity value,denoted by S θ ( f, T ), as the partial derivative S θ ( f, T ) := ∂∂θ E ( f ( X θ ( T ))) . (1.1)Determining these parametric-sensitivity values are useful in many applications, such as, understandingnetwork design and its robustness properties [42], identifying critical reaction components, inferring modelparameters [16] and ﬁne-tuning a system’s behavior [15].Generally the sensitivities of the form (1.1) cannot be directly evaluated, but instead, they need to beestimated with Monte Carlo simulations of the dynamics ( X θ ( t )) t ≥ . Many methods have been developedfor this task [23, 33, 40, 41, 2, 25, 26], but they all rely on exact simulations of ( X θ ( t )) t ≥ that can beperformed using schemes such as Gillespie’s stochastic simulation algorithm (SSA) [19]. This severely con-strains the computational feasibility of these sensitivity estimation methods because these exact simulationsbecome highly impractical if the rate of occurrence of reactions is high [21], which is typically the case. Themain diﬃculty is that that exact simulation schemes keep track of each reaction event which is very time-consuming. To avoid this problem, tau-leaping methods have been developed that proceed by combiningmany reaction-ﬁrings over small time intervals [20]. Tau-leap methods have been shown to produce goodapproximations of the reaction dynamics, at a small fraction of the computational cost of exact simulations[20, 11, 36, 43, 5, 39, 47, 48, 29, 32]. Their accuracy and stability has also been investigated theoretically inmany papers [38, 31, 6, 35, 37].Our goal in this paper is to develop a method that takes advantage of the computational eﬃciency oftau-leap methods for the purpose of estimating sensitivity values of the form (1.1). Since tau-leap methodsintroduce a bias in the estimation, it is highly desirable to start with an unbiased method for computingsensitivities (instead of biased methods such as the Finite Diﬀerence (FD)) and then replace exact SSAsimulations by a suitable tau-leap method. Having only one form of bias, modulated by the tau-leap stepsize, allows one to control the bias more eﬀectively and also facilitates the design of multilevel strategiesthat eliminate or reduce the estimator bias and enhance its computational eﬃciency [7, 30, 32]. Among theexisting methods in the literature, only the Girsanov Transformation (GT) method [22, 33], the

AuxiliaryPath Algorithm (APA)[25] and the

Poisson Path Algorithm (PPA) [26] are unbiased. Since the GT method ingeneral suﬀers from large variance [26, 2, 25, 40, 41, 45] and the APA/PPA methods are not directly amenableto tau-leap approximation, we develop a variant of the PPA method in which exact SSA simulations arereplaced by tau-leap simulations. Our method, called

Tau Integral Path Algorithm ( τ IPA), works withany underlying tau-leap simulation scheme and it is based on a novel integral representation of parametersensitivity S θ ( f, T ) that we derive in this paper. We provide computational examples that show that using τ IPA we can often trade-oﬀ a small amount of bias for large savings in the overall computational costs forsensitivity estimation. We prove that the bias incurred by τ IPA depends on the step-size in the same way asthe bias of the tau-leap scheme chosen for simulations. Moreover if we substitute the tau-leap simulationsin τ IPA with the exact SSA generated simulations, then we obtain a new unbiased method for sensitivityestimation which we call the ‘exact’ IPA or eIPA that is similar to the PPA method in [26]. Two mainreasons for the high variance of the GT method that have been identiﬁed in the existing literature are: 1)low magnitude of the sensitivity parameter θ (see [26, 25]) and 2) large system-size or volume under theclassical volume scaling of the reaction network [45]. The second issue is somewhat resolved by the centeredGirsanov Transformation (CGT) method [45, 52] and our numerical results indicate that the volume scalingbehavior of eIPA is similar to CGT (see Section 4.1). However eIPA does not suﬀer from high variance whenthe sensitivity parameter θ is small. In addition, when θ = 0, GT or CGT methods are not even applicablewhile eIPA does not suﬀer from this restriction. These observations make eIPA more appealing than CGTfor unbiased estimation of parameter sensitivity.For the sake of comparison, we use tau-leap versions of certain commonly used ﬁnite-diﬀerence estimators(see [2, 40, 51]) that approximate the inﬁnitesimal derivative in (1.1) by a ﬁnite-diﬀerence (see (2.11)). Suchestimators are computationally faster than τ IPA (in simulation time per trajectory) but they suﬀer fromtwo sources of bias (ﬁnite-diﬀerencing and tau-leap approximations) unlike τ IPA which only incurs bias fromthe latter source. We note that while in some examples the biases nearly cancel each other fortuitously, as2 general principle one has no logical reason to expect such cancellation.This paper is organized as follows. In Section 2 we describe the stochastic model for reaction dynamicsand the sensitivity estimation problem. We also discuss the existing sensitivity estimation methods, thetau-leap simulation schemes and explain the rationale for using such simulations in sensitivity estimation.Section 3 contains the main results of this paper which include a novel integral representation of the exactsensitivity in Section 3.2, a result on error bounds for the sensitivity estimates of τ IPA in Section 3.3 andthe novel tau-leap sensitivity estimation method τ IPA in Section 3.4. In Section 4 we provide computationalexamples to compare our method with other methods and ﬁnally in Section 5 we conclude and providedirections for future research.

Consider a reaction network with d species and K reactions. We describe its kinetics by a continuous timeMarkov process whose state at any time is a vector in the non-negative integer orthant N d comprising of themolecular counts of all the d species. The state evolves due to transitions caused by the ﬁring of reactions.We suppose that when the state is x , the rate of ﬁring of the k -th reaction is given by the propensity function λ k ( x ) and the corresponding state-displacement is denoted by the stoichiometric vector ζ k ∈ Z d . There areseveral ways to represent the Markov process ( X ( t )) t ≥ that describes the reaction kinetics under theseassumptions. We can specify the generator (see Chapter 4 in [14]) of this process by the operator A h ( x ) = K (cid:88) k =1 λ k ( x ) ( h ( x + ζ k ) − h ( x )) , (2.2)where h is any bounded real-valued function on N d . Alternatively we can express the Markov process directlyby its random time-change representation (see Chapter 7 in [14]) X ( t ) = X (0) + K (cid:88) k =1 Y k (cid:18)(cid:90) t λ k ( X ( s )) ds (cid:19) ζ k , (2.3)where { Y k : k = 1 , . . . , K } is a family of independent unit rate Poisson processes. Since the process ( X ( t )) t ≥ is Markovian, it can be equivalently speciﬁed by writing the Kolmogorov forward equation for the evolutionof its probability distribution p t ( x ) := P ( X ( t ) = x ) at each state x : dp t ( x ) dt = K (cid:88) k =1 p t ( x − ζ k ) λ k ( y − ζ k ) − p t ( x ) K (cid:88) k =1 λ k ( x ) . (2.4)This set of coupled ordinary diﬀerential equations (ODEs) is termed as the Chemical Master Equation (CME)in the biological literature [3]. As the number of ODEs in this set is typically inﬁnite, the CME is nearlyimpossible to solve directly, except in very restrictive cases. A common strategy is to estimate its solutionwith pathwise simulations of the process ( X ( t )) t ≥ using Monte Carlo schemes such as Gillespie’s SSA [19],the next reaction method [18], the modiﬁed next reaction method [4], and so on. While these schemes areeasy to implement, they become computationally infeasible for even moderately large networks, because theyaccount for each and every reaction event. To resolve this issue, tau-leaping methods have been developedwhich will be described in greater detail in Section 3.1.We now assume that each propensity function λ k depends on a real-valued system parameter θ . Toemphasize this dependence we write the rate of ﬁring of the k -th reaction at state x as λ k ( x, θ ) insteadof λ k ( x ). Let ( X θ ( t )) t ≥ be the Markov process representing the reaction dynamics with these parameter-dependent propensity functions. As stated in the introduction, for a function f : N d → R and an observationtime T ≥

0, our goal is to determine the sensitivity value S θ ( f, T ) deﬁned by (1.1). This value cannot becomputed directly for most examples of interest and so we need to ﬁnd ways of estimating it using simulationsof the process ( X θ ( t )) t ≥ . Such simulation-based sensitivity estimation methods work by specifying theconstruction of a random variable s θ ( f, T ) whose expected value is “close” to the true sensitivity value S θ ( f, T ), i.e. S θ ( f, T ) ≈ E ( s θ ( f, T )) . (2.5)3nce such a construction is available, a large number (say N ) of independent realizations s , . . . , s N of thisrandom variable s θ ( f, T ) are obtained and the sensitivity is estimated by computing their empirical mean (cid:98) µ N as (cid:98) µ N = 1 N N (cid:88) i =1 s i . (2.6)This estimator (cid:98) µ N is a random variable with mean and variance µ = E ( (cid:98) µ N ) = E ( s θ ( f, T )) and σ N = Var( (cid:98) µ N ) = σ N (2.7)respectively, where σ = Var( s θ ( f, T )). For a large sample size N , the distribution of (cid:98) µ N is approximatelyGaussian with mean µ and variance σ N , due to the Central Limit Theorem. The standard deviation σ N measures the statistical spread of the estimator (cid:98) µ N , that is inversely proportional to its statistical precision .The sample size N must be large enough to ensure that σ N is small relative to µ , i.e. for some small parameter (cid:15) >

0, we should have RSD √ N ≤ (cid:15), (2.8)where RSD := σ/ | µ | is the relative standard deviation of the random variable s θ ( f, T ). If such a conditionholds, then (cid:98) µ N is a reliable estimator for the true sensitivity value S θ ( f, T ) because it is very likely to assumea value close to its mean µ = E ( s θ ( f, T )) which in turn is close to S θ ( f, T ) (see (2.5)). In practice both µ and σ are unknown, but we can estimate them as µ ≈ (cid:98) µ N and σ ≈ √ N (cid:98) σ N where (cid:98) σ N = 1 (cid:112) N ( N − (cid:118)(cid:117)(cid:117)(cid:116) N (cid:88) i =1 ( s i − (cid:98) µ N ) . (2.9)is the estimated standard deviation σ N of the estimator.The performance of any sensitivity estimation method (say X ) depends on the following three key metricsthat are based on the properties of random variable s θ ( f, T ):1. The bias B ( X ) = E ( s θ ( f, T )) − S θ ( f, T ), which is the error incurred by the approximation (2.5).2. The variance V ( X ) = Var( s θ ( f, T )) of random variable s θ ( f, T ).3. The computational cost C ( X ) of generating one sample of s θ ( f, T ).The bias B ( X ) can be positive or negative, and its absolute value |B ( X ) | can be seen as the upper-boundon the statistical accuracy that can be achieved with method X by increasing the sample size N [9]. Asmentioned before, the standard deviation σ ( X ) = (cid:112) V ( X ) measures the statistical precision of the method X and its magnitude relative to the mean µ ( X ) = E ( s θ ( f, T )) determines the number of samples N thatare needed to produce a reliable estimate. In particular, to satisfy condition (2.8) for the relative standarddeviation RSD( X ) = σ ( X ) / | µ ( X ) | , the number of samples N (cid:15) needed would be around N (cid:15) := (RSD( X )) (cid:15) − .Hence the total cost of the estimation procedure is N (cid:15) C ( X ) ≈ (RSD( X )) C ( X ) (cid:15) − = V ( X )( µ ( X )) C ( X ) (cid:15) − , (2.10)where C ( X ) is the CPU time required for constructing one realization of s θ ( f, T ). The goal of a goodestimation method is to simultaneously minimize the three quantities |B ( X ) | , V ( X ) and C ( X ). This createsvarious conﬂicts and trade-oﬀs among the existing sensitivity estimation methods as we now discuss.4 .1 Biased methods A sensitivity estimation method X is called biased if B ( X ) (cid:54) = 0. The most commonly used biased methodsare the ﬁnite-diﬀerence schemes which approximate the inﬁnitesimal derivative in the deﬁnition of parametersensitivity (see (1.1)) by a ﬁnite-diﬀerence of the form S θ,h ( f, T ) = E ( f ( X θ + h ( T )) − f ( X θ ( T ))) h , (2.11)for a small perturbation h . The processes X θ and X θ + h represent the Markovian reaction dynamics withvalues of the sensitive parameter set to θ and θ + h respectively. These two processes can be simulatedindependently [23] but it is generally better to couple them in order to reduce the variance of the associatedestimator. The two commonly used coupling strategies are called Common Reaction Paths (CRP) [40] and

Coupled Finite Diﬀerences (CFD) [2] and they are based on the random time-change representation (2.3).The ﬁnite-diﬀerence approximation (2.11) for the true sensitivity value can be expressed as the expecta-tion E ( s θ,h ( f, T )) of the following random variable s θ,h ( f, T ) = f ( X θ + h ( T )) − f ( X θ ( T )) h . The three metrics (bias, variance and computational cost) based on this random variable deﬁne the perfor-mance of CRP and CFD. Since both these methods estimate the same quantity S θ,h ( f, T ), they have thesame bias (i.e. B (CRP) = B (CFD)). However in many cases it is found that the CFD coupling is tighter than the CRP coupling, resulting in a lower variance of s θ,h ( f, T ) (i.e. V (CFD) < V (CRP)) (see [2]). Foreach realization of s θ,h ( f, T ), both CRP and CFD require simulation of a coupled trajectory ( X θ , X θ + h ) inthe time interval [0 , T ]. The computational costs of such a simulation is roughly 2 C , where C is the cost of exactly simulating the process X θ using Gillespie’s SSA [19] or a similar method. Finite-diﬀerence schemes introduce a bias in the estimate whose size is proportional to the perturbationvalue h (i.e. B (CRP) = B (CFD) ∝ h ), but the constant of proportionality can be quite large in manycases, leading to signiﬁcant errors even for small values of h [26]. Unfortunately we cannot circumvent thisproblem by choosing a very small h because the variance is proportional to 1 /h (i.e. V (CRP) , V (CFD) ∝ /h ).Therefore if a very small h is selected, the variance will be enormous and the sample-size required to producea statistically precise estimate will be very large, imposing a heavy computational burden on the estimationprocedure [26]. This trade-oﬀ between bias and variance is the main drawback of ﬁnite-diﬀerence schemesand there does not exist a strategy for selecting h that optimally balances these two quantities. Note thatunlike bias and variance, the computational cost of generating a sample (i.e. C (CRP) or C (CFD)) does notchange signiﬁcantly with h , thereby ensuring that regardless of h , the total computational burden varieslinearly with the required number of samples N . Apart from ﬁnite-diﬀerence schemes, there exists anotherbiased method, called the regularized pathwise-derivative method [41] for estimating the sensitivity value(1.1), but we do not discuss this approach in this paper. A sensitivity estimation method X is called unbiased if B ( X ) = 0. The main advantage of unbiased methodsis that the estimation can in principle be made as accurate as possible by increasing the sample size N . Theﬁrst unbiased method for sensitivity estimation is called the Girsanov Transformation (GT) method [22, 33],which works by estimating the θ -derivative of the probability distribution of X θ . The GT method is easy toimplement and the computation cost of generating each sample is roughly C – the cost of exact simulationof the process X θ . The main issue with the GT method is that generally the variance of its associatedrandom variable s θ ( f, T ) is very large and so the number of samples needed to obtain a statistically preciseestimate is very high [2, 40]. So far two reasons have been identiﬁed for this behavior. Firstly, it hasbeen shown that for mass-action models (see [3]) this variance can become unbounded when the magnitudeof the sensitive reaction rate-constant θ approaches zero [26, 25]. This is a serious issue because biological In fact the cost of generating a realization of s θ ( f, T ) is usually smaller for CFD in comparison to CRP (i.e. C (CFD) < C (CRP)), because the CFD coupling is such that if X θ ( t ) = X θ + h ( t ) for some t < T , then this equality will hold for theremaining time-interval [ t, T ], allowing us to directly set s θ,h ( f, T ) = 0 without completing the simulation in the interval [ t, T ]. slow reactions which are characterized by low values of the associated rate-constants.Furthermore the GT method does not allow one to estimate the sensitivity w.r.t. a rate-constant set to zero.Such sensitivity values are useful for understanding network design as it allows one to probe the eﬀect ofpresence or absence of reactions. Another reason for the high variance of GT estimator was provided in [45]where it was theoretically established that this variance can grow boundlessly as the system expands in size,i.e. the system volume V tends to inﬁnity. This issue is somewhat ameliorated by the centered GirsanovTransformation (CGT) method [52] but the problem with small reaction rate-constants persists.We now discuss a couple of unbiased methods that have been recently proposed. These methods arecalled the Auxiliary Path Algorithm (APA )[25] and the

Poisson Path Algorithm (PPA) [26], and they arebased on exact representations of the form (2.5) for the parameter sensitivity (1.1). For both the methods,sampling the random variable s θ ( f, T ) requires simulation of a ﬁxed number M of additional paths of theprocess X θ . It was shown in [25] that in comparison to the GT method, the computational cost of generatingeach sample for APA is much higher (i.e. C (APA) (cid:29) C (GT)) but this is often compensated by the fact thatits variance is much lower (i.e. V (APA) (cid:28) V (GT)), resulting in a smaller overall cost of estimation (2.10).The reason for the higher sampling cost for APA is that it needs estimates of certain unknown quantitiesat each jump-time of the process X θ in the time interval [0 , T ], which can be very large in number even forsmall networks. In PPA, this problem is resolved by randomly selecting a small number of these unknownquantities for estimation in such a way that the estimator remains unbiased. Due to this extra randomness,the sample variance for PPA is generally greater than APA (i.e. V (PPA) > V (APA)) but the computationalcost for realizing each sample is much lower (i.e. C (PPA) (cid:28) C (APA)). Moreover in comparison to APA, PPAis far easier to implement and has lower memory requirements, making it an attractive unbiased method forsensitivity estimation. In [26] it is shown using many examples that for a given level of statistical accuracy,PPA can be more eﬃcient than GT and also the ﬁnite-diﬀerence schemes CFD and CRP. The computationalcost of generating each sample in PPA is roughly (2 M +1) C , where M is a small number that upper-boundsthe expected number of unknown quantities that will be estimated using additional paths. For both APAand PPA, the parameter M serves as a trade-oﬀ factor between the computational cost and the variance- as M increases, the cost also increases but the variance decreases. However both these methods remainunbiased for any choice of M .The foregoing trade-oﬀ relationships for the existing sensitivity estimation methods are summarized inTable 1. Type Method Trade-oﬀ Trade-oﬀ Preserved X quantities parameter quantityBiased CRP B ( X ) & V ( X ) h C ( X ) ≈ C CFDUnbiased APA V ( X ) & C ( X ) M B ( X ) = 0PPATable 1: Trade-oﬀ relationships among the bias B ( X ), variance V ( X ) and the computational cost C ( X ) forexisting sensitivity estimation methods. Here h is the perturbation size for ﬁnite-diﬀerence schemes [2, 40]and M quantiﬁes the number of auxiliary paths for APA [25] and PPA [26]. The cost of exactly simulatingthe underlying process is C . All the existing sensitivity estimation methods suﬀer from a critical bottleneck – they are all based onexact simulations of the process X θ . The computational cost C of generating each trajectory of X θ canbe exorbitant even for moderately large networks when those networks have some molecular species inmoderately large copy numbers and/or reactions ﬁring at multiple timescales ( stiﬀ systems ). One wayto counter this problem is to develop methods that can accurately estimate parameter sensitivities with approximate computationally inexpensive simulations of the process X θ obtained with tau-leap methods.The use of tau-leap simulations provides a natural way to trade-oﬀ a small amount of error with a potentially large reduction in the computational costs. 6he explicit tau-leap method with Poisson random numbers proposed by Gillespie [20] generally workswell in non-stiﬀ situations and when molecular copy numbers are modestly large. The major drawbackis that it becomes ineﬃcient for stiﬀ systems where vastly diﬀerent time scales are present. The implicit tau-leap was proposed to remedy this weakness [36]. Many other tau-leap methods and step size selectionstrategies have been proposed to address stiﬀness and other issues [11, 43, 5, 39, 48, 47, 32].In the context of stiﬀ systems, tau-leap methods have not been as successful in maintaining accuracywhile reducing computational cost in comparison with the success of stiﬀ solvers for deterministic diﬀerentialequations. This is because stiﬀness manifests in a more complex manner in stochastic systems where stabilityis not the only issue, but accurately capturing the asymptotic distribution of the fast variables is alsoimportant [36, 37, 49, 50]. We shall limit our attention to non-stiﬀ or modestly stiﬀ systems in this paper.Our goal in this paper is to develop a method that can estimate parameter sensitivity S θ ( f, T ) of the form(1.1) using only tau-leap simulations of the process X θ . This can be done by specifying a random variable s ( τ ) θ ( f, T ) which can be constructed with these tau-leap simulations and whose expected value is “close” tothe true sensitivity value S θ ( f, T ), i.e. S θ ( f, T ) ≈ E ( s ( τ ) θ ( f, T )) . (2.12)We propose such a random variable s ( τ ) θ ( f, T ) in this paper and provide a simple algorithm for generatingthe realizations of s ( τ ) θ ( f, T ). We theoretically show that under certain reasonable conditions, the associatedestimator is tau-convergent , which means that the bias incurred due to the approximation in (2.12) convergesto 0, as the maximum step-size τ max or the coarseness of the time-discretization mesh goes to 0. Hence bymaking this mesh ﬁner and ﬁner, we can make the estimator as accurate as we desire, provided that we arewilling to bear the increasing computational costs. In the context of estimating expected values E ( f ( X θ ( T ))),the property of tau-convergence along with the rate of convergence, has already been established for manytau-leap schemes [38, 31, 6, 35]. We use these pre-existing results and obtain a similar tau-convergence resultfor our sensitivity estimation method. An important feature of our approach is that it is completely ﬂexible,as far as the choice of the tau-leap simulation method is concerned. Furthermore the order of accuracy ofour sensitivity estimation method is the same as the order of accuracy of the underlying tau-leap method.We end this section with observing that incorporating tau-leap schemes in sensitivity estimation opens upa new dimension in attacking this challenging problem. In the trade-oﬀ relationships for existing sensitivityestimation methods (see Table 1) parameters like h and M only allow us to explore one trade-oﬀ curvebetween the variance V ( X ) and some other metric like the bias B ( X ) (for X = CRP, CFD) or the compu-tational cost C ( X ) (for X = APA, PPA). The main advantage of employing tau-leap schemes is that theyprovide a mechanism for exploring another trade-oﬀ curve between the bias B ( X ) and the computationalcost C ( X ), for the purpose of optimizing the performance of a sensitivity estimation method. In Section4, we provide numerical examples to show that with tau-leap simulations we can indeed trade-oﬀ a smallamount of bias with large savings in the computational eﬀort required for estimating parameter sensitivity.Moreover this trade-oﬀ relationship appears to be independent of existing trade-oﬀ relationships mentionedin Table 1 because replacing exact simulations in a sensitivity estimation method, with approximate tau-leap simulations, usually does not alter the variance V ( X ) signiﬁcantly at least when the tau step size issuﬃciently small (see Section 4). Of course, the computational advantage of tau-leap schemes can only beappropriated if we can incorporate them into existing sensitivity estimation methods. The main contributionof this paper is to develop a method, similar to PPA, that works well with tau-leap schemes (see Section3). For the sake of comparison, we also provide tau-leap versions of the ﬁnite-diﬀerence schemes (CRP andCFD) in Section 4. In this section we present our approach for accurately estimating parameter sensitivities of the form (1.1)with only approximate tau-leap simulations of the dynamics. This approach is based on an exact integralrepresentation for parameter sensitivity given in Section 3.2. With this representation at hand, we construct atau-leap estimator for parameter sensitivity and examine its convergence properties as the time-discretizationmesh gets ﬁner and ﬁner (see Sections 3.3 and 3.4). Thereafter in Section 3.5 we present an algorithm that7omputes the tau-leap estimator for sensitivity estimation. We start with the description of a generic tau-leap method that approximately simulates the stochastic reaction paths deﬁned by the Markov process( X ( x , t )) t ≥ with generator A (see (2.2)) and initial state x . For each reaction k = 1 , . . . , K , let R k ( t ) be the number of ﬁrings of reaction k until time t . Due to (2.3) wecan express each R k ( t ) as R k ( t ) = Y k (cid:18)(cid:90) t λ k ( X ( x , s )) ds (cid:19) ζ k , where { Y k : k = 1 , . . . , K } is a family of independent unit rate Poisson processes. From now on we referto R ( t ) = ( R ( t ) , . . . , R K ( t )) as the reaction count vector. For any two time values s, t ≥ s < t ),the states at these times satisfy X ( x , t ) = X ( x , s ) + (cid:80) Kk =1 ( R k ( t ) − R k ( s )) ζ k . At any given time t and thecomputed (approximate) state x at time t , a tau-leap method entails taking either a predetermined stepof size τ > τ as a function of the current state and time, i.e. step-size selectionis adapted to the information sigma-algebra generated by the tau-leap process. Next an approximatingdistribution for the state at time ( t + τ ) is generated. This distribution is generally found by approximatingthe diﬀerence ( R ( t + τ ) − R ( t )) in the reaction count vector by a random variable (cid:101) R = ( (cid:101) R , . . . , (cid:101) R K ) whoseprobability distribution is easy to sample from. The most straightforward choice is given by the simple(explicit) Euler method [20], which assumes that the propensities are approximately constant in the timeinterval [ t, t + τ ) and conditioned on the information at time t , each (cid:101) R k is an independent Poisson randomvariable with rate λ k ( x ) τ . Other distributions for (cid:101) R = ( (cid:101) R , . . . , (cid:101) R K ) have also been used in the literatureto obtain better approximations and particularly to prevent the state-components from becoming negative[43]. The selection method for step size τ also varies, with the simplest being steps based on a deterministicmesh 0 = t < t · · · < t n = T over the observation time interval [0 , T ]. To obtain better accuracy severalstrategies have been proposed that randomly select τ based on some criteria such as avoidance of negativestate-components or constancy of conditional propensities [11, 5, 32].To represent a generic tau-leap method we shall use a pair of abstract labels α and β , where α denotesa method, i.e. a choice of distribution for (cid:101) R , and β denotes a step size selection strategy. We will use | β | as a (deterministic) parameter which quantiﬁes the coarseness of the time-discretization scheme β . Forinstance α may stand for the explicit Euler tau-leap method [20] and β may stand for a deterministic mesh0 = t < t < · · · < t n = T , and in this case the coarseness parameter is | β | = max( t j − t j − ). Typically,tau-leap methods produce approximations of the underlying process at certain leap times that are separatedby the step-size τ and one can interpolate these approximate state values at other time points. The mostobvious interpolation is the “sample and hold” method, where the tau-leap process is held constant betweenthe consecutive leap times. In circumstances, such as the explicit Euler tau-leap method with Poissonupdates, it is more natural to use interpolation strategies based on the random time-change representation(2.3) – for example see the “Poisson bridge” approach in [29]. In the following discussion, we suppose thatthe interpolation strategy is also determined by the label α . We shall use ( Z α,β ( x , t )) t ≥ to denote thetau-leap process, that approximates the exact dynamics ( X ( x , t )) t ≥ , and that results from the applicationof a tau-leap method α with step size selection strategy β . This process is deﬁned by the prescription Z α,β ( x , t ) = x and Z α,β ( x , t i +1 ) = Z α,β ( x , t i ) + K (cid:88) k =1 ζ k (cid:101) R k,i,α,β for i = 1 , . . . , µ, (3.13)where µ is the (possibly) random number of time points, 0 = t < t < · · · < t µ = T are the (possibly)random leap times, and (cid:101) R k,i,α,β for i = 1 , . . . , µ and k = 1 , . . . , K are random variables whose distributionwhen conditioned on Z α,β ( x, t i ) is determined by the method α and step size strategy β . Remark 3.1

Note that this generic tau-leap method reduces to Gillespie’s SSA [19], if at state Z α,β ( x , t i ) = z , the next step size τ is an exponentially distributed random variable with rate λ ( z ) := (cid:80) Kk =1 λ k ( z ) and ach (cid:101) R k,i,α,β is chosen as if k = η and otherwise, where η is a discrete random variable which assumesthe value i ∈ { , . . . , K } with probability ( λ i ( z ) /λ ( z )) . Later we shall establish tau-convergence of our sensitivity estimator by showing that for a ﬁxed tau-leapmethod α , the bias incurred by our estimator converges to 0 as the coarseness | β | of the time-discretizationscheme goes to 0. For this we shall require (weak) convergence of all moments of the tau-leap process tothose of the exact process. We now state this requirement more precisely and present a simple lemma thatwill be needed later. For p ≥

0, we say that a function f : N d → R is of class C p if there exists a positiveconstant C such that | f ( x ) | ≤ C (1 + (cid:107) x (cid:107) p ) for all x ∈ N d . (3.14)We shall require that a tau-leap method α satisﬁes an order γ > Assumption 1

Given a tau-leap method α , there exist γ > δ > ξ : R + → R + suchthat, for every p ≥ T >

0, there exists a constant C ( p, T, α ) satisfyingsup t ∈ [0 ,T ] | E ( (cid:107) Z α,β ( x , t ) (cid:107) p ) − E ( (cid:107) X ( x , t ) (cid:107) p ) |≤ sup t ∈ [0 ,T ] (cid:88) y ∈ N d (1 + (cid:107) y (cid:107) p ) | P ( Z α,β ( x , t ) = y ) − P ( X ( x , t ) = y ) |≤ C ( p, T, α )(1 + (cid:107) x (cid:107) ξ ( p ) ) | β | γ , for any initial state x provided that | β | ≤ δ . Note that here the second inequality is our assumption whilethe ﬁrst inequality always holds. In above, we have assumed that there is a common probability space (Ω , P )carrying the exact process X and the tau-leap process Z α,β . Remark 3.2

We observe that Assumption 1 essentially assumes order O ( | β | γ ) convergence in the so-called p -th moment variation norm (see [35]) of the probability law of Z α,β ( x , t ) (on Z d ) to the probability law of X ( x , t ) (on Z d ) and it is not as restrictive as it might seem at ﬁrst glance. The p -th moment variationnorm of a signed ﬁnite measure µ on Z d which possesses a ﬁnite p -th moment is deﬁned by (cid:107) µ (cid:107) p = (cid:88) x ∈ Z

12 (1 + (cid:107) x (cid:107) p ) | µ ( x ) | , and the space M p deﬁned by M p = { µ : Z → R | (cid:107) µ (cid:107) p < ∞} , is isometrically isormorphic to (cid:96) , the space of absolutely summable sequences and moreover, C p is the dualspace of M p (see [35]). We note that by the Schur property , weak convergence implies norm convergencein (cid:96) . In Assumption 1, if we merely assumed weak convergence of order O ( | β | γ ) in M p , due to the Schurproperty, we obtain convergence in p -th moment variation norm of order O ( | β | γ (cid:48) ) for any γ (cid:48) ∈ (0 , γ ) . More-over, we note that convergence of tau-leap methods in the moment variation norms have been derived in [35]and apply to a large class of situations including (but not limited to) systems that remain in a bounded subsetof the integer state space. We also remark that to our best knowledge, all convergence results on tau-leapinghave been limited to considering determinstic time steps. However, in the applied literature, adaptive timestep selection methods have been explored numerically, and it is reasonable to expect convergence results tobe established in the future for a reasonable class of adaptive step size selection schemes. In this paper, ournumerical simulations are restricted to deterministic time steps. Additionally we will require Assumptions 2 and 3 on moment growth bounds of the exact process as wellas the tau-leap process. These assumptions can be veriﬁed using the results in [34, 24, 35].

Assumptions 2 and 3

Given a tau-leap method α , there exists δ > T > p ≥ C ( p, T ) and C ( p, T, α ) satisfyingsup t ∈ [0 ,T ] (1 + E ( (cid:107) X ( x , t ) (cid:107) p )) ≤ C ( p, T )(1 + (cid:107) x (cid:107) p )and sup t ∈ [0 ,T ] (1 + E ( (cid:107) Z α,β ( x , t ) (cid:107) p )) ≤ C ( p, T, α )(1 + (cid:107) x (cid:107) p ) , (3.15)9or all t ∈ [0 , T ], provided | β | ≤ δ .We emphasize that constants C and C in Assumptions 1 and 3, do not depend on the step-size selectionstrategy β , and all the three constants in these assumptions may be assumed to be monotonic in T withoutany loss of generality. The following lemma follows readily from the above assumptions. Lemma 3.3

Consider a function φ : N d × [0 , T ] → R and suppose that there exists a constant C > suchthat sup t ∈ [0 ,T ] | φ ( x, t ) | ≤ C (1 + (cid:107) x (cid:107) p ) for all x ∈ N d . Then under Assumptions 1,2 and 3, we have sup t ∈ [0 ,T ] | E ( φ ( Z α,β ( x , t ) , t )) − E ( φ ( X ( x , t ) , t )) | ≤ CC ( p, T, α )(1 + (cid:107) x (cid:107) ξ ( p ) ) | β | γ , sup t ∈ [0 ,T ] | E ( φ ( X ( x , t ) , t )) | ≤ CC ( p, T )(1 + (cid:107) x (cid:107) p )and sup t ∈ [0 ,T ] | E ( φ ( Z α,β ( x , t ) , t )) | ≤ CC ( p, T, α )(1 + (cid:107) x (cid:107) p ) , (3.16) provided | β | ≤ δ . Let ( X θ ( t )) t ≥ be the Markov process representing reaction dynamics with initial state x and let Ψ θ ( x, f, t )be deﬁned by Ψ θ ( x, f, t ) = E ( f ( X θ ( t )) | X θ (0) = x ) , (3.17)for any state x ∈ N d and time t ≥

0. For any k = 1 , . . . , K and any function h : N d → R , let ∆ ζ k denote thediﬀerence operator given by ∆ ζ k h ( x ) = h ( x + ζ k ) − h ( x ) . The following theorem expresses the sensitivity value S θ ( f, T ) as the expectation of a random variable whichcan be computed from the paths of the process ( X θ ( t )) t ≥ in the time interval [0 , T ]. The proof of thistheorem is provided in the Appendix 5.1. Theorem 3.4

Suppose ( X θ ( t )) t ≥ is the Markov process with generator A θ and initial state x . Then thesensitivity value S θ ( f, T ) is given by S θ ( f, T ) = ∂∂θ Ψ θ ( x , f, T ) = K (cid:88) k =1 E (cid:32)(cid:90) T ∂λ k ( X θ ( t ) , θ ) ∂θ ∆ ζ k Ψ θ ( X θ ( t ) , f, T − t ) dt (cid:33) . Remark 3.5

This formula has the following simple interpretation. Due to an inﬁnitesimal perturbation ofparameter θ , the probability that the process ( X θ ( t )) t ≥ has an “extra” jump at time t in the direction ζ k isproportional to ∂λ k ( X θ ( t ) , θ ) ∂θ . Moreover the change in the expectation of f ( X θ ( T )) at time T due to this “extra” jump at time t is just ∆ ζ k Ψ θ ( X θ ( t ) + ζ k , f, T − t ) . The above result shows that the overall sensitivity of the expectation of f ( X θ ( x, T )) is just the product ofthese two terms, integrated over the whole time interval [0 , T ] . The rest of this section is devoted to the development of a tau-leap estimator for parameter sensitivityusing this formula. To simplify our notations, we suppress the dependence on parameter θ , and hence denote λ k ( · , θ ) by λ k ( · ), ∂λ k /∂θ by ∂λ k , S θ ( f, T ) by S ( f, T ), Ψ θ ( x, f, t ) by Ψ( x, f, t ) and the process ( X θ ( t )) t ≥ by( X ( t )) t ≥ . Due to Theorem 3.4 the sensitivity value S ( f, T ) can be expressed as S ( f, T ) = K (cid:88) k =1 E (cid:32)(cid:90) T ∂λ k ( X ( t ))∆ ζ k Ψ( X ( t ) , f, T − t ) dt (cid:33) . (3.18)10 .3 Sensitivity approximation with tau-leap simulations In order to construct a tau-leap estimator for parameter sensitivity using formula (3.18), we need to replaceboth ∂λ k ( X ( t )) and ∆ ζ k Ψ( X ( t ) , f, T − t ) with approximations derived with tau-leap simulations. Recallfrom Section 3.1 that a generic tau-leap scheme can be described by a pair of abstract labels α and β ,specifying the method and the step-size selection strategy respectively. Assuming such a tau-leap scheme ischosen, let the corresponding tau-leap process ( Z α,β ( x, t )) t ≥ (see (3.13)) be an approximation for the exactdynamics starting at state x .Suppose that we use the tau-leap method α with the step-size selection strategy β to approximate X ( t ) and possibly a diﬀerent tau-leap method α with a time-dependent step-size selection strategy β ( t )to compute an approximation of ∆ ζ k Ψ( X ( t ) , f, T − t ). This time-dependence in step-size selection is neededbecause the latter quantity requires simulation of auxiliary tau-leap paths in the interval [0 , T − t ] whichvaries with t . We discuss this in greater detail in the next section. In the following discussion, we willassume that both the tau-leap schemes ( α , β ) and ( α , β ( t )) satisfy Assumptions 1,2 and 3, with common γ > , δ > | β | replaced by the supremum step-size τ max = sup t ∈ [0 ,T ] {| β | , | β ( t ) |} (3.19)which is less than δ . We deﬁne the tau-leap approximation of Ψ( x, f, t ) (see (3.17)) by (cid:101) Ψ α,β ( x, f, t ) = E ( f ( Z α,β ( x, t ))) , (3.20)and make the assumption that the step size selection strategy β ( t ) depends on t in such a way that t (cid:55)→ (cid:101) Ψ α ,β ( t ) ( x, f, T − t ) is a measurable function of t . Motivated by formula (3.18), we shall approximate thetrue sensitivity value S ( f, t ) by (cid:101) S ( f, T ) = K (cid:88) k =1 E (cid:32)(cid:90) T ∂λ k ( Z α ,β ( x , t ))∆ ζ k (cid:101) Ψ α ,β ( t ) ( Z α ,β ( x , t ) , f, T − t ) dt (cid:33) , (3.21)where x is the starting state of the process ( X ( t )) t ≥ . The next theorem, proved in the Appendix 5.1,shows that the bias of this sensitivity approximation is similar to the bias of the underlying tau-leap scheme.In particular if the tau-leap method satisﬁes order γ convergent error bound, then the same is true for theerror incurred by the sensitivity approximation. Before we state the theorem, recall that for any p ≥

0, afunction f : N d → R is in class C p if it satisﬁes (3.14) for some constant C ≥ Theorem 3.6

Let f : N d → R as well as ∂λ k for each k = 1 , . . . , K be of class C p for some p ≥ . Supposethat a tau-leap approximation (cid:101) S ( f, T ) of the exact sensitivity S ( f, T ) is computed by (3.21) , where a tau-leapmethod α with step size strategy β is used to approximate the underlying process ( X ( t )) t ≥ and possibly adiﬀerent tau-leap method α with time-dependent step size strategy β ( t ) is used to compute approximations (cid:101) Ψ α ,β ( t ) ( x, f, T − t ) of Ψ( x, f, T − t ) at each t ∈ [0 , T ] . If both the tau-leap methods satisfy Assumptions 1,2and 3, with common γ > and δ > , then there exists a constant (cid:101) C ( f, T ) such that | (cid:101) S ( f, T ) − S ( f, T ) | ≤ (cid:101) C ( f, T ) τ γ max , where τ max is given by (3.19) and it is less than δ . We remark that there are two forms of error analyses in the literature for tau-leap methods. The ﬁrst typeis more conventional where the analysis is carried out for a given system in an interval [0 , T ] as τ max →

0. See[38, 31, 35]. An alternative analysis considers a family of systems parametrized by “system size” V , wherestep size τ is chosen in relation to V as τ = V − β (where β > V → ∞ [6]. Aspointed out in [35] both analyses are useful. The ﬁrst type of analysis with ﬁxed system size is importantin that if convergence or more importantly zero-stability (see [35]) does not hold in this conventional sense,then the computed solution can be very erroneous not only when the step size τ is too large, but also whenit is too small! On the other hand, the system size scaling analysis helps explains why tau-leap remainseﬃcient while leaping over several reaction events. In the interest of space, we limit ourselves to the ﬁrsttype in this paper. 11 .4 A tau-leap estimator for parameter sensitivity We now come to the problem of estimating the sensitivity approximation (cid:101) S ( f, T ) using tau-leap simulations.Expression (3.21) shows that (cid:101) S ( f, T ) is the expectation of the random variable s ( f, T ) deﬁned by s ( f, T ) = K (cid:88) k =1 (cid:90) T ∂λ k ( Z α ,β ( x , t ))∆ ζ k (cid:101) Ψ α ,β ( t ) ( Z α ,β ( x , t ) , f, T − t ) dt. (3.22)If we can generate samples of this random variable, then the estimation of (cid:101) S ( f, T ) would be quite straight-forward using (2.6). However this is not the case as the random variable s ( f, T ) is nearly impossible togenerate. This is mainly because it requires computing quantities of the form∆ ζ k (cid:101) Ψ α ,β ( t ) ( Z α ,β ( x , t ) , f, T − t ) (3.23)at inﬁnitely many time points t . These quantities generally do not have an explicit formula and hencethey need to be estimated via auxiliary Monte Carlo simulations, which severely restricts the number ofsuch quantities that can be feasibly estimated. We tackle these problems by constructing another randomvariable (cid:101) s ( f, T ) whose expected value equals (cid:101) S ( f, T ), and whose samples can be easily generated using asimple procedure called τ IPA (Tau Integral Path Algorithm) that is described in Section 3.5. This randomvariable is constructed by adding randomness to the random variable s ( f, T ) in such a way that only a smallﬁnite number of unknown quantities of the form (3.23) require estimation. We now present this construction. Construction of the random variable (cid:101) s ( f, T ) : Recall from Section 3.1 the description of the tau-leapprocess ( Z α ,β ( x , t )) t ≥ which approximates the exact dyamics ( X ( t )) t ≥ . Let 0 = t < t < · · · < t µ = T be the (possibly random) mesh corresponding to step size selection strategy β . We denote the σ -algebragenerated by the process ( Z α ,β ( x , t )) t ≥ and the random mesh β over the interval [0 , T ] by F T . Let τ i = t i +1 − t i and let η i be the positive integer given by η i = max (cid:40)(cid:38) (cid:80) Kk =1 | ∂λ k ( Z α ,β ( x , t i )) | τ i C (cid:39) , (cid:41) , (3.24)where C is a positive constant and (cid:100) x (cid:101) denotes the smallest integer greater than or equal to x . The choiceof C and its role will be explained later in the section. Deﬁne σ ij := t i + u ij τ i for each j = 1 , . . . , η i , whereeach u ij is an independent random variable with distribution Uniform[0 , t i and t i +1 , thedistribution of each σ ij is Uniform[ t i , t i +1 ]. Moreover taking expectation over the distribution of u ij -s we get E  τ i η i η i (cid:88) j =1 ∂λ k ( Z α ,β ( x , σ ij )) ∆ ζ k (cid:101) Ψ α ,β ( σ ij ) ( Z α ,β ( x , σ ij ) , f, T − σ ij ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) F T  = (cid:90) t i +1 t i ∂λ k ( Z α ,β ( x , t )) ∆ ζ k (cid:101) Ψ α ,β ( t ) ( Z α ,β ( x , t ) , f, T − t ) dt. In deriving the last equality we have used the substitution t = t i + uτ i . This relation along with (3.21) yields (cid:101) S ( f, T ) = K (cid:88) k =1 E (cid:32) µ − (cid:88) i =0 (cid:90) t i +1 t i ∂λ k ( Z α ,β ( x , t )) ∆ ζ k (cid:101) Ψ α ,β ( t ) ( Z α ,β ( x , t ) , f, T − t ) dt (cid:33) (3.25)= K (cid:88) k =1 E  µ − (cid:88) i =0 η i (cid:88) j =1 τ i η i ∂λ k ( Z α ,β ( x , σ ij )) ∆ ζ k (cid:101) Ψ α ,β ( σ ij ) ( Z α ,β ( x , σ ij ) , f, T − σ ij )  using linearity of the expectation operator. To obtain the states Z α ,β ( x , σ ij ) for all the σ ij -s, we need tointerpolate the tau-leap dynamics between the times t i and t i +1 .To proceed further we deﬁne a “conditional estimator” (cid:98) D kij of the quantity (3.23) at t = σ ij by (cid:98) D kij = f ( Z kijα ,β ( σ ij ) ( z + ζ k , T − σ ij )) − f ( Z kijα ,β ( σ ij ) ( z, T − σ ij )) (3.26)12here z = Z α ,β ( x , σ ij ), and Z kij and Z kij are instances of tau-leap approximations of the exact dynamicsstarting at initial states ( z + ζ k ) and z respectively. Both these tau-leap processes use the same method α and the same step-size selection strategy β ( σ ij ). Moreover conditioned on Z α ,β ( x , σ ij ) and σ ij , theprocesses Z kij , Z kij and the step-size selection strategy β ( σ ij ) are independent of the process Z α ,β andthe step-size selection strategy β . Therefore it is immediate that E ( (cid:98) D kij | Z α ,β ( x , σ ij ) , σ ij ) = ∆ ζ k (cid:101) Ψ α ,β ( σ ij ) ( Z α ,β ( x , σ ij ) , f, T − σ ij ) , (3.27)and hence from (3.25) we obtain the following representation for (cid:101) S ( f, T ) (cid:101) S ( f, T ) = K (cid:88) k =1 E  µ − (cid:88) i =0 η i (cid:88) j =1 τ i η i ∂λ k ( Z α ,β ( x , σ ij )) (cid:98) D kij  . (3.28)An estimator for (cid:101) S ( f, T ) based on this formula can require several computations of (cid:98) D kij . Since each evalua-tion of (cid:98) D kij is computationally expensive, we would like to control the total number of these evaluations byrandomizing the decision of whether (cid:98) D kij should be evaluated at time σ ij or not. Moreover this randomiza-tion must be performed without introducing a bias in the estimator. We now describe this process.Deﬁne R kij and P kij by R kij = ∂λ k ( Z α ,β ( x , σ ij )) τ i and P kij = (cid:18) | R kij | Cη i (cid:19) ∧ , (3.29)and let ρ kij be an independent { , } -valued random variable whose distribution is Bernoulli with parameter P kij . Since E ( ρ kij | Z α ,β ( x , σ ij ) , F T ) = P kij we have that (cid:101) S ( f, T ) = K (cid:88) k =1 E  µ − (cid:88) i =0 η i (cid:88) j =1 (cid:18) R kij P kij η i (cid:19) ρ kij (cid:98) D kij  , (3.30)where we deﬁne R kij /P kij to be 0 when R kij = 0. This formula suggests that (cid:101) S ( f, T ) can be estimated,without any bias, using realizations of the random variable (cid:101) s ( f, T ) = K (cid:88) k =1 µ − (cid:88) i =0 η i (cid:88) j =1 (cid:18) R kij P kij η i (cid:19) ρ kij (cid:98) D kij . (3.31)In generating each realization of (cid:101) s ( f, T ), the computation of (cid:98) D kij is only needed if the Bernoulli randomvariable ρ kij is 1. Therefore, if we can eﬀectively control the number of such ρ kij -s then we can eﬃcientlygenerate realizations of (cid:101) s ( f, T ). This can be achieved using the positive parameter C (see (3.24) and (3.29))as we soon explain. Based on the construction outlined above, we provide a method in Section 3.5 forobtaining realizations of the random variable (cid:101) s ( f, T ). We call this method, the Tau Integral Path Algorithm ( τ IPA), to emphasize the fact that (cid:101) s ( f, T ) is essentially an approximation of the integral (3.22). Using τ IPAwe can eﬃciently generate realizations s , s , . . . , s N of (cid:101) s ( f, T ) and approximately estimate the parametersensitivity (cid:101) S ( f, T ) with the estimator (2.6). Minimizing the variance of (cid:101) s ( f, T ) : To improve the eﬃciency of τ IPA, we must minimize the addi-tional variance due to the extra randomness that has been added to the random variable s ( f, T ) (3.22) toobtain (cid:101) s ( f, T ). Since E ( (cid:101) s ( f, T ) |F T ) = s ( f, T ), this additional variance is equal to Var( (cid:101) s ( f, T ) |F T ), and inorder to reduce this quantity we focus on reducing the conditional variance Var( (cid:98) D kij |F T ). Recall that (cid:98) D kij is given by (3.26) and for convenience we abbreviate Z lkijα ,β ( σ ij ) by Z l for l = 1 ,

2. The reduction in thisconditional variance can be accomplished by tightly coupling the pair of processes ( Z , Z ). For this purposewe use the split-coupling (see [2]) speciﬁed by Z ( t ) = ( Z α ,β ( x , σ ij ) + ζ k ) + K (cid:88) k =1 Y k (cid:18)(cid:90) t λ k ( Z ( α ( s )) , θ ) ∧ λ k ( Z ( α ( s )) , θ ) ds (cid:19) ζ k (3.32) K (cid:88) k =1 Y (1) k (cid:18)(cid:90) t (cid:0) λ k ( Z ( α (( s )) , θ ) − λ k ( Z ( α ( s )) , θ ) ∧ λ k ( Z ( α ( s )) , θ ) (cid:1) ds (cid:19) ζ k Z ( t ) = Z α ,β ( x , σ ij ) + K (cid:88) k =1 Y k (cid:18)(cid:90) t λ k ( Z ( α ( s )) , θ ) ∧ λ k ( Z ( α ( s )) , θ ) ds (cid:19) ζ k (3.33)+ K (cid:88) k =1 Y (2) k (cid:18)(cid:90) t (cid:0) λ k ( Z ( α (( s )) , θ ) − λ k ( Z ( α ( s )) , θ ) ∧ λ k ( Z ( α ( s )) , θ ) (cid:1) ds (cid:19) ζ k , where { Y k , Y (1) k , Y (2) k : k = 1 , . . . , K } is an independent family of unit rate Poisson processes. Here α ( s ) = t i for t i ≤ s < t i +1 , and { t , t , t , . . . } is the sequence of leap-times of the pair of processes ( Z , Z ) jointlysimulated with the tau-leap scheme ( α , β ( t )). Note that process α is adapted to the ﬁltration generated byprocesses ( Z , Z ). Hence a solution to (3.32)-(3.33) can be found by explicit construction. The uniquenessof the solution ( Z , Z ), until the ﬁrst time τ M its norm exceeds some constant M >

0, is guaranteed by thelocal boundedness of the associated generator (see Theorem 4.1 in Chapter 4 of [14]). Using Assumption 3one can show that as M → ∞ we have τ M → ∞ a.s. and from this, the uniqueness of the solution ( Z , Z )in the whole time-interval [0 , ∞ ) can be established. See Lemma A.1 in [27] for more details on this argument. Controlling the number of nonzero ρ kij -s: We now discuss how the positive parameter C can be selectedto control the total number of ρ kij -s that assume the value 1 in (3.31), which is ρ tot = (cid:80) Kk =1 (cid:80) µ − i =1 (cid:80) η i j =1 ρ kij .This is the number of (cid:98) D kij -s that are required to obtain a realization of (cid:101) s ( f, T ). It is immediate that giventhe sigma ﬁeld F T , ρ tot is a N -valued random variable whose expectation is given by: E ( ρ tot |F T ) = K (cid:88) k =1 µ − (cid:88) i =1 η i (cid:88) j =1 E ( P kij |F T ) = K (cid:88) k =1 µ − (cid:88) i =1 η i (cid:88) j =1 E (cid:20)(cid:18) | R kij | Cη i (cid:19) ∧ (cid:12)(cid:12)(cid:12)(cid:12) F T (cid:21) . Using a ∧ b ≤ a and E ( | R kij ||F T ) = (cid:90) t i +1 t i | ∂λ k ( Z α ,β ( x , t )) | dt we obtain E ( ρ tot ) = E ( E ( ρ tot |F T )) ≤ C K (cid:88) k =1 E (cid:32)(cid:90) T | ∂λ k ( Z α ,β ( x , t )) | dt (cid:33) . (3.34)We choose a positive integer M and set C = 1 M K (cid:88) k =1 E (cid:32)(cid:90) T | ∂λ k ( Z α ,β ( x , t )) | dt (cid:33) , (3.35)where the expectation can be approximately estimated using N tau-leap simulations of the dynamics in thetime interval [0 , T ]. Such a choice ensures that ρ tot is bounded above by M on average. In most cases wecan expect that R kij to be close to ∂λ k ( Z α ,β ( x , t i )) τ i and so the choice of η i automatically ensures that | R kij | ≤ Cη i . Hence inequality (3.34) is almost exact and with C chosen as (3.35) we have E ( ρ tot ) ≈ M .Therefore M can be interpreted as the expected number of coupled auxiliary paths (3.32)-(3.33) needed toobtain a realization of (cid:101) s ( f, T ). This parameter is in the hands of the user and it plays the same role as inPPA (see Section 2.2), namely, it allows one to select the trade-oﬀ between the computational cost C ( τ IPA)and the variance V ( τ IPA). A higher value of M reduces the variance while simultaneously increasing thecomputational cost. Hence it is diﬃcult to ascertain the eﬀect of M on the overall estimation cost whichdepends on the product C ( τ IPA) V ( τ IPA) (see (2.10)). Numerical examples suggest that for low values of M , the overall estimation cost decreases gradually with increase in M , but this trend reverses for highervalues of M (see Section 4). More work is needed to examine if this pattern persists more generally andhow one can select the optimal value of M . Note however that τ IPA will provide an unbiased estimatorfor (cid:101) S ( f, T ) (3.21) regardless of the choice of M . Hence the accuracy of τ IPA does not vary much with M ,which is also seen in the numerical examples. 14 .5 The Tau Integral Path Algorithm ( τ IPA)

We now provide a detailed description of the method τ IPA which produces realizations of the randomvariable (cid:101) s ( f, T ) deﬁned by (3.31). Computing the empirical mean (2.6) of these realizations estimates theapproximate parameter sensitivity (cid:101) S ( f, T ). Throughout this section we assume that the function rand ()returns independent samples from the distribution Uniform[0 , τ IPA can be adapted to work with any tau-leap scheme, but for concreteness, we assumethat an explicit tau-leap scheme is used for all the simulations. This means that the current state z andtime t , are suﬃcient to determine the distributions of the next time-step τ and the vector of reaction ﬁrings (cid:101) R = ( (cid:101) R , . . . , (cid:101) R K ) in the time interval [ t, t + τ ). We suppose that a sample from these two distributions canbe obtained using the methods GetTau ( z, t, T ) and GetReactionFirings ( z, τ ) respectively. If we usethe simplest tau-leap scheme given in [20], then reaction ﬁrings can be generated as (cid:101) R k = Poisson ( λ k ( z ) τ ) , (3.36)for k = 1 , . . . , K , where the function Poisson ( r ) generates an independent Poisson random variable withmean r . Once we have the reaction ﬁrings (cid:101) R = ( (cid:101) R , . . . , (cid:101) R K ), the state at time ( t + τ ) is given by z (cid:48) =( z + (cid:80) Kk =1 (cid:101) R k ζ k ) and for any intermediate time-point σ ∈ ( t, t + τ ) the state (cid:98) z can be obtained usingthe “Poisson bridge” interpolation (see [29]). However this interpolation approach is equivalent to setting (cid:98) z = ( z + (cid:80) Kk =1 (cid:101) R (1) k ζ k ) and z (cid:48) = ( (cid:98) z + (cid:80) Kk =1 (cid:101) R (2) k ζ k ), where (cid:101) R (1) = ( (cid:101) R (1)1 , . . . , (cid:101) R (1) K ) and (cid:101) R (2) = ( (cid:101) R (2)1 , . . . , (cid:101) R (2) K )are reaction ﬁring vectors generated according to (3.36) with τ replaced by ( σ − t ) and ( t + τ − σ ) respectively.This idea can be easily generalized to obtain the interpolated states (cid:98) z , . . . , (cid:98) z η at η intermediate times σ , . . . , σ η ∈ ( t, t + τ ) sorted in ascending order, i.e. σ < · · · < σ η .Let Z denote the tau-leap process approximating the reaction dynamics with initial state x . Our ﬁrsttask is to select the normalization parameter C according to (3.35), by estimating the expectation in theformula using N simulations of the process Z . This is done using the function Select-Normalizing-Constant ( x , M , T ) (see Algorithm 2 in Appendix 5.2) where M is the expectednumber of auxiliary paths (3.32)-(3.33) that need to be simulated (see Section 3.4). Once C is chosen, asingle realization of (cid:101) s ( f, T ) can be computed using GenerateSample ( x , T, C ) (Algorithm 1). This methodsimulates the tau-leap process Z and at each leap-time t i , the following happens:1. The next leap size τ i (= τ ) is chosen and the positive integer η i (= η ) is computed.2. The intermediate time-points σ j -s are generated for j = 1 , . . . , η and sorted in ascending order.3. For each j , the vector of reaction ﬁrings (cid:101) R = ( (cid:101) R , . . . , (cid:101) R K ) for the time-interval ( σ j − , σ j ) is computedand the interpolated state (cid:98) z j at time σ j is evaluated. Then for each reaction k the following happens: • The variables R kij (= R ), P kij (= P ) and ρ kij (= ρ ) are generated. The function Bernoulli ( P )generates an independent Bernoulli random variable with expectation P . • If ρ kij = 1 then (cid:98) D kij (see (3.26)) is evaluated using EvaluateCoupledDifference ( (cid:98) z j , (cid:98) z j + ζ k , σ, T ) (see Algorithm 3 inAppendix 5.2) and the sample value is updated according to (3.31). This method independentlysimulates the pair of processes ( Z , Z ) speciﬁed by the split-coupling (3.32)-(3.33) in order tocompute (cid:98) D kij . For simplicity we assume that these simulations are carried out by the sametau-leap scheme which generates reaction ﬁrings according to (3.36).4. Finally, time t is updated to ( t + τ ), reaction ﬁrings for the time-interval [ σ η , t ) are computed and thestate is updated accordingly.Note that in the computation of reaction ﬁrings the propensities are evaluated at z rather than any of theinterpolated states (cid:98) z j . We allow the step-size selection to depend on both the current time t and the ﬁnal time T . This is especially important forsimulating the auxiliary paths that are required to compute the (cid:98) D kij -s in (3.31) (see Sections 3.3 and 3.4). lgorithm 1 Generates one realization of (cid:101) s ( f, T ) according to (3.31) function GenerateSample ( x , T, C ) Set z = x , t = 0 and s = 0 while t < T do Calculate τ = GetTau ( z, t, T ) and set η = max (cid:40)(cid:38) (cid:80) Kk =1 | ∂λ k ( z ) | τC (cid:39) , (cid:41) . For each j = 1 , . . . , η let σ j ← ( t + rand () × τ ). Relabel σ j -s to arrange them in ascending orderas σ < σ < . . . σ η . Also set σ = t and (cid:98) z = z . for j = 1 to η do Set ( (cid:101) R , . . . , (cid:101) R K ) = GetReactionFirings ( z, σ j − σ j − ) and compute the interpolated state (cid:98) z j = (cid:98) z j − + (cid:80) Kk =1 (cid:101) R k ζ k . for k = 1 to K do Set R = ∂λ k ( (cid:98) z j ) τ and ρ = Bernoulli ( P ) with P = (cid:18) | R | Cη (cid:19) ∧ . if ρ = 1 then Update s ← s + (cid:16) RP η (cid:17)

EvaluateCoupledDifference ( (cid:98) z j , (cid:98) z j + ζ k , σ j , T ) end if end for end for Update t ← t + τ Set ( (cid:101) R , . . . , (cid:101) R K ) = GetReactionFirings ( z, t − σ η ) Update z ← (cid:98) z η + (cid:80) Kk =1 (cid:101) R k ζ k end while return s end function In this section we computationally compare six sensitivity estimation methods on many examples. Themethods we consider are the following:1.

Tau Integral Path Algorithm or τ IPA : This is the method described in Section 3.5. The tau-leapscheme we use is the simple Euler method [20] with Poisson reaction ﬁrings (3.36) and uniform step-size τ = τ max . To avoid the possibility of leaping-over the ﬁnal time T at which the sensitivity is to beestimated, we set GetTau ( z, t, T ) = min { τ max , T − t } . The value of τ max will depend on the example being considered and the default value of parameter M is 10.2. Exact Integral Path Algorithm or eIPA : This is the method we obtain by replacing the tau-leapsimulations in τ IPA with the exact simulations performed with Gillespie’s SSA [19]. This replacementcan be easily made by choosing the step-size and the reaction ﬁrings according to Remark 3.1. Moreoverwe need to change the method

EvaluateCoupledDifference to the version given in [26]. Notethat eIPA is a new unbiased method for estimating parameter sensitivity, like the methods in Section2.2. This method is conceptually similar to PPA [26], but unlike PPA, the formula (3.18) underlying τ IPA does not involve summation over the jumps of the process, which makes it more amenable forincorporating tau-leap schemes. 16.

Exact Coupled Finite Diﬀerence or eCFD : This is same as the CFD method in [2].4. Exact Common Reaction Paths or eCRP : This is same as the CRP method in [40].5. Tau Coupled Finite Diﬀerence or τ CFD : This method is the tau-leap version of CFD which hasbeen proposed in [51]. Let ( Z θ , Z θ + h ) be the pair of tau-leap processes that approximate the processes( X θ , X θ + h ), and suppose that at leap time t i their state is ( Z θ ( t i ) , Z θ + h ( t i )) = ( z , z ). If the next step-size is τ , then for every reaction k = 1 , . . . , K , we set the number of ﬁrings ( (cid:101) R θ,k , (cid:101) R θ + h,k ) for this pair ofprocesses as (cid:101) R θ,k = A k + Poisson (( λ k ( z ) − λ k ( z ) ∧ λ k ( z )) τ ) and (cid:101) R θ + h,k = A k + Poisson (( λ k ( z ) − λ k ( z ) ∧ λ k ( z )) τ ), where A k = Poisson (( λ k ( z ) ∧ λ k ( z )) τ ). Such a selection of reaction ﬁringsemulates the CFD coupling. To facilitate comparison, we choose the tau-leap simulation method to bethe same as for τ IPA.6.

Tau Common Reaction Paths or τ CRP : This method can be viewed as the tau-leap version ofCRP where the CRP coupling is emulated by coupling the Poisson random variables that generatethe reaction ﬁrings. Using the same notation as before, if ( Z θ ( t i ) , Z θ + h ( t i )) = ( z , z ) and the nextstep-size is τ , then we set the number of ﬁrings ( (cid:101) R θ,k , (cid:101) R θ + h,k ) as (cid:101) R θ,k = Poisson ( λ k ( z ) τ, k ) and (cid:101) R θ + h,k = Poisson ( λ k ( z ) τ, k ) for every reaction k = 1 , . . . , K . Here we assume that there are K parallel streams of independent Uniform[0 ,

1] random variables (see [40]), and the method

Poisson ( r, k )uses the uniform random variable from the k -th stream for generating the Poisson random variablewith mean r . As for τ CFD, the tau-leap simulation method is the same as for τ IPA.In all the ﬁnite-diﬀerence schemes, we use perturbation-size h = 0 . center the parameter per-turbations to obtain better accuracy. This centering can be easily achieved by substituting θ with ( θ − h/ θ + h ) with ( θ + h/

2) in the expression (2.11) and also in the deﬁnition of the coupled processes. Sincewe use Poisson random variables to generate the reaction ﬁrings for tau-leap simulations, it is possible thatsome state-components become negative during the simulation run. In this paper we deal with this problemrather crudely by setting the negative state-components to 0. We have checked that this does not cause asigniﬁcant loss of accuracy because the state-components become negative very rarely .Note that among the methods considered here, eIPA is the only unbiased sensitivity estimation method.All the other methods are biased either due to a ﬁnite-diﬀerence approximation of the derivative (eCFD andeCRP) or due to tau-leap approximation of the sample paths ( τ IPA) or due to both these reasons ( τ CFDand τ CRP). In the examples, we apply each sensitivity estimation method X with a sample-size of N = 10 ,and compute the estimator mean (cid:98) µ N (2.6), the standard deviation (cid:98) σ N (2.9), the relative standard deviationRSD( X ) and the computational cost per sample C ( X ) (see Section 2). Assume that the exact sensitivityvalue is s which is known. We compare the diﬀerent estimation methods using the following two quantities- the percentage relative error ( RE ) deﬁned byRE = (cid:12)(cid:12)(cid:12)(cid:12) (cid:98) µ N − s s (cid:12)(cid:12)(cid:12)(cid:12) × , (4.37)and the RSD adjusted computational cost ( RSDCC ) deﬁned byRSDCC = (RSD( X )) C ( X ) . (4.38)The ﬁrst quantity RE measures the accuracy of a method, while the second quantity RSDCC determinesthe overall computational time that will be required by the method to yield an estimate with the desiredstatistical precision (see (2.10)).Our numerical results will show that the exact schemes (eIPA, eCFD and eCRP) usually have a higherRSDCC than their tau-leap counterparts ( τ IPA, τ CFD and τ CRP), but expectedly their RE is lower.Generally the RE for eIPA is smaller than both eCFD and eCRP because of its unbiasedness and thisadvantage in accuracy often persists when we compare τ IPA with τ CFD and τ CRP. It can be seen that inmost of the cases, the sample variance V ( X ) or the estimator standard deviation (2.9), remain of similarmagnitude, when we switch from an exact scheme to its tau-leap version (see Appendix 5.2). This supportsour claim in Section 2.3, that substituting exact paths with tau-leap trajectories allows one to trade-oﬀ17ias with computational costs, and this trade-oﬀ relationship is somewhat “orthogonal” to other trade-oﬀrelationships shown in Table 1.In all the examples below, the propensity functions λ k -s for all the reactions have the mass-action form[3] unless stated otherwise. Also ∂ always denotes the partial-derivative w.r.t. the designated sensitiveparameter θ . Our ﬁrst example is a simple birth-death model in which a single species S is created and destroyed accordingto the following two reactions: ∅ θ −→ S θ −→ ∅ . Let θ = 10, θ = 0 . θ = θ . Let ( X ( t )) t ≥ be the Markovprocess representing the reaction dynamics. Assume that X (0) = 0. For f ( x ) = x we wish to estimate S θ ( f, T ) = ∂ E ( f ( X ( T ))) = ∂ E ( X ( T ))for T = 5 and T = 10. For this example, we set τ max = 0 .

5. For each T we estimate the sensitivity usingall the six methods and the results are displayed in Table 3 in Appendix 5.2. For this network we cancompute the sensitivity S θ ( f, T ) exactly as the propensity functions are aﬃne. These exact values are statedin the caption of Table 3, and they allow us to compute the RE of a method according to (4.37). We alsocompute the RSDCC for each method using (4.38), and we compare these RE and RSDCC values for allthe methods in Figure 1 A . From these comparisons we can make the following observations: 1) The exactmethods are typically more accurate than the tau-leap methods but they are usually more computationallydemanding. 2) For T = 5, eCFD/eCRP are far more accurate than τ CFD/ τ CRP suggesting that the twosources of bias (ﬁnite-diﬀerence and tau-leap approximations) are additive in nature. However the same isnot true for T = 10. 3) For both the cases T = 5 and T = 10, τ IPA outperforms τ CFD/ τ CRP in terms ofaccuracy even though it is slightly more computationally expensive. Same is true when we compare eIPAwith eCFD/eCRP.In Figure 1 B we numerically analyze the performance of τ IPA w.r.t. its two key parameters - theexpected number of auxiliary paths M and the maximum tau-leap step-size τ max . We see that RE is fairlyinsensitive to variations in M while RSDCC ﬁrst decreases with M up to a certain point, and then itstarts increasing with M . As we are using a ﬁrst-order explicit tau-leap scheme, it is unsurprising thatRE increases almost linearly with τ max . However, importantly, RSDCC decreases exponentially with τ max ,which makes it possible to use tau-leap simulations to trade-oﬀ a small amount of accuracy for a large gainin computational eﬃciency with τ IPA.Observe that if we scale the production rate θ by the system-size or volume parameter V , then the concentration process , derived by dividing the copy-number counts X ( t ) by V , converges to a deterministicODE limit as V → ∞ (see Chapter 11 in [14]). Often it is of interest to determine how the performanceof various sensitivity estimation methods scales with the volume parameter V . We investigate this issue forthe exact schemes (eIPA, eCFD and eCRP) in Figure 2, by numerically examining the dependence of theirRSD, RSDCC and RE on V . Here we set the expected number of auxiliary paths M for eIPA to be equalto V . Note that RSD for ﬁnite-diﬀerence schemes (eCFD/eCRP) scales like 1 / √ V as was proved in [45] andconsequently their RSDCC is of order 1, because the computational time per sample, which is proportional tothe number of reaction events per unit time-interval, is of order V . Similar to these ﬁnite-diﬀerence schemesthe RSD for eIPA also scales like 1 / √ V , but its RSDCC is of order V as its computational time per sampleis of order V because to generate each sample for eIPA, M = V auxiliary paths need to be simulated inaddition to the main sample path. This computational disadvantage of eIPA is compensated by the factthat accuracy of eIPA improves with volume (i.e. RE decreases with volume), while for the ﬁnite-diﬀerenceschemes it is almost a constant. These numerical results suggest that the computational eﬃciency of eIPAscales with volume V in the same way as it does for the CGT method (see Section 2.2) whose RSD hasbeen shown to be of order 1 w.r.t. volume V (see [45]). Despite this similarity in volume scaling, eIPA All the computations in this paper were performed using C++ programs on an Apple machine with the 2.9 GHz Intel Corei5 processor. = 5 T = 10 AB Unbiased Unbiased eIPA IPA eCFD CFD eCRP CRP012345 R E % -7 -6 -5 R S DCC

RE%RSDCC eIPA IPA eCFD CFD eCRP CRP00.511.52 R E % -7 -6 -5 R S DCC

RE%RSDCC M R E % -6 -5 -4 R S DCC

RE%RSDCC max R E % -6 -5 -4 R S DCC

RE%VACC

Figure 1:

Birth-death model:

Panel A compares the various sensitivity estimation methods in terms ofthe percentage relative error (RE) (calibrated with the left y-axis in linear scale) and the relative standarddeviation adjusted computational cost (RSDCC) (calibrated with the right y-axis in log-scale). The sensi-tivities are estimated for T = 5 and 10 using N = 10 samples. The results show how tau-leap methodstrade-oﬀ accuracy with reduction in computational costs. Note that τ IPA yields more accurate sensitivityestimates than τ CFD/ τ CRP even though the associated computational costs are slightly higher. In panel B , we study how the performance of τ IPA depends on parameters M (expected number of auxiliary paths)and τ max (maximum tau-leap step-size) for the case T = 10. Observe that as M increases, RE does notchange much but RSDCC behaves like a convex function with minimum around M = 20. As τ max increases,RE increases linearly but RSDCC drops exponentially making τ IPA a viable method for trading oﬀ accuracywith computational eﬃciency for sensitivity estimation.is still a preferable unbiased method when compared to the CGT method, as its estimator variance doesnot become unbounded as the magnitude of the sensitive parameter approaches zero (see Section 2.2). Thevolume-scaling analysis presented here can also be performed for the tau-leap schemes by parameterizing thestep-size τ max by volume V as discussed in Section 3.3. We expect the results to be qualitatively similar tothe exact schemes, because, as mentioned previously, it is observed that the sample variance remains similarwhen we switch from an exact scheme to its tau-leap version (see Appendix 5.2). However this needs to beinvestigated in detail in a future work. Our second example considers the

Repressilator network given in [13], which consists of three mutuallyrepressing gene-expression modules (say 1,2 and 3). Repression occurs at the level of transcription, i.e.19 Volume (V) -3 -2 -1 R S D eIPAeCFDeCRP Volume (V) -6 -5 -4 -3 -2 -1 R S DCC eIPAeCFDeCRP Volume (V) R E % eIPAeCFDeCRP Figure 2:

Birth-death model:

In this ﬁgure we examine how the performance of the exact schemes(eIPA, eCFD and eCRP) varies with the system-size represented by volume V . We study the case T = 10 byreplacing the production rate θ by θ V . For eIPA we set the expected number of auxiliary paths as M = V .The three plots compare the three exact schemes in terms of their relative standard deviation (RSD), therelative standard deviation adjusted computational cost (RSDCC) and the percentage relative error (RE).Note that the x -axis for volume V is in log-scale, and y -axis for RSD and RSDCC is in log-scale but forRE it is in linear-scale. Observe that RSDCC is of order V for eIPA but it is of order 1 for eCFD/eCRP.However for eIPA, the accuracy increases with V (i.e. RE decreases with V ), while it remains the same foreCFD/eCRP.production of the three mRNAs M , M and M , and it is carried out by the corresponding protein molecules P , P and P in a cyclic pattern. In other words, protein P i represses the transcription of mRNA M i − ,where we identify M with M . The repression mechanism is modeled with a nonlinear Hill function. Therepressilator network consists of 6 biomolecular species and 12 reactions described in Table 2.No. Reaction Propensity1 ∅ −→ M λ ( x ) = 1 + 100 / (1 + x α )2 ∅ −→ M λ ( x ) = 1 + 100 / (1 + x α )3 ∅ −→ M λ ( x ) = 1 + 100 / (1 + x α )4 M −→ ∅ λ ( x ) = x M −→ ∅ λ ( x ) = x M −→ ∅ λ ( x ) = x M −→ M + P λ ( x ) = 50 x M −→ M + P λ ( x ) = 50 x M −→ M + P λ ( x ) = 50 x P −→ ∅ λ ( x ) = γ x P −→ ∅ λ ( x ) = γ x P −→ ∅ λ ( x ) = γ x Table 2: Reactions for the

Repressilator network [13]. Here x = ( x , . . . , x ) denotes the copy-numbers ofthe 6 network species ordered as M , M , M , P , P and P .We set the Hill coeﬃcient α i for the transcription of each mRNA to be 1 (see reactions 1-3 in Table 2) andthe degradation rate constant γ i for each protein to be 0 . X ( t )) t ≥ bethe N -valued Markov process representing the reaction dynamics, under the species ordering described inthe caption of Table 2. We assume that X (0) = (0 , , , , ,

0) and deﬁne f : N → R by f ( x , . . . , x ) = x .At T = 10, our goal is to estimate S θ ( f, T ) = ∂ E ( f ( X ( T ))) = ∂ E ( X ( T )) , (4.39)for θ = α , α , α , γ , γ , γ . These values measure the sensitivity of the mean of protein P populationat time T = 10 with respect to the Hill coeﬃcients α i -s and the protein degradation rates γ j -s. For thisexample, we set τ max = 0 .

01. 20or each θ we estimate the sensitivity using all the six methods and the results are displayed in Table 5in Appendix 5.2. Unlike the previous example, we cannot compute the sensitivity values exactly because ofnonlinearity of some of the propensity functions. So we obtain accurate approximations of these values usingthe unbiased estimator (eIPA) with a large sample size ( N = 10 ) and they are provided in the caption ofTable 5. With these values we can compute the REs (4.37), which are then compared along with RSDCCsfor all the methods in Figure 3. The results vary with the choice of the sensitive parameter θ , but one canclearly see that τ IPA can be several times more accurate than τ CFD / τ CRP even though its RSDCC is ofa similar magnitude. This is especially observable for cases θ = α , α and γ . Most notably for the case θ = α , the RE for ﬁnite-diﬀerence schemes is around 800%, while it is 1 .

3% for eIPA and 5% for τ IPA.

Unbiased 𝜃 = 𝛂 𝜃 = 𝛂 𝜃 = 𝛂 Unbiased Unbiased 𝜃 = 𝛄 𝜃 = 𝛄 𝜃 = 𝛄 Unbiased Unbiased Unbiased eIPA IPA eCFD CFD eCRP CRP02004006008001000 R E % -3 -2 -1 R S DCC

RE%RSDCC eIPA IPA eCFD CFD eCRP CRP0123456 R E % -4 -3 -2 -1 R S DCC

RE%RSDCC eIPA IPA eCFD CFD eCRP CRP050100150 R E % -2 -1 R S DCC

RE%RSDCC eIPA IPA eCFD CFD eCRP CRP00.511.522.533.5 R E % -3 -2 -1 R S DCC

RE%RSDCC eIPA IPA eCFD CFD eCRP CRP020406080100 R E % -2 -1 R S DCC

RE%RSDCC eIPA IPA eCFD CFD eCRP CRP0510152025 R E % -2 -1 R S DCC

RE%RSDCC

Figure 3:

Repressilator Network:

This ﬁgure compares the various sensitivity estimation methods interms of the percentage relative error (RE) (calibrated with the left y-axis in linear scale) and the relativestandard deviation adjusted computational cost (RSDCC) (calibrated with the right y-axis in log-scale).Thesensitivities are estimated for θ = α , α , α , γ , γ and γ using N = 10 samples. Observe that for someparameters τ IPA is several times more accurate than τ CFD/ τ CRP.

As our last example we look at a simple network with nonlinear propensity functions. Consider the networkof a genetic toggle switch proposed by Gardner et. al. [17]. This network has two species U and V thatinteract through the following four reactions ∅ λ −→ U , U λ −→ ∅ , ∅ λ −→ V and V λ −→ ∅ , where the propensity functions λ i -s are given by λ ( x , x ) = α x β , λ ( x , x ) = x , λ ( x , x ) = α x γ and λ ( x , x ) = x . In the above expressions, x and x denote the number of molecules of U and V respectively. We set α = 50, α = 16, β = 2 . γ = 1. Let ( X ( t )) t ≥ be the N -valued Markov process representing the21eaction dynamics with initial state ( X (0) , X (0)) = (0 , T = 10 and f ( x ) = x , our goal is toestimate S θ ( f, T ) = ∂ E ( f ( X ( T ))) = ∂ E ( X ( T )) , for θ = α , α , β and γ . In other words, we would like to measure the sensitivity of the mean of the number of U molecules at time T = 10, with respect to all the model parameters. For this example, we set τ max = 0 . A .As in the previous example, we estimate the true sensitivity values using the unbiased estimator (eIPA)with a large sample size ( N = 10 ). These approximate values are given in the caption of Table 4 andthey were used in computing the relative errors (4.37) for Figure 4. Here we ﬁnd that eIPA outperformseCFD/eCRP both in terms of accuracy and computational eﬃciency for all the parameters. Similarly τ IPAis computationally more eﬃcient than τ CFD/ τ CRP for all the parameters, but except for the case θ = α ,its accuracy is similar to τ CFD/ τ CRP. In Figure 4 B we numerically examine how the performance of τ IPAis aﬀected by the parameter M , for a couple of cases. As in Section 4.1, we ﬁnd this eﬀect to be quite smallfor RE but RSDCC ﬁrst decreases with M and then increases. Unbiased 𝜃 = 𝛂 𝜃 = β 𝜃 = 𝛄 Unbiased UnbiasedUnbiased A 𝜃 = 𝛂 B 𝜃 = 𝛂 𝜃 = 𝛄 eIPA IPA eCFD CFD eCRP CRP024681012 R E % -3 -2 -1 R S DCC

RE%RSDCC eIPA IPA eCFD CFD eCRP CRP051015202530 R E % -4 -3 -2 -1 R S DCC

RE%RSDCC eIPA IPA eCFD CFD eCRP CRP01020304050 R E % -4 -3 -2 -1 R S DCC

RE%RSDCC eIPA IPA eCFD CFD eCRP CRP024681012 R E % -4 -3 -2 R S DCC

RE%RSDCC M R E % -4 -3 -2 -1 R S DCC

RE%RSDCC M R E % -4 -3 -2 R S DCC

RE%RSDCC

Figure 4:

Genetic toggle switch:

Estimation of parameter sensitivities for stochastic reaction networks in an important and diﬃcult problem.The main source of diﬃculty is that all the estimation methods rely on exact simulations of the reactiondynamics performed using Gillespie’s SSA [19] or its variants [18, 4]. It is well-known that these simulationalgorithms are computationally very demanding as they track each and every reaction event which can be verycumbersome. This issue represents the main bottleneck in the use of sensitivity analysis for systems modeledas stochastic reaction networks. The aim of this paper is to develop a method, called

Tau Integral PathAlgorithm ( τ IPA), that feasibly deals with this issue by requiring only approximate tau-leap simulationsof the reaction dynamics, and still providing provably accurate estimates for the sensitivity values. Thismethod is based on an explicit integral representation for parameter sensitivity that was derived from theformula given in [25]. Furthermore, by replacing the tau-leap simulation scheme in τ IPA with an exactsimulation scheme like SSA, we obtain a new unbiased method (called eIPA) for sensitivity estimation, thatcan serve as the natural limit of τ IPA when the step-size τ gets smaller and smaller.Using computational examples we compare τ IPA with tau-leap versions of the ﬁnite-diﬀerence schemes[2, 40, 51] that are commonly employed for sensitivity estimation. We ﬁnd that in many cases, τ IPAoutperforms these tau-leap ﬁnite-diﬀerence schemes in terms of both accuracy and computational eﬃciency.This makes τ IPA an appealing method for sensitivity analysis of stochastic reaction networks, where theexact dynamical simulations are computationally infeasible and tau-leap approximations become necessary.As we argue in Section 2.3, tau-leap simulations provide a natural way to trade-oﬀ estimator bias withgains in computational speed. Therefore it would be of fundamental importance to extend the ideas in thispaper and try to maximize the computational gains from tau-leap simulations while sacriﬁcing the minimum amount of accuracy. In this context, we now mention two possible directions for future research. The methodwe proposed here, τ IPA, can work with any underlying tau-leap simulation scheme, but for simplicity weexamined it with the most basic tau-leap scheme i.e. an explicit Euler method with a constant (deterministic)step-size and Poissonian reaction ﬁrings [20]. As this tau-leap scheme has several drawbacks (see [21]), itis very likely that τ IPA can yield much better results if a more sophisticated tau-leap scheme is employed,possibly with random step-sizes [11, 5, 32], or with Binomial leaps [43] or using implicit step-size selection[36]. We shall explore these issues in a future paper. Note that τ IPA essentially converts the problemof estimating parameter sensitivities to the problem of estimating a collection of expected values of theprocess with tau-leap simulations. The latter problem can be eﬃciently handled using multilevel strategies,where estimators are constructed for a range of τ -values, and are suitably coupled to simultaneously reducethe estimator’s bias and variance [7, 30, 32]. A promising approach would be to integrate these multilevelestimators with τ IPA to improve its accuracy and computational eﬃciency.

Appendix

Proof. [Proof of Theorem 3.4] Let {F t } be the ﬁltration generated by the process( X θ ( t )) t ≥ and let σ i be its i -th jump time for i = 1 , , . . . . We deﬁne σ = 0 for convenience. Since theprocess ( X θ ( t )) t ≥ is constant between consecutive jump times we can write E (cid:32)(cid:90) T ∂λ k ( X θ ( t ) , θ ) ∂θ ∆ ζ k f ( X θ ( t )) dt (cid:33) = ∞ (cid:88) i =0 E (cid:18) ∂λ k ( X θ ( σ i ) , θ ) ∂θ ∆ ζ k f ( X θ ( σ i ))( σ i +1 ∧ T − σ i ∧ T ) (cid:19) = ∞ (cid:88) i =0 E (cid:18) E (cid:18) ∂λ k ( X θ ( σ i ) , θ ) ∂θ ∆ ζ k f ( X θ ( σ i ))( σ i +1 ∧ T − σ i ∧ T ) (cid:12)(cid:12)(cid:12)(cid:12) F σ i (cid:19)(cid:19) = E (cid:32) ∞ (cid:88) i =0: σ i

Proof. [Proof of Theorem 3.6] For each k = 1 , . . . , K deﬁne g k , h k by g k ( x, t ) = ∂λ k ( x )∆ ζ k Ψ( xk, f, T − t ) and h k ( x, t ) = ∂λ k ( x )∆ ζ k (cid:101) Ψ α ,β ( t ) ( x k , f, T − t ). Without loss of gener-ality, we can assume that there exists a C > { ∂λ k ( x ) , f ( x ) | k = 1 , . . . , K } ≤ C (1 + (cid:107) x (cid:107) p ) , ∀ x ∈ N d . Then due to Lemma 3.3 we obtainsup t ∈ [0 ,T ] | h k ( x, t ) − g k ( x, t ) |≤ ∂λ k ( x ) CC ( p, T, α ) (cid:16) (1 + (cid:107) x (cid:107) ξ ( p ) ) + (1 + (cid:107) x + ζ k (cid:107) ξ ( p ) ) (cid:17) τ γ max ≤ C C ( p, T, α )(1 + (cid:107) x (cid:107) p ) (cid:16) (1 + (cid:107) x (cid:107) ξ ( p ) ) + (1 + (cid:107) x + ζ k (cid:107) ξ ( p ) ) (cid:17) τ γ max ≤ c ( p ) C C ( p, T, α ) (cid:16) (cid:107) x (cid:107) ( p + ξ ( p )) (cid:17) τ γ max , (5.43)25here c ( p ) is a constant that depends only on p as well as ζ , . . . , ζ K . Lemma 3.3 also shows thatsup t ∈ [0 ,T ] | h k ( x, t ) | ≤ ∂λ k ( x ) CC ( p, T, α ) ((1 + (cid:107) x (cid:107) p ) + (1 + (cid:107) x + ζ k (cid:107) p )) ≤ c ( p ) C C ( p, T, α )(1 + (cid:107) x (cid:107) p ) (5.44)and sup t ∈ [0 ,T ] | g k ( x, t ) | ≤ c ( p ) C C ( p, T )(1 + (cid:107) x (cid:107) p ), where c ( p ) is again a constant that depends only on p and ζ , . . . , ζ K .From (5.44) and Lemma 3.3 it follows thatsup t ∈ [0 ,T ] | E ( h k ( Z α ,β ( x , t ) , t )) − E ( h k ( X ( t ) , t )) | (5.45) ≤ c ( p ) C C ( p, T, α ) C (2 p, T, α ) (cid:16) (cid:107) x (cid:107) ξ (2 p ) (cid:17) τ γ max . Moreover from (5.43), we get E ( | h k ( X ( t ) , t ) − g k ( X ( t ) , t ) | ) ≤ c ( p ) C C ( p, T, α )(1 + E ( (cid:107) X ( t ) (cid:107) ( p + ξ ( p )) )) τ γ max , and hence using Assumption 2, we obtainsup t ∈ [0 ,T ] E ( | h k ( X ( t ) , t ) − g k ( X ( t ) , t ) | ) (5.46) ≤ c ( p ) C C ( p, T, α ) C ( p + ξ ( p ) , T ) (cid:16) (cid:107) x (cid:107) p + ξ ( p ) (cid:17) τ γ max . Note that (cid:12)(cid:12)(cid:12) (cid:101) S ( f, T ) − S ( f, T ) (cid:12)(cid:12)(cid:12) = (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) K (cid:88) k =1 (cid:90) T ( E ( h k ( Z α ,β ( x , t ) , t )) − E ( g k ( X ( t ) , t ))) dt (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ K (cid:88) k =1 (cid:12)(cid:12)(cid:12) (cid:90) T E ( h k ( Z α ,β ( x , t ) , t )) dt − (cid:90) T E ( g k ( X ( t ) , t )) dt (cid:12)(cid:12)(cid:12) ≤ K (cid:88) k =1 (cid:90) T | E ( h k ( Z α ,β ( x , t ) , t )) − E ( h k ( X ( t ) , t )) | dt + K (cid:88) k =1 (cid:90) T | E ( h k ( X ( t ) , t )) − E ( g k ( X ( t ) , t )) | dt. Using (5.45) and (5.46) we obtain the bound (cid:12)(cid:12)(cid:12) (cid:101) S ( f, T ) − S ( f, T ) (cid:12)(cid:12)(cid:12) ≤ KT c ( p ) C C ( p, T, α ) C (2 p, T, α ) (cid:16) (cid:107) x (cid:107) ξ (2 p ) (cid:17) τ γ max + KT c ( p ) C C ( p, T, α ) C ( p + ξ ( p ) , T ) (cid:16) (cid:107) x (cid:107) p + ξ ( p ) (cid:17) τ γ max , which proves the theorem. (cid:3) References [1]

U. Alon , An introduction to systems biology : design principles of biological circuits , Chapman &Hall/CRC mathematical and computational biology series, Chapman & Hall/CRC, 2007. 126

IPA τ IPA T Mean Std Dev RE% RSDCC Mean Std Dev RE% RSDCC5 -90.079 0.093 0.139 0.379E-5 -90.938 0.078 0.813 0.121E-510 -264.5 0.309 0.099 0.97E-5 -266.34 0.243 0.793 0.247E-5eCFD τ CFD T Mean Std Dev RE% RSDCC Mean Std Dev RE% RSDCC5 -90.632 0.088 0.4746 0.078E-5 -86.456 0.089 4.155 0.033E-510 -268.77 0.142 1.716 0.054E-5 -268.214 0.146 1.503 0.021E-5eCRP τ CRP T Mean Std Dev RE% RSDCC Mean Std Dev RE% RSDCC5 -90.749 0.097 0.604 0.343E-5 -86.481 0.098 4.128 0.343E-510 -268.82 0.169 1.734 0.152E-5 -267.92 0.173 1.393 0.131E-5

Table 3:

Birth-death model:

Sensitivity estimation results for T = 5 ,

10. For all the methods, N = 10 areused to estimate the following quantities - the estimator mean (2.6), the standard deviation (2.9), the relativeerror (RE) percentage (4.37) and the relative standard deviation adjusted computation cost (RSDCC) (4.38)in seconds. The exact sensitivity values are − .

204 for T = 5 and − .

241 for T = 10. eIPA τ IPA θ Mean Std Dev RE% RSDCC Mean Std Dev RE% RSDCC α α -2.133 0.0132 0.663 0.0021 -2.3968 0.0148 13.087 0.0008 β -5.924 0.0419 1.144 0.0020 -8.5372 0.0562 42.456 0.0008 γ τ CFD θ Mean Std Dev RE% RSDCC Mean Std Dev RE% RSDCC α α -2.007 0.267 5.305 0.3219 -2.734 0.0991 29.011 0.0066 β -5.865 0.4535 2.1339 0.1053 -8.787 0.1813 46.617 0.0021 γ τ CRP θ Mean Std Dev RE% RSDCC Mean Std Dev RE% RSDCC α α -1.999 0.1306 5.701 0.0823 -2.415 0.1109 13.9646 0.0254 β -6.21 0.1777 3.625 0.0161 -8.853 0.2198 47.7203 0.0074 γ Table 4:

Genetic toggle switch:

Sensitivity estimation results w.r.t. all the model parameters α , α , β and γ . For all the methods, N = 10 are used to estimate the following quantities - the estimator mean (2.6),the standard deviation (2.9), the relative error (RE) percentage (4.37) and the relative standard deviationadjusted computation cost (RSDCC) (4.38) in seconds. The true sensitivity values are approximately 1 . ± .

009 for θ = α , − . ± .

01 for θ = α , − . ± .

035 for θ = β and 54 . ± .

133 for θ = γ . Thesevalues are estimated with eIPA using 10 samples and they are expressed in the form s ± l , which signiﬁesthat the 99% conﬁdence interval is ( s − l, s + l ).[2] D. Anderson , An eﬃcient ﬁnite diﬀerence method for parameter sensitivities of continuous timemarkov chains , SIAM: Journal on Numerical Analysis, 50 (2012). 1, 2.1, 2.2, 1, 3.4, 3, 5[3]

D. Anderson and T. Kurtz , Continuous time Markov chain models for chemical reaction networks ,in Design and Analysis of Biomolecular Circuits, H. Koeppl, G. Setti, M. di Bernardo, and D. Densmore,eds., Springer-Verlag, 2011. 1, 2, 2.2, 4[4]

D. F. Anderson , A modiﬁed next reaction method for simulating chemical systems with time dependentpropensities and delays , The Journal of Chemical Physics, 127 (2007), 214107. 2, 527

IPA τ IPA θ Mean Std Dev RE% RSDCC Mean Std Dev RE% RSDCC α -67.73 1.17 1.31 1.6801 -65.2 0.8 5 0.2886 α -2982.2 10.6 0.078 0.0193 -2821.8 7.66 5.3 0.0053 α γ γ -119.38 1.01 0.13 0.4097 -90.78 0.74 24.1 0.1251 γ -30.38 7.82 8.98 104.45 -23.45 2.97 15.75 11.484eCFD τ CFD θ Mean Std Dev RE% RSDCC Mean Std Dev RE% RSDCC α -633.79 6.21 823.5 0.0334 -621.15 2.1 805.1 0.0017 α -2987.1 10.01 0.24 0.0039 -2891.5 7.16 2.97 0.0009 α γ γ -51.61 10.7 56.8 14.764 -22.8 1.16 80.9 0.3845 γ -31.74 4.98 13.85 8.407 -24.61 1.42 11.72 0.4871eCRP τ CRP θ Mean Std Dev RE% RSDCC Mean Std Dev RE% RSDCC α -648.1 2.38 844.3 0.0039 -620.9 2.1 804.7 0.0028 α -3076.6 10.5 3.2 0.0033 -2897.2 7.5 2.8 0.0016 α γ γ -41.29 0.6 65.5 0.0602 -21.91 1.16 81.7 0.6639 γ -33.98 0.52 21.88 0.0666 -23.78 0.91 14.7 0.3494 Table 5:

Repressilator model:

Sensitivity estimation results w.r.t. model parameters α , α , α , γ , γ and γ . For all the methods, N = 10 are used to estimate the following quantities - the estimator mean (2.6),the standard deviation (2.9), the relative error (RE) percentage (4.37) and the relative standard deviationadjusted computation cost (RSDCC) (4.38) in seconds. The exact sensitivity values are approximately − . ± θ = α , − . ± θ = α , 145 . ± . θ = α , 257 . ± . θ = γ , − . ± . θ = γ and − . ± . θ = γ . These values are estimated with eIPA using10 samples and they are expressed in the form s ± l , which signiﬁes that the 99% conﬁdence interval is( s − l, s + l )[5] D. F. Anderson , Incorporating postleap checks in tau-leaping , The Journal of Chemical Physics, 128(2008), 054103. 1, 2.3, 3.1, 5[6]

D. F. Anderson, A. Ganguly, and T. G. Kurtz , Error analysis of tau-leap simulation methods ,Ann. Appl. Probab., 21 (2011), pp. 2226–2262. 1, 2.3, 3.3[7]

D. F. Anderson and D. J. Higham , Multi-level monte carlo for continuous time markov chains, withapplications to biochemical kinetics , SIAM Multiscale Modeling and Simulation, 10 (2012), pp. 146–179.1, 5[8]

J. Bascompte , Structure and dynamics of ecological networks , Science, 329 (2010), pp. 765–766. 1[9]

C. R. Bruno A. Walther, Joslin L. Moore , The concepts of bias, precision and accuracy, andtheir use in testing the performance of species richness estimators, with a literature review of estimatorperformance , Ecography, 28 (2005), pp. 815–829. 2[10]

Y. Cao, D. Gillespie, and L. Petzold , The slow-scale stochastic simulation algorithm , Journal ofChemical Physics, 122 (2005), pp. 1–18.[11]

Y. Cao, D. T. Gillespie, and L. R. Petzold , Eﬃcient step size selection for the tau-leapingsimulation method , The Journal of Chemical Physics, 124 (2006). 1, 2.3, 3.1, 528 lgorithm 2

Estimates the normalizing constant C using N simulations of the tau-leap process Z function Select-Normalizing-Constant ( x , M , T ) Set S = 0 for i = 1 to N do Set z = x and t = 0 while t < T do Calculate τ = GetTau ( z, t, T ) for k = 1 to K do Update S ← S + τ | ∂λ k ( z ) | end for Update t ← t + τ Set ( (cid:101) R , . . . , (cid:101) R K ) = GetReactionFirings ( z, τ ). Set z ← z + (cid:80) Kk =1 ζ k (cid:101) R k . end while end for return S/ ( N M ) end functionAlgorithm 3 Used to evaluate (cid:98) D ki given by (3.26) function EvaluateCoupledDifference ( z , z , t, T ) while z (cid:54) = z AND t < T do Set τ = GetTau ( z , t, T ), τ = GetTau ( z , t, T ) and τ = τ ∧ τ for k = 1 to K do Set A k = λ k ( z ) ∧ λ k ( z ), A k = λ k ( z ) − A k and A k = λ k ( z ) − A k Set (cid:101) R ki = Poisson ( A ki τ ) for i = 1 , , Update z ← z + (cid:101) R k ζ k + (cid:101) R k ζ k Update z ← z + (cid:101) R k ζ k + (cid:101) R k ζ k Update t ← t + τ end for end while return f ( z ) − f ( z ) end function [12] W. E, D. Liu, and E. Vanden-Eijnden , Nested stochastic simulation algorithms for chemical kineticsystems with multiple time scales , J. Comput. Phys., 221 (2007), pp. 158–180.[13]

M. B. Elowitz and S. Leibler , A synthetic oscillatory network of transcriptional regulators , Nature,403 (2000), pp. 335–338. 4.2, 2[14]

S. N. Ethier and T. G. Kurtz , Markov processes , Wiley Series in Probability and MathematicalStatistics: Probability and Mathematical Statistics, John Wiley & Sons Inc., New York, 1986. Charac-terization and convergence. 2, 2, 3.4, 4.1[15]

X.-j. Feng, S. Hooshangi, D. Chen, R. Li, Genyuan; Weiss, and H. Rabitz , Optimizing geneticcircuits by global sensitivity analysis , Biophysical journal, 87 (2004), pp. 2195 – 2202. 1[16]

M. Fink and D. Noble , Markov models for ion channels: Versatility versus identiﬁability and speed ,Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences,367 (2009), pp. 2161–2179. 1[17]

T. S. Gardner, C. R. Cantor, and J. J. Collins , Construction of a genetic toggle switch inescherichia coli , Nature, 403 (2000), pp. 339–342. 1, 4.3[18]

M. A. Gibson and J. Bruck , Eﬃcient exact stochastic simulation of chemical systems with manyspecies and many channels , The Journal of Physical Chemistry A, 104 (2000), pp. 1876–1889. 2, 52919]

D. T. Gillespie , Exact stochastic simulation of coupled chemical reactions , The Journal of PhysicalChemistry, 81 (1977), pp. 2340–2361. 1, 1, 2, 2.1, 3.1, 2, 5[20]

D. T. Gillespie , Approximate accelerated stochastic simulation of chemically reacting systems , TheJournal of Chemical Physics, 115 (2001), pp. 1716–1733. 1, 2.3, 3.1, 3.5, 1, 5[21]

D. T. Gillespie , Stochastic simulation of chemical kinetics , Annual Review of Physical Chemistry, 58(2007), pp. 35–55. 1, 5[22]

P. W. Glynn , Likelihood ratio gradient estimation for stochastic systems , Commun. ACM, 33 (1990),pp. 75–84. 1, 2.2[23]

R. Gunawan, Y. Cao, and F. Doyle , Sensitivity analysis of discrete stochastic systems , BiophysicalJournal, 88 (2005), pp. 2530–2540. 1, 2.1[24]

A. Gupta, C. Briat, and M. Khammash , A scalable computational framework for establishing long-term behavior of stochastic reaction networks , PLoS Comput Biol, 10 (2014), p. e1003669. 3.1[25]

A. Gupta and M. Khammash , Unbiased estimation of parameter sensitivities for stochastic chemicalreaction networks , SIAM Journal on Scientiﬁc Computing, 35 (2013), pp. A2598–A2620. 1, 2.2, 1, 5,5.1[26]

A. Gupta and M. Khammash , An eﬃcient and unbiased method for sensitivity analysis of stochasticreaction networks , Journal of The Royal Society Interface, 11 (2014), p. 20140979. 1, 2.1, 2.2, 1, 2[27] A. Gupta, M. Khammash, et al. Sensitivity analysis for stochastic chemical reaction networks withmultiple time-scales.

Electronic Journal of Probability , 19, 2014. 3.4[28]

H. Hethcote , The mathematics of infectious diseases , SIAM Review, 42 (2000), pp. 599–653. 1[29]

J. Karlsson and R. Tempone , Towards automatic global error control: Computable weak errorexpansion for the tau-leap method , Monte Carlo Methods Appl., 17 (2011), pp. 233–278. 1, 3.1, 3.5[30]

C. Lester, C. A. Yates, M. B. Giles, and R. E. Baker , An adaptive multi-level simulationalgorithm for stochastic biological systems , The Journal of Chemical Physics, 142 (2015), 024113. 1, 5[31]

T. Li , Analysis of explicit tau-leaping schemes for simulating chemically reacting systems , MultiscaleModel. Simul., 6 (2007), pp. 417–436 (electronic). 1, 2.3, 3.3[32]

A. Moraes, R. Tempone, and P. Vilanova , Hybrid chernoﬀ tau-leap , Multiscale Modeling &Simulation, 12 (2014), pp. 581–615. 1, 2.3, 3.1, 5[33]

S. Plyasunov and A. Arkin , Eﬃcient stochastic sensitivity analysis of discrete event systems , Journalof Computational Physics, 221 (2007), pp. 724–738. 1, 2.2[34]

M. Rathinam , Moment growth bounds on continuous time markov processes on non-negative integerlattices , Quart. Appl. Math., 73 (2015), pp. 347–364. 3.1[35]

M. Rathinam , Convergence of moments of tau leaping schemes for unbounded markov processes oninteger lattices , SIAM J. Numerical Analysis, 54 (2016), pp. 415–439. 1, 2.3, 3.1, 3.2, 3.1, 3.3[36]

M. Rathinam, L. R. Petzold, Y. Cao, and D. T. Gillespie , Stiﬀness in stochastic chemi-cally reacting systems: The implicit tau-leaping method , The Journal of Chemical Physics, 119 (2003),pp. 12784–12794. 1, 2.3, 5[37]

Y. Cao, L. R. Petzold, M. Rathinam, and D. T. Gillespie , The numerical stability of leapingmethods for stochastic simulation of chemically reacting systems , The Journal of Chemical Physics,121(24), 2004, pp. 12169–12178. 1, 2.3 3038]

M. Rathinam, L. R. Petzold, Y. Cao, and D. T. Gillespie , Consistency and stability of tau-leaping schemes for chemical reaction systems , Multiscale Model. Simul., 4 (2005), pp. 867–895 (elec-tronic). 1, 2.3, 3.3[39]

M. Rathinam and H. El Samad , Reversible-equivalent-monomolecular tau: A leaping method for“small number and stiﬀ” stochastic chemical systems , Journal of Computational Physics, 224(2), (2007),pp. 897–923. 1, 2.3[40]

M. Rathinam, P. W. Sheppard, and M. Khammash , Eﬃcient computation of parameter sensitivi-ties of discrete stochastic chemical reaction networks , Journal of Chemical Physics, 132 (2010). 1, 2.1,2.2, 1, 4, 6, 5[41]

P. W. Sheppard, M. Rathinam, and M. Khammash , A pathwise derivative approach to the com-putation of parameter sensitivities in discrete stochastic chemical systems , Journal of Chemical Physics,136 (2012). 1, 2.1[42]

J. Stelling, E. D. Gilles, and F. J. Doyle , Robustness properties of circadian clock architectures ,Proceedings of the National Academy of Sciences of the United States of America, 101 (2004), pp. 13210–13215. 1[43]

T. Tian and K. Burrage , Binomial leap methods for simulating stochastic chemical kinetics , TheJournal of Chemical Physics, 121 (2004), pp. 10356–10364. 1, 2.3, 3.1, 5[44]

J. M. G. Vilar, H. Y. Kueh, N. Barkai, and S. Leibler , Mechanisms of noise-resistance ingenetic oscillator , Proc. Natl. Acad. Sci., 99(9) (2002), pp. 5988–5992. 1[45]

T. Wang and M. Rathinam , Eﬃciency of the girsanov transformation approach for parametric sen-sitivity analysis of stochastic chemical kinetics , Available on arXiv:1412.1005, (2016). 1, 2.2, 4.1[46]

E. Weinan, D. Liu, and E. Vanden-Eijnden , Nested stochastic simulation algorithm for chemicalkinetic systems with disparate rates , Journal of Chemical Physics, 123 (2005), pp. 1–8.[47]

Y. Yang and M. Rathinam , Tau leaping of stiﬀ stochastic chemical systems via local central limitapproximation , Journal of Computational Physics, 242 (2013), pp. 581–606. 1, 2.3[48]

Y. Yang, M. Rathinam, and J. Shen , Integral tau methods for stiﬀ stochastic chemical systems , TheJournal of chemical physics, 134 (2011), p. 044129. 1, 2.3[49]

T. Li, A. Abdulle, and W. E , Eﬀectiveness of implicit methods for stiﬀ stochastic diﬀerential equa-tions , Communications in Computational Physics, Feb. 2008. 2.3[50]

I. Cipcigan and M. Rathinam , Uniform convergence of interlaced Euler method for stiﬀ stochasticdiﬀerential equations , Multiscale Modeling & Simulation, 2011, 9(3), pp.1217-1252. 2.3[51]

M. Morshed, B. Ingalls, and S. Ilie , An eﬃcient ﬁnite-diﬀerence strategy for sensitivity analysisof stochastic models of biochemical systems , Biosystems, 151 (2017), pp. 43–52. 1, 5, 5[52]