Correlated pseudo-marginal schemes for time-discretised stochastic kinetic models
Andrew Golightly, Emma Bradley, Tom Lowe, Colin S. Gillespie
a School of Mathematics, Statistics and Physics, Newcastle University, Newcastle upon Tyne, UK
b Hiscox Ltd., London, UK
Abstract
The challenging problem of conducting fully Bayesian inference for the reaction rate constants governing stochastic kinetic models (SKMs) is considered. Given the challenges underlying this problem, the Markov jump process representation is routinely replaced by an approximation based on a suitable time-discretisation of the system of interest. Improving the accuracy of these schemes amounts to using an ever finer discretisation level, which, in the context of the inference problem, requires integrating over the uncertainty in the process at a predetermined number of intermediate times between observations. Pseudo-marginal Metropolis–Hastings schemes are increasingly used, since for a given discretisation level, the observed data likelihood can be unbiasedly estimated using a particle filter. When observations are particularly informative, an auxiliary particle filter can be implemented, by propagating particles conditional on the next observation. Recent work in state-space settings has shown how the pseudo-marginal approach can be made much more efficient by correlating the underlying pseudo-random numbers used to form the estimate of likelihood at the current and proposed values of the unknown parameters. This approach is extended to the time-discretised SKM framework by correlating the innovations that drive the auxiliary particle filter. The resulting approach is found to offer substantial gains in efficiency over a standard implementation.
Keywords: auxiliary particle filter (APF); Bayesian inference; Markov jump process (MJP); Poisson leap; chemical Langevin equation; particle MCMC.
1 Introduction

A stochastic kinetic model (SKM) typically refers to a reaction network, an associated rate law and a probabilistic description of the reaction dynamics. Reactions occur continuously in time, with a reaction occurrence resulting in a discrete change to the system state. A Markov jump process (MJP) therefore provides a natural description of the time-course behaviour of the species involved in the reaction network. The resulting modelling framework is fairly flexible and consequently has been used ubiquitously in areas such as epidemiology (Lin and Ludkovski, 2013; McKinley et al., 2014; O'Neill and Roberts, 1999), population ecology (Ferm et al., 2008; Boys et al., 2008; Gillespie and Golightly, 2010) and systems biology (Wilkinson, 2009; Golightly and Wilkinson, 2015; Koblents and Miguez, 2015; Hey et al., 2015). A concise introduction to SKMs can be found in Wilkinson (2012).

Whilst exact simulation of the MJP is straightforward (using, for example, the direct method of Gillespie (1976)), performing exact fully Bayesian inference is made problematic by the intractability of the observed data likelihood. Consequently, several approaches have been developed that make use of computationally intensive methods. These include the use of data augmentation (Boys and Giles, 2007; Boys et al., 2008) together with Markov chain Monte Carlo.

The efficiency of the pseudo-marginal approach can be improved by correlating the pseudo-random numbers used to form the likelihood estimates at iteration $i$ and those generated at iteration $i + 1$. When using the CLE, this can be achieved by correlating the Gaussian innovations that drive the MDB. When using the Poisson leap, the numbers of reaction events conditional on the next observation can be used. A similar approach is described in Tran et al. (2016) for a univariate diffusion process. Choppala et al. (2016) consider a Lotka–Volterra reaction network and calculate the observed data likelihood by averaging $G$ 'blocks' of unbiased estimates (which can be computed in parallel). Correlation is introduced by only updating the likelihood in a randomly chosen block.
This is the so-called blockwise pseudo-marginal method. Our contribution is a unified framework for applying a correlated pseudo-marginal algorithm to a general class of time-discretised stochastic kinetic models, that additionally allows a flexible observation regime. In particular, we consider incomplete observation of the model components as well as Gaussian measurement error. We apply the resulting methodology to four examples arising in systems biology and epidemiology, using both real and synthetic data.

The remainder of this paper is organised as follows. Section 2 gives a brief introduction to SKMs with particular attention to the derivation of the Poisson leap and CLE approximations. The inference algorithm is described in detail in Section 3 and applied to several examples in Section 4. Conclusions are drawn in Section 5.

2 Stochastic kinetic models
Consider a reaction network involving $s$ species $\mathcal{X}_1, \ldots, \mathcal{X}_s$ and $v$ reactions $R_1, \ldots, R_v$ such that
\[
R_i: \quad \sum_{j=1}^{s} p_{ij}\mathcal{X}_j \longrightarrow \sum_{j=1}^{s} q_{ij}\mathcal{X}_j, \qquad i = 1, \ldots, v,
\]
where the $p_{ij}$ and $q_{ij}$ are non-negative integers known as stoichiometric coefficients. Let $X_{j,t}$ denote the (discrete) number of species $\mathcal{X}_j$ at time $t$, and let $X_t$ be the $s$-vector $X_t = (X_{1,t}, \ldots, X_{s,t})^T$. The time evolution of $X_t$ is most naturally described by a Markov jump process (MJP), so that for an infinitesimal time increment $dt$ and an instantaneous hazard $h_i(X_t, c_i)$, the probability of a type $i$ reaction occurring in the time interval $(t, t+dt]$ is $h_i(X_t, c_i)\,dt$. Under the standard assumption of mass action kinetics, $h_i$ is proportional to a product of binomial coefficients. Specifically,
\[
h_i(X_t, c_i) = c_i \prod_{j=1}^{s} \binom{X_{j,t}}{p_{ij}}.
\]
Values for $c = (c_1, \ldots, c_v)^T$, the initial system state $X_0 = x_0$ and the $s \times v$ stoichiometry matrix $S$, whose $(i,j)$th element is given by $q_{ji} - p_{ji}$, complete the specification of the Markov process. Despite the intractability of the probability mass function governing the state of the system at any time $t$, generating exact realisations of the MJP is straightforward via a technique known in this context as Gillespie's direct method (Gillespie, 1977). In brief, if the current time and state of the system are $t$ and $X_t$ respectively, then the time to the next event will be exponentially distributed with rate parameter
\[
h_0(X_t, c) = \sum_{i=1}^{v} h_i(X_t, c_i),
\]
and the event will be a reaction of type $R_i$ with probability $h_i(X_t, c_i)/h_0(X_t, c)$, independently of the inter-event time.

Whilst generating simulations of the MJP description of the SKM is straightforward, capturing every occurrence of a reaction time and type can be computationally expensive, and this may preclude use of the MJP as an inferential model.
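As a concrete illustration, Gillespie's direct method sketched above can be coded in a few lines. The paper's own software is written in R; the Python sketch below (with an illustrative immigration–death network and made-up rate values, not the authors' settings) simply mirrors the two draws described in the text: an exponential inter-event time and a categorical reaction type.

```python
import numpy as np

def gillespie(x0, S, hazard, c, t_max, rng):
    """Exact MJP simulation via Gillespie's direct method.

    x0     : initial state (length-s)
    S      : s x v stoichiometry matrix (column i = net change of reaction i)
    hazard : function (x, c) -> length-v array of reaction hazards
    """
    t, x = 0.0, np.asarray(x0, dtype=float).copy()
    times, states = [t], [x.copy()]
    while True:
        h = hazard(x, c)
        h0 = h.sum()
        if h0 <= 0.0:                       # no reaction can occur
            break
        t += rng.exponential(1.0 / h0)      # exponential inter-event time
        if t > t_max:
            break
        i = rng.choice(len(h), p=h / h0)    # reaction type i w.p. h_i / h0
        x += S[:, i]
        times.append(t)
        states.append(x.copy())
    return np.array(times), np.array(states)

# Immigration--death network: R1: 0 -> X (rate c1), R2: X -> 0 (rate c2 * X)
S = np.array([[1.0, -1.0]])
hazard = lambda x, c: np.array([c[0], c[1] * x[0]])
rng = np.random.default_rng(1)
times, states = gillespie([10.0], S, hazard, (4.0, 0.1), 50.0, rng)
```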
We therefore consider two approximations to the MJP, the Poisson leap method and the chemical Langevin equation, and give a brief, informal derivation of both approaches.

Consider an infinitesimal time interval, $(t, t+dt]$, over which the reaction hazards will remain constant almost surely. The occurrence of reaction events can therefore be regarded as the occurrence of events of a Poisson process, with independent realisations for each reaction type. Hence, for an interval $(t, t+\Delta t]$ of finite length $\Delta t$, and the current system state $X_t$, the number of reaction events of type $i$, $r_i$, is Poisson distributed with rate $h_i(X_t, c_i)\Delta t$. Let $r = (r_1, \ldots, r_v)^T$. It should then be clear that the system state can be updated approximately, according to
\[
X_{t+\Delta t} = X_t + S r. \qquad (1)
\]
Further extensions to this approach (although not pursued here) involve stepping ahead a variable amount of time $\tau$, based on the rate constants and the current state of the system. This gives the so-called $\tau$-leap algorithm (Gillespie, 2001).

It should now be clear from (1) that the expectation and variance of the infinitesimal increment $dX_t$ are
\[
\mathrm{E}(dX_t) = S\,h(X_t, c)\,dt, \qquad \mathrm{Var}(dX_t) = S\,\mathrm{diag}\{h(X_t, c)\}\,S^T\,dt,
\]
where $h(X_t, c) = (h_1(X_t, c_1), \ldots, h_v(X_t, c_v))^T$. Hence, a further approximation can be obtained by constructing the Itô stochastic differential equation (SDE) that has the same infinitesimal mean and variance as the true MJP. That is,
\[
dX_t = S\,h(X_t, c)\,dt + \sqrt{S\,\mathrm{diag}\{h(X_t, c)\}\,S^T}\; dW_t, \qquad (2)
\]
where $W_t$ is an $s$-vector of standard Brownian motion and $\sqrt{S\,\mathrm{diag}\{h(X_t, c)\}\,S^T}$ is an $s \times s$ matrix $B$ such that $BB^T = S\,\mathrm{diag}\{h(X_t, c)\}\,S^T$. Equation (2) is the SDE most commonly referred to as the chemical Langevin equation (CLE), and can be shown to approximate the SKM increasingly well in high concentration scenarios (Gillespie, 2000).
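The Poisson leap update (1) is equally simple to implement. The following Python sketch (function and variable names are my own, and the example rates are illustrative) freezes the hazards over each interval of length $\Delta t$ and draws the reaction counts from independent Poisson distributions.

```python
import numpy as np

def poisson_leap(x0, S, hazard, c, dt, n_steps, rng):
    """Approximate MJP simulation via the Poisson leap.

    Over each interval of length dt the hazards are held fixed, the number
    of type-i reaction events is drawn as r_i ~ Po(h_i(x, c) dt), and the
    state is updated as x <- x + S r, cf. equation (1).
    """
    x = np.asarray(x0, dtype=float).copy()
    path = [x.copy()]
    for _ in range(n_steps):
        rates = np.maximum(hazard(x, c), 0.0) * dt  # guard negative hazards
        r = rng.poisson(rates)                      # events per reaction type
        x = x + S @ r
        path.append(x.copy())
    return np.array(path)

# Immigration--death example (rate values chosen for illustration only)
S = np.array([[1.0, -1.0]])
hazard = lambda x, c: np.array([c[0], c[1] * x[0]])
rng = np.random.default_rng(0)
path = poisson_leap([10.0], S, hazard, (4.0, 0.1), dt=0.1, n_steps=100, rng=rng)
```

Note that, unlike the exact MJP, the leap update can in principle overshoot to a negative state; the hazard guard above is a pragmatic safeguard rather than part of the method.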
The CLE can rarely be solved analytically, and it is common to work with a discretisation such as the Euler–Maruyama discretisation, which gives
\[
X_{t+\Delta t} = X_t + S\,h(X_t, c)\Delta t + \sqrt{S\,\mathrm{diag}\{h(X_t, c)\}\,S^T \Delta t}\; Z,
\]
where $Z$ is a standard multivariate Gaussian random variable.

3 Bayesian inference

Suppose that the Markov jump process is not observed directly, but observations $y_t$ (on a regular grid) are available and assumed conditionally independent (given the latent jump process), with conditional probability distribution obtained via the observation equation
\[
Y_t = P^T X_t + \varepsilon_t, \qquad \varepsilon_t \sim \mathrm{N}(0, \Sigma), \qquad t = 1, \ldots, n. \qquad (3)
\]
Here, $Y_t$ is a length-$p$ vector, $P$ is a constant matrix of dimension $s \times p$ and $\varepsilon_t$ is a length-$p$ Gaussian random vector. The density linking the observed and latent processes is denoted by $p(y_t | x_t)$. For simplicity we assume that $\Sigma$ is known.

In what follows, we replace the expensive MJP with either the Poisson leap approximation or the chemical Langevin equation, and perform exact (simulation-based) Bayesian inference using the approximate model. We anticipate that the time between observations is too large for these approximations to be directly applied and therefore introduce intermediate times between observations. Hence, without loss of generality, consider an equally spaced partition of the time interval $[t-1, t]$ as
\[
t - 1 = \tau_{t-1,0} < \tau_{t-1,1} < \cdots < \tau_{t-1,m-1} < \tau_{t-1,m} = t \qquad (4)
\]
where $\tau_{t-1,i+1} - \tau_{t-1,i} = \Delta\tau = 1/m$. Hence, the approximation is applied recursively over each sub-interval $[\tau_{t-1,i}, \tau_{t-1,i+1}]$ rather than in a single instance over $[t-1, t]$. Note that the value of $m$ is pre-specified by the practitioner and controls both the accuracy and the computational cost of the approximation.

Suppose now that the main objective is inference for the rate constants $c$ given data $y = (y_1, \ldots, y_n)^T$. To this end, consider the marginal posterior
\[
\pi(c \,|\, y) \propto \pi(c)\, p(y \,|\, c) \qquad (5)
\]
where $\pi(c)$ is the prior density ascribed to $c$.
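A single Euler–Maruyama step of the CLE can be sketched as follows (Python for illustration; the Cholesky factor plays the role of the matrix square root $B$, and the small jitter term is my own pragmatic safeguard against near-singular diffusion matrices, not part of the scheme).

```python
import numpy as np

def em_step(x, S, hazard, c, dt, rng):
    """One Euler--Maruyama step of the chemical Langevin equation (2):
    x' = x + S h(x, c) dt + B sqrt(dt) Z, where B B^T = S diag{h(x, c)} S^T
    and Z ~ N(0, I_s)."""
    h = hazard(x, c)
    drift = S @ h                          # infinitesimal mean S h(x, c)
    V = S @ np.diag(h) @ S.T               # infinitesimal variance
    B = np.linalg.cholesky(V + 1e-10 * np.eye(len(x)))
    z = rng.standard_normal(len(x))
    return x + drift * dt + np.sqrt(dt) * (B @ z)

# Immigration--death CLE: dX = (c1 - c2 X) dt + sqrt(c1 + c2 X) dW
S = np.array([[1.0, -1.0]])
hazard = lambda x, c: np.array([c[0], c[1] * x[0]])
rng = np.random.default_rng(2)
x = np.array([500.0])
for _ in range(100):                       # 100 steps of size dt = 0.1
    x = em_step(x, S, hazard, c=(4.0, 0.1), dt=0.1, rng=rng)
```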
Unfortunately, (5) is complicated by the observed data likelihood $p(y|c)$. For the CLE, this satisfies
\[
p(y|c) = \int p(x|c)\, p(y|x)\, dx,
\]
where $x = (x_{\tau_{1,1}}, \ldots, x_{\tau_{1,m}}, x_{\tau_{2,1}}, x_{\tau_{2,2}}, \ldots, x_{\tau_{n-1,m}})$. Additionally,
\[
p(x|c) = p(x_1) \prod_{t=1}^{n-1} \prod_{i=0}^{m-1} \mathrm{N}\big(x_{\tau_{t,i+1}};\; x_{\tau_{t,i}} + S\,h(x_{\tau_{t,i}}, c)\Delta\tau,\; S\,\mathrm{diag}\{h(x_{\tau_{t,i}}, c)\}\,S^T \Delta\tau\big) \qquad (6)
\]
and
\[
p(y|x) = \prod_{t=1}^{n} \mathrm{N}\big(y_t;\; P^T x_t, \Sigma\big) \qquad (7)
\]
where $\mathrm{N}(\cdot\,; a, B)$ denotes the pdf of a Gaussian random variable with mean $a$ and variance $B$. For the Poisson leap approximation we have that
\[
p(y|c) = \sum_{x_1, r} p(x_1)\, p(r | x_1, c)\, p(y | r, x_1),
\]
where $r = (r_{\tau_{1,1}}, \ldots, r_{\tau_{1,m}}, r_{\tau_{2,1}}, r_{\tau_{2,2}}, \ldots, r_{\tau_{n-1,m}})$ and, for example, $r_{\tau_{t,i}} = (r_{\tau_{t,i},1}, \ldots, r_{\tau_{t,i},v})^T$ is the length-$v$ vector containing the number of reactions of each type in the interval $[\tau_{t,i-1}, \tau_{t,i}]$. It should be clear that given $x_1$ and $r$, $x$ can be obtained deterministically through recursive application of (1). Hence $p(y|r, x_1)$ coincides with $p(y|x)$ above and
\[
p(r|x_1, c) = \prod_{t=1}^{n-1} \prod_{i=0}^{m-1} \prod_{j=1}^{v} \mathrm{Po}\big(r_{\tau_{t,i+1},j};\; h_j(x_{\tau_{t,i}}, c_j)\Delta\tau\big),
\]
where $\mathrm{Po}(\cdot\,; h)$ denotes the mass function of a Poisson random variable with mean $h$.

Owing to the intractability of $p(y|c)$, the posterior in (5) is sampled via Markov chain Monte Carlo (MCMC). In particular, a suitably constructed pseudo-marginal Metropolis–Hastings (PMMH) scheme (Beaumont, 2003; Andrieu and Roberts, 2009; Andrieu et al., 2010) provides an effective way of performing this task. We briefly describe this approach in the next section, alongside an adaptation of a recently proposed modification (the so-called correlated PMMH method) that gives a significant improvement in efficiency over the basic scheme.

Suppose that auxiliary variables $U \sim g(u)$ can be used to generate a non-negative unbiased estimator $\hat{p}_U(y|c)$ of $p(y|c)$.
Therefore, an unbiased (up to a constant) estimator of the posterior is
\[
\hat{\pi}_U(c) = \pi(c)\,\hat{p}_U(y|c).
\]
The PMMH scheme is an MH scheme targeting
\[
\tilde{\pi}(c, u) = \pi(c)\, g(u)\, \hat{p}_u(y|c),
\]
which has marginal distribution
\[
\int \pi(c)\, g(u)\, \hat{p}_u(y|c)\, du \propto \pi(c\,|\,y).
\]
For a proposal kernel of the form $q(c'|c)g(u')$, the MH acceptance probability is
\[
\alpha\{(c', u') \,|\, (c, u)\} = \min\left\{1,\; \frac{\tilde{\pi}(c', u')}{\tilde{\pi}(c, u)} \times \frac{q(c|c')\,g(u)}{q(c'|c)\,g(u')}\right\} = \min\left\{1,\; \frac{\pi(c')\,\hat{p}_{u'}(y|c')}{\pi(c)\,\hat{p}_u(y|c)} \times \frac{q(c|c')}{q(c'|c)}\right\} \qquad (8)
\]
and therefore the density associated with the auxiliary variables need not be evaluated.

Note that the proposal kernel need not be restricted to the use of $g(u')$. The correlated PMMH (CPMMH) scheme (Deligiannidis et al., 2017; Dahlin et al., 2015) generalises the PMMH scheme by using a proposal kernel of the form $q(c'|c)K(u'|u)$, where $K(\cdot|\cdot)$ satisfies the detailed balance equation
\[
g(u)K(u'|u) = g(u')K(u|u'). \qquad (9)
\]
It is straightforward to show that an MH scheme with proposal kernel $q(c'|c)K(u'|u)$ and acceptance probability (8) satisfies detailed balance with respect to the target $\tilde{\pi}(c, u)$. Upon negating the trivial scenario that the chain does not move, we have that
\begin{align*}
\tilde{\pi}(c, u)\, q(c'|c)\, K(u'|u)\, \alpha\{(c', u') \,|\, (c, u)\}
&= \min\big\{\pi(c)g(u)\hat{p}_u(y|c)\,q(c'|c)K(u'|u),\; \pi(c')g(u)\hat{p}_{u'}(y|c')\,q(c|c')K(u'|u)\big\} \\
&= \min\big\{\pi(c)g(u)\hat{p}_u(y|c)\,q(c'|c)K(u'|u),\; \pi(c')g(u')\hat{p}_{u'}(y|c')\,q(c|c')K(u|u')\big\} \\
&= \tilde{\pi}(c', u')\, q(c|c')\, K(u|u')\, \alpha\{(c, u) \,|\, (c', u')\},
\end{align*}
where (9) is used to deduce the third line.

In practice, $g(u)$ is a standard Gaussian density and $K(u'|u)$ is taken to be the kernel associated with a Crank–Nicolson proposal.
That is, $g(u) = \mathrm{N}(u; 0, I_d)$ and
\[
K(u'|u) = \mathrm{N}\big(u';\; \rho u,\; (1 - \rho^2) I_d\big),
\]
where $I_d$ is the identity matrix whose dimension $d$ is determined by the number of elements in $u$, and $\rho$ is chosen to be close to 1, to induce positive correlation between $\hat{p}_U(y|c)$ and $\hat{p}_{U'}(y|c')$. Taking $\rho = 0$ gives the special case that $K(u'|u) = g(u')$, which corresponds to the PMMH scheme. The motivation for taking $\rho \approx 1$ is that positively correlated likelihood estimates reduce the variance of the acceptance ratio in (8), alleviating the 'sticky' behaviour that afflicts pseudo-marginal schemes when this variance is large. When the required variates are not normally distributed (as is the case for the Poisson leap approximation), it is straightforward to generate uniform random variates via $\Phi(U)$ (where $\Phi(\cdot)$ is the cdf of a standard normal random variable). These uniform draws can then be transformed, e.g. to give Poisson draws, via the inversion method.

The correlated PMMH scheme is summarised in Algorithm 1. After initialisation, each iteration requires computation of $\hat{p}_{u'}(y|c')$. This is achieved by executing an auxiliary particle filter (for each proposed value $(c', u')$), which we describe in the next section.

The observed data likelihood $p(y|c)$ can be factorised as
\[
p(y|c) = p(y_1|c) \prod_{t=2}^{n} p(y_t \,|\, y_{1:t-1}, c) \qquad (10)
\]
where $y_{1:t-1} = (y_1, \ldots, y_{t-1})$. Although the constituent terms in (10) will typically be intractable, a particle filter provides an efficient mechanism for their estimation. Moreover, the particle filter that we consider here gives an unbiased estimator of $p(y|c)$ (Del Moral, 2004; Pitt et al., 2012) and hence can be used in steps 1(b) and 2(b) of the CPMMH scheme; see Algorithm 1. For a concise introduction to particle filters, we refer the reader to Künsch (2013) and the references therein.

The basic idea behind the particle filter is to recursively approximate the sequence of filtering densities $p(x_t \,|\, y_{1:t}, c)$ using a sequence of importance sampling and resampling steps. Let $u = (u_1, \ldots, u_n)$ denote a realisation of the random variables required by the particle filter.
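The Crank–Nicolson update for the auxiliary variables is a one-liner. The Python sketch below (illustrative, not the authors' R code) also checks empirically that the kernel leaves the $\mathrm{N}(0, I)$ distribution invariant while inducing strong correlation between current and proposed variates.

```python
import numpy as np

def cn_propose(u, rho, rng):
    """Crank--Nicolson proposal: u' = rho * u + sqrt(1 - rho^2) * omega,
    with omega ~ N(0, I). This kernel satisfies the detailed balance
    relation (9) with respect to g(u) = N(u; 0, I); rho = 0 recovers the
    independent proposal u' ~ g used by standard PMMH."""
    omega = rng.standard_normal(u.shape)
    return rho * u + np.sqrt(1.0 - rho**2) * omega

# Empirical check of invariance: push N(0, 1) draws through the kernel
rng = np.random.default_rng(3)
u = rng.standard_normal(200_000)
u_new = cn_propose(u, rho=0.99, rng=rng)
```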
We further adopt the partition $u_t = (\tilde{u}_t, \bar{u}_t)^T$ to distinguish between the variables used to propagate state particles and those used in the resampling step, respectively. Note that $\tilde{u}_t = (\tilde{u}_t^1, \ldots, \tilde{u}_t^N)$, corresponding to a filter with $N$ particles, and $\tilde{u}_t^i = (\tilde{u}_{t,1}^i, \ldots, \tilde{u}_{t,m}^i)$ for $t > 1$, corresponding to the discretisation introduced in Section 3.1.

Algorithm 1 Correlated PMMH scheme (CPMMH)

Input: correlation parameter $\rho$ and the number of CPMMH iterations $n_{\text{iters}}$. Output: $c^{(1)}, \ldots, c^{(n_{\text{iters}})}$.

1. For iteration $i = 0$:
   (a) Set $c^{(0)}$ in the support of $\pi(c)$ and draw $u^{(0)} \sim \mathrm{N}(0, I_d)$.
   (b) Compute $\hat{p}_{u^{(0)}}(y|c^{(0)})$ by running Algorithm 2 with $(c, u) = (c^{(0)}, u^{(0)})$.
2. For iteration $i = 1, \ldots, n_{\text{iters}}$:
   (a) Draw $c' \sim q(\cdot\,|\,c^{(i-1)})$ and $\omega \sim \mathrm{N}(0, I_d)$. Put $u' = \rho u^{(i-1)} + \sqrt{1 - \rho^2}\,\omega$.
   (b) Compute $\hat{p}_{u'}(y|c')$ by running Algorithm 2 with $(c, u) = (c', u')$.
   (c) With probability $\alpha\{(c', u') \,|\, (c^{(i-1)}, u^{(i-1)})\}$ given by (8), put $(c^{(i)}, u^{(i)}) = (c', u')$; otherwise store the current values $(c^{(i)}, u^{(i)}) = (c^{(i-1)}, u^{(i-1)})$.

Given a weighted sample of 'particles' $\{x_{t-1}^i, w(u_{t-1}^i)\}_{i=1}^N$, approximately distributed according to $p(x_{t-1} \,|\, y_{1:t-1}, c)$, the particle filter uses the approximation
\[
\hat{p}(x_{(t-1,t]} \,|\, y_{1:t}, c) \propto p(y_t \,|\, x_t, c) \sum_{i=1}^{N} p(x_{(t-1,t]} \,|\, x_{t-1}^i, c)\, w(u_{t-1}^i) \qquad (11)
\]
where, in the case of the CLE, $x_{(t-1,t]} = (x_{\tau_{t-1,1}}, \ldots, x_{\tau_{t-1,m}})$. We focus here on the CLE for reasons of brevity, but note that in the case of the Poisson leap approximation, $x_{(t-1,t]}$ is replaced by $r_{(t-1,t]} = (r_{\tau_{t-1,1}}, \ldots, r_{\tau_{t-1,m}})$, since $x_t$ can be obtained deterministically using the state $x_{t-1}$ and the number of reactions of each type in the interval $(t-1, t]$.

The form of (11) suggests a simple importance sampling/resampling strategy where particles are resampled (with replacement) in proportion to their weights, propagated via $x_{(t-1,t]}^i = f_t(\tilde{u}_t^i) \sim p(\cdot\,|\,x_{t-1}^i, c)$ and reweighted by $p(y_t \,|\, x_t^i, c)$. Here, $f_t(\cdot)$ is a deterministic function of $\tilde{u}_t^i$ (and the parameters $c$) that gives an explicit connection between the particles and the auxiliary variables. Repeating this procedure at each time point gives the bootstrap particle filter (BPF) of Gordon et al. (1993). As discussed in Del Moral and Murray (2015) and Golightly and Wilkinson (2015), however, this scheme is likely to perform poorly when observations are informative. In this case very few particles will have reasonable weight, leading to an estimator of the observed data likelihood with high variance. This problem can be alleviated through the use of an auxiliary particle filter (APF) (Pitt and Shephard, 1999; Pitt et al., 2012), which propagates particles via $x_{(t-1,t]}^i = f_t(\tilde{u}_t^i) \sim g(\cdot\,|\,x_{t-1}^i, y_t, c)$, with the special case $g(\cdot\,|\,x_{t-1}^i, y_t, c) = p(\cdot\,|\,x_{t-1}^i, c)$ giving the BPF.

The APF is described generically in Algorithm 2.
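Algorithm 1 itself reduces to a short loop once a likelihood-estimating routine (Algorithm 2) is available. In the Python sketch below, `log_post_hat(c, u)` is an assumed interface standing in for $\log\{\pi(c)\hat{p}_u(y|c)\}$, e.g. a particle filter driven by the auxiliary variables $u$; a symmetric Gaussian random walk is used for $c$, as in the paper's examples (names and toy target are mine).

```python
import numpy as np

def cpmmh(log_post_hat, c0, d, rho, n_iters, prop_sd, rng):
    """Sketch of Algorithm 1 (CPMMH), working on the log scale.

    log_post_hat(c, u) should return log{pi(c) * p_hat_u(y|c)}."""
    c = np.asarray(c0, dtype=float)
    u = rng.standard_normal(d)                    # step 1(a)
    lp = log_post_hat(c, u)                       # step 1(b)
    draws = [c.copy()]
    for _ in range(n_iters):
        c_prop = c + prop_sd * rng.standard_normal(c.shape)   # step 2(a)
        u_prop = rho * u + np.sqrt(1 - rho**2) * rng.standard_normal(d)
        lp_prop = log_post_hat(c_prop, u_prop)                # step 2(b)
        if np.log(rng.uniform()) < lp_prop - lp:              # step 2(c), (8)
            c, u, lp = c_prop, u_prop, lp_prop
        draws.append(c.copy())
    return np.array(draws)

# Toy check with an exact (noise-free) standard normal log-posterior
rng = np.random.default_rng(4)
draws = cpmmh(lambda c, u: -0.5 * np.sum(c**2), np.zeros(1), d=5,
              rho=0.99, n_iters=2000, prop_sd=1.0, rng=rng)
```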
The output of the APF can be used to estimate the constituent terms in (10) by simply taking the average unnormalised weight; see steps 1(c) and 2(e). A discussion of the sorting and resampling steps 2(a) and 2(b) is provided in Section 3.3.1. Suitable propagation densities $g(x_{(t-1,t]} \,|\, x_{t-1}, y_t, c)$ and $g(r_{(t-1,t]} \,|\, x_{t-1}, y_t, c)$ (as necessary for the Poisson leap) are given in Section 3.3.2.

Algorithm 2 Auxiliary particle filter

Input: parameters $c$, auxiliary variables $u$ and the number of particles $N$. Output: estimate $\hat{p}_u(y|c)$ of the observed data likelihood.

1. Initialisation ($t = 1$).
   (a) Sample the prior. Draw $\tilde{u}_1^i \sim \mathrm{N}(0, 1)$ and put $x_1^i = f_1(\tilde{u}_1^i) \sim p(\cdot)$, $i = 1, \ldots, N$.
   (b) Compute the weights. For $i = 1, \ldots, N$ set
       \[
       \tilde{w}(u_1^i) = p(y_1 \,|\, x_1^i, c), \qquad w(u_1^i) = \frac{\tilde{w}(u_1^i)}{\sum_{j=1}^{N} \tilde{w}(u_1^j)}.
       \]
   (c) Update observed data likelihood estimate. Compute $\hat{p}_u(y_1|c) = \sum_{i=1}^{N} \tilde{w}(u_1^i)/N$.
2. For times $t = 2, 3, \ldots, n$:
   (a) Sort. Obtain the sorted index $s(i)$ and put $\{x_{t-1}^i, w(u_{t-1}^i)\} := \{x_{t-1}^{s(i)}, w(u_{t-1}^{s(i)})\}$, $i = 1, \ldots, N$.
   (b) Resample. Obtain ancestor indices $a_{t-1}^i$, $i = 1, \ldots, N$, using systematic resampling on the collection of weights $\{w(u_{t-1}^1), \ldots, w(u_{t-1}^N)\}$.
   (c) Propagate. Draw $\tilde{u}_t^i \sim \mathrm{N}(0_m, I_m)$ and put $x_{(t-1,t]}^i = f_t(\tilde{u}_t^i) \sim g\big(\cdot \,|\, x_{t-1}^{a_{t-1}^i}, y_t, c\big)$, $i = 1, \ldots, N$.
   (d) Compute the weights. For $i = 1, \ldots, N$ set
       \[
       \tilde{w}(u_t^i) = \frac{p(y_t \,|\, x_t^i, c)\, p\big(x_{(t-1,t]}^i \,|\, x_{t-1}^{a_{t-1}^i}, c\big)}{g\big(x_{(t-1,t]}^i \,|\, x_{t-1}^{a_{t-1}^i}, y_t, c\big)}, \qquad w(u_t^i) = \frac{\tilde{w}(u_t^i)}{\sum_{j=1}^{N} \tilde{w}(u_t^j)}.
       \]
   (e) Update observed data likelihood estimate. Compute
       \[
       \hat{p}_u(y_{1:t}|c) = \hat{p}_u(y_{1:t-1}|c)\,\hat{p}_u(y_t \,|\, y_{1:t-1}, c), \qquad \text{where } \hat{p}_u(y_t \,|\, y_{1:t-1}, c) = \frac{1}{N}\sum_{i=1}^{N} \tilde{w}(u_t^i).
       \]

3.3.1 Sorting and resampling

For the resampling step we follow Deligiannidis et al. (2017) and use systematic resampling, which only requires simulating a single uniform random variable at each time point. These can be constructed from $\bar{u}_t \sim \mathrm{N}(0, 1)$ via $\Phi(\bar{u}_t)$. Sorted uniforms can then be found via $\bar{u}_t^{i,R} = (i-1)/N + \Phi(\bar{u}_t)/N$, $i = 1, \ldots, N$, which are in turn used to choose indices $a_{t-1}^i$ that (marginally) satisfy $\Pr(a_{t-1}^i = k) = w(u_{t-1}^k)$. Note that upon changing $c$ and $u$, the effect of the resampling step is likely to prune out different particles, thus breaking the correlation between successive estimates of the observed data likelihood. To alleviate this problem, Deligiannidis et al. (2017) sort the particles before resampling via the Hilbert sort procedure of Gerber and Chopin (2015). We follow Choppala et al. (2016) by using a simple Euclidean sorting procedure. At observation time $t$ (immediately after propagation), we sort the particle trajectories $x_{(t-1,t]}^i$ as follows. The first sorted particle corresponds to that with the smallest value of the first component of the set $\{x_t^1, \ldots, x_t^N\}$. The remaining particles are chosen by minimising the Euclidean distance between the currently selected particle and the set of all other particles.

3.3.2 Propagation

We require a suitable importance proposal $g(x_{(t-1,t]} \,|\, x_{t-1}, y_t, c)$ (or $g(r_{(t-1,t]} \,|\, x_{t-1}, y_t, c)$ in the case of the Poisson leap method) that takes into account the information in $y_t$. Consider a time interval $[t-1, t]$ and recall the partition in (4), which we will write as
\[
t - 1 = \tau_0 < \tau_1 < \cdots < \tau_{m-1} < \tau_m = t
\]
for notational simplicity. We adopt the factorisations
\[
g(x_{(t-1,t]} \,|\, x_{t-1}, y_t, c) = \prod_{k=0}^{m-1} g(x_{\tau_{k+1}} \,|\, x_{\tau_k}, y_t, c), \qquad g(r_{(t-1,t]} \,|\, x_{t-1}, y_t, c) = \prod_{k=0}^{m-1} g(r_{\tau_{k+1}} \,|\, x_{\tau_k}, y_t, c)
\]
and seek suitable expressions for the constituent terms in each product. In the case of the CLE, we use the modified diffusion bridge construct of Durham and Gallant (2002) (see also Whitaker et al. (2017b) for a recent discussion), which effectively uses a linear Gaussian approximation of $X_{\tau_{k+1}} \,|\, x_{\tau_k}, y_t, c$.
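Systematic resampling driven by a single Gaussian auxiliary variable, as described above, can be sketched as follows (Python for illustration; function and variable names are my own):

```python
import math
import numpy as np

def systematic_resample(weights, u_bar):
    """Systematic resampling driven by one N(0, 1) auxiliary draw u_bar.

    The uniform is u = Phi(u_bar); the sorted uniforms are
    (i - 1)/N + u/N, and ancestor index a_i is the smallest k whose
    cumulative weight exceeds the i-th point, so that marginally
    Pr(a_i = k) = w_k."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    N = len(w)
    u = 0.5 * (1.0 + math.erf(u_bar / math.sqrt(2.0)))   # Phi(u_bar)
    points = (np.arange(N) + u) / N                      # sorted uniforms
    return np.searchsorted(np.cumsum(w), points)         # ancestor indices

idx = systematic_resample([0.1, 0.2, 0.3, 0.4], u_bar=0.0)
```

Because the ancestor indices are monotone in the sorted uniforms, small changes in $\bar{u}_t$ (and in the weights) tend to select similar sets of ancestors, which is precisely what preserves correlation between successive likelihood estimates.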
We obtain
\[
g(x_{\tau_{k+1}} \,|\, x_{\tau_k}, y_t, c) = \mathrm{N}\big(x_{\tau_{k+1}};\; x_{\tau_k} + \mu(x_{\tau_k}, c)\Delta\tau,\; \Psi(x_{\tau_k}, c)\Delta\tau\big) \qquad (12)
\]
where
\[
\mu(x_{\tau_k}, c) = \alpha_k + \beta_k P \big(P^T \beta_k P \Delta_k + \Sigma\big)^{-1} \big\{y_t - P^T(x_{\tau_k} + \alpha_k \Delta_k)\big\}
\]
and
\[
\Psi(x_{\tau_k}, c) = \beta_k - \beta_k P \big(P^T \beta_k P \Delta_k + \Sigma\big)^{-1} P^T \beta_k \Delta\tau.
\]
Here, $\alpha_k = S\,h(x_{\tau_k}, c)$, $\beta_k = S\,\mathrm{diag}\{h(x_{\tau_k}, c)\}\,S^T$ and $\Delta_k = t - \tau_k$. Given that the importance density in (12) is Gaussian, it is straightforward to perform the propagation step in Algorithm 2. We draw $\tilde{u}_{t,k+1}^i \sim \mathrm{N}(0, I_s)$ and set
\[
x_{\tau_{k+1}} = x_{\tau_k} + \mu(x_{\tau_k}, c)\Delta\tau + \sqrt{\Psi(x_{\tau_k}, c)\Delta\tau}\; \tilde{u}_{t,k+1}^i, \qquad k = 0, \ldots, m-1.
\]
For the Poisson leap approximation, we take $g(r_{\tau_{k+1}} \,|\, x_{\tau_k}, y_t, c)$ to be a Poisson probability with rate given by an approximation to the expected number of reaction events in $[\tau_k, \tau_{k+1}]$, given the current state of the system and, crucially, the observation $y_t$. The derivation of this approximate rate can be found in Golightly and Wilkinson (2015) and is given by
\[
h^*(x_{\tau_k}, c \,|\, y_t) = h(x_{\tau_k}, c) + \mathrm{diag}\{h(x_{\tau_k}, c)\}\, S^T P \big(P^T \beta_k P \Delta_k + \Sigma\big)^{-1} \big\{y_t - P^T(x_{\tau_k} + \alpha_k \Delta_k)\big\}.
\]
Hence, we obtain
\[
g(r_{\tau_{k+1}} \,|\, x_{\tau_k}, y_t, c) = \prod_{j=1}^{v} \mathrm{Po}\big(r_{\tau_{k+1},j};\; h_j^*(x_{\tau_k}, c \,|\, y_t)\Delta\tau\big). \qquad (13)
\]
The propagation step in Algorithm 2 can be performed by first drawing $\tilde{u}_{t,k+1}^i \sim \mathrm{N}(0, I_v)$ and then applying the inverse Poisson CDF to each component of $\Phi(\tilde{u}_{t,k+1}^i)$ to give $r_{\tau_{k+1}}$, followed by setting
\[
x_{\tau_{k+1}} = x_{\tau_k} + S r_{\tau_{k+1}}, \qquad k = 0, \ldots, m-1.
\]

A single iteration of the CPMMH scheme described in Algorithm 1 requires $(n-1) \times m \times N$ draws of the bridge construct with density (12) when using the CLE, and mass function (13) when using the Poisson leap. Recall that $n$ is the number of observations, $m$ is the number of latent process values per observation interval and $N$ is the number of particles in the auxiliary particle filter.
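Putting (12) together, a single modified diffusion bridge draw might be sketched as follows (Python for illustration; the Cholesky factorisation and small jitter are my implementation choices, and the scalar immigration–death example at the end uses made-up values):

```python
import numpy as np

def mdb_step(x, c, y_t, P, Sigma, S, hazard, dtau, delta_k, rng):
    """One draw from the modified diffusion bridge proposal (12).

    x       : current latent state x_{tau_k} (length s)
    y_t     : next observation (length p)
    delta_k : time remaining until the observation, Delta_k = t - tau_k
    Returns a draw x_{tau_{k+1}} ~ N(x + mu * dtau, Psi * dtau)."""
    h = hazard(x, c)
    alpha = S @ h                                 # alpha_k = S h(x, c)
    beta = S @ np.diag(h) @ S.T                   # beta_k = S diag{h} S^T
    C = P.T @ beta @ P * delta_k + Sigma          # p x p matrix to invert
    gain = beta @ P @ np.linalg.inv(C)
    resid = y_t - P.T @ (x + alpha * delta_k)
    mu = alpha + gain @ resid
    Psi = beta - gain @ P.T @ beta * dtau
    L = np.linalg.cholesky(Psi * dtau + 1e-10 * np.eye(len(x)))
    return x + mu * dtau + L @ rng.standard_normal(len(x))

# Scalar immigration--death example with full, noisy observation
S = np.array([[1.0, -1.0]])
hazard = lambda x, c: np.array([c[0], c[1] * x[0]])
P = np.array([[1.0]])
Sigma = np.array([[1.0]])
rng = np.random.default_rng(5)
x_new = mdb_step(np.array([40.0]), (4.0, 0.1), np.array([45.0]),
                 P, Sigma, S, hazard, dtau=0.2, delta_k=1.0, rng=rng)
```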
The cost of drawing from (12) and (13) will be dictated by the number of observed components $p$, since the inversion of a $p \times p$ matrix is required. Nevertheless, for many systems of interest, it is unlikely that all components will be observed (Golightly and Wilkinson, 2015), and we therefore anticipate that for systems with many species, $p \ll s$, where $s$ is the number of species. It remains for the practitioner to choose $m$ and $N$ to balance posterior accuracy and computational cost.

We follow Stramer and Bognar (2011) and Golightly and Wilkinson (2011), among others, by performing short pilot runs of the inference scheme (for a fixed, conservative value of $N$) with increasing values of $m$, until no discernible difference in the posterior output is detected (e.g. by visual inspection of kernel density estimates of the marginal parameter posteriors). For the examples in Section 4, we find that $m \leq 10$ is sufficient.

The number of particles $N$ controls the variance of the estimator of the observed data likelihood, $\hat{p}_U(y|c)$. As the variance increases, the acceptance probability of the pseudo-marginal MH scheme rapidly decreases to 0 (Pitt et al., 2012), resulting in 'sticky' behaviour of the parameter chains. Practical advice for choosing $N$ to balance mixing performance and computational cost can be found in Doucet et al. (2015) and Sherlock et al. (2015): the variance of the log-posterior (denoted $\sigma_N^2$, computed with $N$ particles) at a central value of $c$ (e.g. the estimated posterior median) should be around 2. For the CPMMH scheme, Tran et al. (2016) suggest choosing $N$ so that $\sigma_N^2 = 2.16^2/(1 - \rho_l^2)$, where $\rho_l$ is the estimated correlation between $\hat{p}_u(y|c)$ and $\hat{p}_{u'}(y|c')$. Note that for $\rho_l = 0$, corresponding to the vanilla PMMH case, the aforementioned tuning advice is broadly consistent with Doucet et al. (2015) and Sherlock et al. (2015).

4 Applications

To illustrate the proposed approach we consider four applications of increasing complexity. A simple immigration–death model is considered in Section 4.1. We fit the CLE to synthetic data and compare CPMMH with PMMH and, additionally, the state-of-the-art MCMC scheme, that is, the modified innovation scheme (MIS) of Golightly and Wilkinson (2008), described briefly in the appendix. In Section 4.2, we fit the CLE associated with a Lotka–Volterra model to synthetic data. We also investigate the effect of increasing observation noise on the performance of the CPMMH scheme. The autoregulatory network of Sherlock et al. (2014) is considered in Section 4.3. We generate synthetic data that is inherently discrete, precluding the use of the CLE as an inferential model. We therefore perform inference using the Poisson leap, and additionally explore the effect of using a bootstrap particle filter on the performance of the CPMMH scheme.
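Since the variance of the log-likelihood estimator scales roughly as $1/N$, the pilot-run tuning rule above can be turned into a small helper. The sketch below targets $\sigma_N^2 = 2.16^2/(1-\rho_l^2)$; the inverse-scaling assumption and the example numbers are mine, not taken from the paper.

```python
import math

def choose_particles(n_pilot, var_pilot, rho_l=0.0):
    """Choose the number of particles N from a pilot run.

    Assumes Var(log p_hat) ~ k / N, so k = n_pilot * var_pilot, and targets
    sigma_N^2 = 2.16^2 / (1 - rho_l^2) as in Tran et al. (2016); rho_l = 0
    recovers a standard pseudo-marginal target of about 2.16^2."""
    target = 2.16**2 / (1.0 - rho_l**2)
    return max(1, math.ceil(n_pilot * var_pilot / target))

# e.g. a pilot run with 50 particles giving log-likelihood variance 9
n_pmmh = choose_particles(50, 9.0)               # vanilla PMMH, rho_l = 0
n_cpmmh = choose_particles(50, 9.0, rho_l=0.99)  # strongly correlated scheme
```

With these (illustrative) pilot numbers the correlated scheme tolerates an order-of-magnitude smaller particle count, mirroring the behaviour reported for the examples below.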
Finally, the CLE approximation of a Susceptible–Infected–Removed (SIR) epidemic model is fitted using data on an influenza outbreak in a boys' boarding school in Great Britain (BMJ News and Notes, 1978). It is assumed that the infection rate is a mean-reverting diffusion process, giving a model with two unobserved components.

Since the rate constants must be strictly positive, we update $\log c$ using a random walk proposal with Gaussian innovations. We took the innovation variance to be the posterior variance of $\log c$ (estimated from a pilot run), scaled by a factor proportional to $1/v$ (tuned separately for CPMMH and MIS), where $v$ is the number of rate constants. We chose the number of particles $N$ by following the practical advice described in Section 3.4. To ensure reasonable mixing of the auxiliary variables $U$, we adopted the conservative choice of $\rho = 0.99$. In each example we use effective sample size (ESS) as a comparator. That is,
\[
\mathrm{ESS} = \frac{n_{\text{iters}}}{1 + 2\sum_{k=1}^{\infty} \alpha_k}
\]
where $\alpha_k$ is the autocorrelation function for the series at lag $k$ and $n_{\text{iters}}$ is the number of iterations in the main monitoring run. The ESS can be computed using the R package CODA (Plummer et al., 2006). We report the minimum effective sample size over all components, denoted by $\mathrm{ESS}_{\min}$. All algorithms are coded in R and were run on a desktop computer with an Intel Core i7-4770 processor at 3.40GHz. An R implementation of PMMH, CPMMH and MIS for generic (univariate) diffusion processes is available at https://github.com/csgillespie/cor-pseudo-marginal

4.1 Immigration–death model

The immigration–death reaction network takes the form
\[
R_1: \; \emptyset \xrightarrow{\;c_1\;} \mathcal{X}, \qquad R_2: \; \mathcal{X} \xrightarrow{\;c_2\;} \emptyset,
\]
with immigration and death reactions shown respectively. The stoichiometry matrix is $S = (1 \;\; {-1})$ and the associated hazard function is $h(X_t, c) = (c_1, c_2 X_t)^T$, where $X_t$ denotes the state of the system at time $t$. Applying (2) directly gives the CLE as
\[
dX_t = (c_1 - c_2 X_t)\,dt + \sqrt{c_1 + c_2 X_t}\,dW_t.
\]
We generated a synthetic data set consisting of 101 observations by simulating from the Markov jump process via Gillespie's direct method and retaining the system state at integer times. To provide a challenging scenario for the CLE, we took $c_1 = 4$ with a small death rate $c_2$ and initial condition $X_0 = 500$, so that typical trajectories exhibit nonlinear dynamics over the time interval $[0, 100]$. With error-free and complete observation of $X_t$ (as is considered in this application), the bridge construct (12) simplifies, so that the latent path between observation times is propagated according to
\[
g(x_{\tau_{k+1}} \,|\, x_{\tau_k}, x_t, c) = \mathrm{N}\left(x_{\tau_{k+1}};\; x_{\tau_k} + \frac{x_t - x_{\tau_k}}{t - \tau_k}\Delta\tau,\; \frac{t - \tau_{k+1}}{t - \tau_k}\,\beta(x_{\tau_k}, c)\Delta\tau\right),
\]
which can be sampled for $k = 0, 1, \ldots, m-2$. We also note that, in the case of error-free observation of all components of $X_t$, the auxiliary particle filter can be seen as a simple importance sampler. Consequently, the sorting and resampling steps of Algorithm 2 are not required here.

We took independent zero-mean Gaussian priors for $\log c_1$ and $\log c_2$, and determined an appropriate discretisation level by performing short runs of MIS with decreasing values of $\Delta\tau$. Since there was very little difference in posteriors beyond $\Delta\tau = 0.2$, we used this value in the main monitoring runs of MIS, CPMMH and PMMH. The results are summarised by Figures 1–2 and Table 1.

Table 1 shows a comparison of each competing inference scheme. Practical advice (as described above) suggests that CPMMH can tolerate much smaller values of $N$, with the scheme only requiring a value of $N$ around 2 (and we report results for $N = 1, 2$) when $\rho = 0.99$, compared to $N = 50$ for PMMH. Moreover, we found that the PMMH scheme often exhibited 'sticky' behaviour, resulting in relatively low effective sample sizes. Consequently, in terms of minimum ESS per second, CPMMH (with $\rho = 0.99$, $N = 1$) outperforms PMMH by a factor of 210, reducing to 150 when $N = 2$. As noted by Deligiannidis et al. (2017), values of $\rho$ close to 1 can result in slow mixing of the auxiliary variables $U$, in turn giving parameter correlograms that exhibit long-range dependence. This does not appear to be the case for $\rho = 0.99$ (see middle panel of Figure 2). Nevertheless, we note that reducing $\rho$ to 0.9 still gives an increase in overall efficiency of almost two orders of magnitude over PMMH. When comparing CPMMH to the modified innovation scheme we obtain similar ESS values. However, the relatively low computational cost of CPMMH (with $\rho = 0.99$, $N = 1$) results in an improvement in overall efficiency (with an mESS/s of 42 vs 18).

Figure 1: Immigration–death model. Left and middle panels: marginal posterior distributions based on the CLE (solid lines) and MJP (dashed lines). Right panel: contour plot of the joint posterior based on the CLE (solid line) and MJP (dashed line). The true values of $\log c_1$ and $\log c_2$ are indicated.
Figure 2: Immigration–death model. Correlogram based on log c samples from the output of MIS(left panel), CPMMH with ρ = 0 .
99 (middle panel) and PMMH (right panel).The effect of the CLE as an inferential model can be seen in Figure 1. Marginal posteriors basedon the CLE exhibit small discrepancies when compared to those obtained under the MJP (obtainedusing the PMMH method described in Golightly and Wilkinson (2015)). This is unsurprising giventhe discrete nature of the synthetic data. Nevertheless, posterior samples under the CLE areconsistent with the true values that produced the data. Moreover, the inference scheme for theMJP gave a minimum ESS per second of 0.0062. Hence, for this example, sacrificing a small amountof posterior accuracy by using the CLE as an inferential model gives an increase in overall efficiencyof a factor of over 3 orders of magnitude. Given the additional computational complexity of theremaining applications, in what follows we focus on either the CLE or Poisson leap as the inferentialmodel.
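To make the bridge construct in equation (12) concrete, the following sketch draws a single conditioned path for the immigration–death CLE, for which β(x, c) = c_1 + c_2 x. It is written in Python for illustration (the authors' implementation is in R), and the parameter values are hypothetical.

```python
import numpy as np

def beta(x, c):
    # Diffusion coefficient of the immigration-death CLE: beta(x, c) = c1 + c2 * x
    return c[0] + c[1] * x

def sample_bridge(x0, xT, T, m, c, rng):
    """Draw a discretised path on the grid tau_k = k * dtau (dtau = T / m),
    conditioned to start at x0 and hit the observation xT at time T,
    using the Gaussian proposal of equation (12)."""
    dtau = T / m
    path = [x0]
    for k in range(m - 1):
        tau_k = k * dtau
        x = path[-1]
        mean = x + (xT - x) / (T - tau_k) * dtau
        var = (T - (k + 1) * dtau) / (T - tau_k) * beta(x, c) * dtau
        path.append(mean + np.sqrt(max(var, 0.0)) * rng.normal())
    path.append(xT)  # end point fixed at the (error-free) observation
    return np.array(path)

rng = np.random.default_rng(1)
path = sample_bridge(x0=500.0, xT=480.0, T=1.0, m=5, c=(4.0, 0.7), rng=rng)
```

Each inner interval contributes one Gaussian innovation, and with error-free observation of all components the end point is fixed, consistent with the importance-sampling view of the auxiliary particle filter noted above.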
The Lotka–Volterra system comprises two biochemical species (prey X_1 and predator X_2) and three reaction channels (prey reproduction, prey death and predator reproduction, and predator death). The reaction list is
\[
R_1: X_1 \xrightarrow{c_1} 2X_1, \qquad R_2: X_1 + X_2 \xrightarrow{c_2} 2X_2, \qquad R_3: X_2 \xrightarrow{c_3} \emptyset.
\]

Algorithm   ρ      N    CPU (s)   mESS   mESS/s   Rel.
MIS         –      –    121       2190   18       90
CPMMH       0.99   1    45        1910   42       210
CPMMH       0.99   2    78        2370   30       150
CPMMH       0.90   1    45        820    18       90
PMMH        0      50   1740      380    0.2      1

Table 1: Immigration–death model. Correlation parameter ρ, number of particles N, CPU time (in seconds), minimum ESS, minimum ESS per second and relative (to PMMH) minimum ESS per second. All results are based on 2 × 10^… iterations of each scheme.

Let X_t = (X_{1,t}, X_{2,t})^T denote the system state at time t. The stoichiometry matrix associated with the system is
\[
S = \begin{pmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \end{pmatrix}
\]
and the associated hazard function is h(X_t, c) = (c_1 X_{1,t}, c_2 X_{1,t} X_{2,t}, c_3 X_{2,t})^T. The CLE for this model is given by
\[
d\begin{pmatrix} X_{1,t} \\ X_{2,t} \end{pmatrix} = \begin{pmatrix} c_1 X_{1,t} - c_2 X_{1,t}X_{2,t} \\ c_2 X_{1,t}X_{2,t} - c_3 X_{2,t} \end{pmatrix} dt + \begin{pmatrix} c_1 X_{1,t} + c_2 X_{1,t}X_{2,t} & -c_2 X_{1,t}X_{2,t} \\ -c_2 X_{1,t}X_{2,t} & c_2 X_{1,t}X_{2,t} + c_3 X_{2,t} \end{pmatrix}^{1/2} d\begin{pmatrix} W_{1,t} \\ W_{2,t} \end{pmatrix},
\]
where W_{1,t} and W_{2,t} are independent standard Brownian motion processes.

We generated a single realisation of the jump process at 51 integer times via Gillespie's direct method with rate constants as in Boys et al. (2008), that is, c = (0.5, 0.0025, 0.3)^T, and an initial condition of X_0 = (100, …)^T. We then obtained three data sets by corrupting the system state according to Y_t ~ N(X_t, σ²I), where I is the 2 × 2 identity matrix and σ ∈ {1, 5, 10}, giving data sets designated D_1, D_2 and D_3 respectively. We took independent zero-mean normal priors for each log c_i, i = 1, 2, 3, and followed Golightly and Wilkinson (2011) by setting Δτ = 0.2. The main monitoring runs consisted of 10^… iterations of MIS, CPMMH (with ρ = 0.99) and PMMH. The results are summarised in Figure 3 and Table 2.

Figure 3 shows that posterior samples are consistent with the true values that produced the data, despite using an approximate inferential model (the CLE). Table 2 shows a comparison of each competing inference scheme. When using data set D_1 (σ = 1), CPMMH outperforms PMMH by an order of magnitude (in terms of overall efficiency) and compares favourably with MIS. However, it is clear that as the measurement error standard deviation σ increases, PMMH and CPMMH require more particles in order to integrate effectively over the increasing uncertainty in the observation process. Consequently, MIS outperforms PMMH and CPMMH when using D_2 (σ = 5) and D_3 (σ = 10), although the relative difference is less than an order of magnitude for MIS vs CPMMH. It is worth noting that the rate of increase in N is greater for CPMMH than for PMMH. Increasing σ appears to break down the correlation between successive estimates of the log-posterior. Fixing the parameter values at the posterior mean and estimating the correlation, denoted ρ_l, between \hat{p}_u(y|c) and \hat{p}_{u'}(y|c) gave ρ_l = 0.97 for D_1, ρ_l = 0.91 for D_2 and ρ_l = 0.… for D_3. Nevertheless, we still observe a worthwhile increase in overall efficiency of a factor of 2 for CPMMH vs PMMH when using data set D_3, corresponding to the relatively extreme σ = 10.

Figure 3: Lotka–Volterra model. Marginal posterior distributions based on the output of CPMMH (ρ = 0.99) using data sets D_1 (solid lines), D_2 (dashed lines) and D_3 (dotted lines). The true values of log c_1, log c_2 and log c_3 are indicated.

Data set      Algorithm   N    CPU (s)   mESS   mESS/s   Rel.
D_1 (σ = 1)   MIS         –    14700     9218   0.627    13.5
              CPMMH       3    11280     8023   0.711    16.3
              PMMH        16   59730     2771   0.046    1.0
D_2 (σ = 5)   MIS         –    14600     8139   0.558    14.3
              CPMMH       8    29780     3681   0.124    3.2
              PMMH        20   75930     2959   0.039    1.0
D_3 (σ = 10)  MIS         –    14690     6436   0.438    15.3
              CPMMH       19   71520     3516   0.049    1.7
              PMMH        28   105770    3031   0.029    1.0

Table 2: Lotka–Volterra model. Number of particles N, CPU time (in seconds), minimum ESS, minimum ESS per second and relative (to PMMH) minimum ESS per second. All results are based on 10^… iterations of each scheme.

In this section, we consider a simple autoregulatory network with two species, X_1 and X_2, whose time-course behaviour evolves according to the set of coupled reactions
\[
R_1: \emptyset \xrightarrow{c_1} X_1, \quad R_2: \emptyset \xrightarrow{c_2} X_2, \quad R_3: X_1 \xrightarrow{c_3} \emptyset, \quad R_4: X_2 \xrightarrow{c_4} \emptyset, \quad R_5: X_1 + X_2 \xrightarrow{c_5} X_2.
\]
Essentially, reactions R_1 and R_2 represent immigration and reactions R_3 and R_4 represent death. The species interact via R_5. Let X_t = (X_{1,t}, X_{2,t})^T denote the system state at time t. The stoichiometry matrix associated with the system is
\[
S = \begin{pmatrix} 1 & 0 & -1 & 0 & -1 \\ 0 & 1 & 0 & -1 & 0 \end{pmatrix}
\]
Figure 4: Autoregulatory network. A single realisation of the jump process with c = (10, 0.…, 0.…, 0.…, 0.…)^T and X_0 = (5, …)^T. Observations are indicated by circles.

Algorithm     N     CPU (s)   mESS   mESS/s   Rel.
CPMMH (APF)   20    15580     1272   0.082    6.2
PMMH (APF)    55    42010     1302   0.031    2.4
PMMH (BPF)    200   95800     1263   0.013    1

Table 3: Autoregulatory network. Number of particles N, CPU time (in seconds), minimum ESS, minimum ESS per second and relative (to bootstrap filter driven PMMH) minimum ESS per second. All results are based on 10^… iterations of each scheme.

The associated hazard function is
\[
h(X_t, c) = (c_1, c_2, c_3 X_{1,t}, c_4 X_{2,t}, c_5 X_{1,t} X_{2,t})^T.
\]
We simulated a single realisation of the jump process at 101 integer times via Gillespie's direct method with rate constants c = (10, 0.…, 0.…, 0.…, 0.…)^T and an initial condition of X_0 = (5, …)^T. We then discarded the values of one of the two components, leaving observations of the remaining component only. The full data trace used to generate the data set is given in Figure 4. The inherently discrete nature of the data set, coupled with long time periods where the species count is zero, makes applying the CLE impractical. We therefore use the Poisson leap approximation as the inferential model. To provide a challenging scenario, we assume error-free observation of the retained component, so that step 2(d) of Algorithm 2 assigns a weight of 0 to the particle x^i_t unless the corresponding component of x^i_t coincides with the observation at time t. We took a weakly informative Gamma(10, …) prior for c_1 and Gamma(0.…, 0.1) priors for the remaining rate constants. We found little difference in sampled posterior values for values of Δτ beyond 0.… The main monitoring runs consisted of 10^… iterations of CPMMH (with ρ = 0.…) and PMMH. We chose N by following the practical advice of Tran et al. (2016) for CPMMH and Sherlock et al. (2015) for PMMH.

Inspection of Table 3 reveals that the bootstrap particle filter (BPF) driven PMMH scheme required N = 200 particles. This reduces to N = 55 when using the auxiliary particle filter (APF), and reduces further still to N = 20 when strong, positive correlation is introduced between successive values of the random variables that drive the APF. Despite the APF driven scheme requiring many fewer particles than the BPF, overall efficiency (as measured by minimum ESS per second) is only increased by a factor of 2.4, due to the computational complexity of the conditioned hazard, which is used to propagate state particles within the APF. The correlated implementation gives a further increase of a factor of 2.6, giving a 6-fold increase in overall efficiency over the most basic PMMH scheme.

Figure 5: Autoregulatory network. Marginal posterior distributions based on the output of CPMMH (ρ = 0.…). The true values of c_i, i = 1, ..., 5, are indicated.
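The correlated implementation requires a mapping between Gaussian innovations and the Poisson counts used by the Poisson leap. One possible realisation of that mapping is the inverse-CDF construction sketched below: a standard-normal draw u is pushed through Φ and then through the Poisson quantile function, so that nearby values of u (for example, before and after a Crank–Nicolson update) yield identical or nearby counts. This is an illustrative Python sketch, not the authors' R implementation.

```python
import math

def std_normal_cdf(u):
    # Phi(u) via the error function
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def poisson_quantile(p, lam):
    # Smallest k with P(Poisson(lam) <= k) >= p (inverse CDF by forward summation)
    k, pmf = 0, math.exp(-lam)
    cdf = pmf
    while cdf < p:
        k += 1
        pmf *= lam / k
        cdf += pmf
    return k

def gauss_to_poisson(u, lam):
    # Map a N(0, 1) innovation to a Poisson(lam) count
    return poisson_quantile(std_normal_cdf(u), lam)

def cn_update(u, rho, eps):
    # Crank-Nicolson kernel u' = rho * u + sqrt(1 - rho^2) * eps,
    # which leaves the N(0, 1) innovation distribution invariant
    return rho * u + math.sqrt(1.0 - rho * rho) * eps
```

For example, with lam = 1 the innovation u = 0 maps to the count 1, and a Crank–Nicolson update with ρ close to 1 perturbs u only slightly, so the resulting count (and hence the likelihood estimate) changes rarely between successive iterations.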
The Susceptible–Infected–Removed (SIR) epidemic model (see Andersson and Britton, 2000) describes the evolution of two species (susceptibles X_1 and infectives X_2) via two reaction channels, corresponding to infection of a susceptible individual and removal of an infective individual. The reaction equations are
\[
R_1: X_1 + X_2 \xrightarrow{c_1} 2X_2, \qquad R_2: X_2 \xrightarrow{c_2} \emptyset.
\]
The stoichiometry matrix is
\[
S = \begin{pmatrix} -1 & 0 \\ 1 & -1 \end{pmatrix}
\]
and the associated hazard function is h(X_t, c) = (c_1 X_{1,t} X_{2,t}, c_2 X_{2,t})^T.

We consider a data set consisting of the daily number of pupils confined to bed (out of a total of 763) during an influenza outbreak in a boys' boarding school in Great Britain, instigated by a single pupil. Hence, X_0 = (762, 1)^T. The data are displayed graphically in BMJ News and Notes (1978) and converted into counts in Fuchs (2013). For completeness, we give the data in Table 4.

Day                 1    2    3    4    5    6    7    8    9    10
No. of infectives   1    3    6    25   73   221  294  257  236  189
Day                 11   12   13   14   15
No. of infectives   125  67   26   10   3

Table 4: Boarding school data.

We work with the CLE, which has the form
\[
d\begin{pmatrix} X_{1,t} \\ X_{2,t} \end{pmatrix} = \begin{pmatrix} -c_1 X_{1,t}X_{2,t} \\ c_1 X_{1,t}X_{2,t} - c_2 X_{2,t} \end{pmatrix} dt + \begin{pmatrix} c_1 X_{1,t}X_{2,t} & -c_1 X_{1,t}X_{2,t} \\ -c_1 X_{1,t}X_{2,t} & c_1 X_{1,t}X_{2,t} + c_2 X_{2,t} \end{pmatrix}^{1/2} d\begin{pmatrix} W_{1,t} \\ W_{2,t} \end{pmatrix}. \tag{14}
\]
We further assume that the infection rate is a mean-reverting diffusion process governed by the SDE
\[
d\log c_{1,t} = c_3 (c_4 - \log c_{1,t})\,dt + c_5\,dW_{3,t}. \tag{15}
\]
Hence, the inferential model is specified by (14) and (15), where c_1 is replaced by c_{1,t} in (14). We wish to infer c = (c_2, c_3, c_4, c_5)^T based on measurements of X_{2,t} only, giving a partially observed system. We took a zero-mean normal prior on the reversion level c_4 of log c_{1,t}, and exponential Exp(1) priors for the remaining parameters. For simplicity, we fixed the initial unobserved infection rate by taking log c_{1,0} = −6. The discretisation level was fixed by taking Δτ = 0.1. The main monitoring runs consisted of 2 × 10^… iterations of CPMMH and PMMH. The results are summarised in Figure 6 and Table 5. It is evident that CPMMH outperforms PMMH in terms of overall efficiency (as measured here by minimum ESS per minute) by a factor of 7.

Algorithm   N     CPU (m)   mESS   mESS/m   Rel.
CPMMH       90    2765      226    0.08     7.2
PMMH        600   26338     299    0.01     1

Table 5: Epidemic model. Number of particles N, CPU time (in minutes), minimum ESS, minimum ESS per minute and relative minimum ESS per minute. All results are based on 2 × 10^… iterations of each scheme.

Figure 6: Epidemic model. Marginal posterior distributions based on the output of CPMMH (histograms). Prior densities are given by the solid lines.

Exact (simulation-based) Bayesian inference for Markov jump processes (MJPs) is often rendered impracticable by the requirement of many (millions of) exact simulations of the jump process. This computational cost can be controlled by replacing the inferential model with an approximation based on time-discretisation. Two such approximations that are routinely applied within the SKM literature are the (discretised) chemical Langevin equation (CLE) and the Poisson leap. When using either approximation, the accuracy can be improved by introducing additional intermediate time-points between observation instances and integrating over the uncertainty associated with the induced latent process. As is the case for the MJP, the observed data likelihood under this time-discretisation remains intractable, requiring the use of PMMH. The key difference, however, is that the number of intermediate time-points at which the latent process must be simulated can be pre-specified by the practitioner, with fewer time-points giving reduced computational cost at the expense of accuracy of the inferential model.

Taking either the (discretised) CLE or Poisson leap as the inferential model to be fitted, we increased the efficiency of PMMH by adapting the recently proposed correlated pseudo-marginal Metropolis–Hastings (CPMMH) algorithm (Deligiannidis et al., 2017; Dahlin et al., 2015) to our setting.
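To fix ideas, the overall scheme can be sketched as a single Metropolis–Hastings loop in which the auxiliary variables u are refreshed through the Crank–Nicolson kernel alongside a random-walk proposal on the parameters. In the sketch below, log_like_hat is a generic stand-in for the (auxiliary) particle filter's unbiased likelihood estimator, and all names, tuning values and the toy target are illustrative, not the authors' R implementation.

```python
import numpy as np

def cpmmh(log_like_hat, log_prior, c0, n_iters, n_u, rho=0.99, step=0.1, seed=0):
    """Correlated pseudo-marginal Metropolis-Hastings (minimal sketch)."""
    rng = np.random.default_rng(seed)
    c = np.asarray(c0, dtype=float)
    u = rng.normal(size=n_u)                  # auxiliary N(0, 1) variables
    log_post = log_prior(c) + log_like_hat(c, u)
    out = [c.copy()]
    for _ in range(n_iters):
        c_prop = c + step * rng.normal(size=c.size)              # random walk
        u_prop = rho * u + np.sqrt(1 - rho**2) * rng.normal(size=n_u)  # CN kernel
        log_post_prop = log_prior(c_prop) + log_like_hat(c_prop, u_prop)
        if np.log(rng.uniform()) < log_post_prop - log_post:     # MH accept step
            c, u, log_post = c_prop, u_prop, log_post_prop
        out.append(c.copy())
    return np.array(out)

# Toy target: an exact Gaussian log-likelihood mildly perturbed by u,
# standing in for a noisy particle-filter estimate
toy_ll = lambda c, u: -0.5 * np.sum(c**2) + 0.01 * u.sum()
toy_lp = lambda c: -0.5 * np.sum(c**2) / 100.0
chain = cpmmh(toy_ll, toy_lp, c0=[0.0, 0.0], n_iters=200, n_u=5)
```

Setting rho = 0 recovers standard PMMH, since the auxiliary variables are then refreshed independently at every iteration.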
Positive correlation between successive observed data likelihood estimates was introduced by correlating the innovations that drive the proposal mechanism in the auxiliary particle filter. Essentially, the innovations are drawn from a kernel that satisfies detailed balance with respect to the innovation density. For a Gaussian innovation density (as is the case when using the CLE), a Crank–Nicolson proposal can be used. In the case of the Poisson leap, it is straightforward to map between Gaussian draws from a Crank–Nicolson proposal and the required Poisson variates. Whilst the degree of correlation present in the generation of the Gaussian innovations may be close to one, this does not necessarily translate directly into high correlation between successive observed data likelihood estimates. Nevertheless, in our experiments we see an improvement in performance relative to the standard PMMH scheme.

For a fully observed, error-free immigration–death model, we found that it was possible to obtain an increase in overall efficiency (as measured by minimum effective sample size per second) of CPMMH over PMMH of around two orders of magnitude, whilst giving comparable performance to the modified innovation scheme of Golightly and Wilkinson (2008). To investigate the effect of measurement error, we applied each competing scheme to synthetic data generated from a Lotka–Volterra system and further corrupted with additive Gaussian noise. Not surprisingly, the performance of CPMMH worsens as the measurement error is increased, although we note that even in a relatively extreme scenario where the measurement error variance and average species values are of a similar order of magnitude, CPMMH outperforms PMMH by a factor of 2. We further applied CPMMH to a Poisson leap approximation of an autoregulatory network and to an SDE model of an influenza outbreak in a boys' boarding school.
Despite only observing a subset of model components in both examples, we found that CPMMH outperforms PMMH by a factor of around 3 for the autoregulatory network and by a factor of 7 for the epidemic model. We note that bigger efficiency gains can potentially be achieved for extremely long data sets. For univariate models, it may be possible to scale the number of particles N at rate n^{1/2} (where n is the number of observations) rather than at rate n, as is necessary for PMMH (Bérard et al., 2014). For bivariate models, it may be possible to scale N at rate n^{2/3}. See Deligiannidis et al. (2017) for further discussion.

The CPMMH algorithm can be improved upon in a number of ways. When using a particle filter to estimate the observed data likelihood, it may be beneficial to resample less often, thus preserving correlation between successive estimates of the observed data likelihood. Whether or not this is practically feasible will depend on the accuracy of the driving bridge proposal process. In scenarios with relatively few observations and when the proposal process is particularly effective, it may even be possible to avoid the resampling step altogether, so that the particle filter is replaced by an importance sampler. The algorithm would also benefit from the availability of parallel computing architectures; in this case, the block pseudo-marginal method of Choppala et al. (2016) could be used. A comparison of this approach with the methods described in this paper remains an area of active research.

Acknowledgements

The authors would like to thank the Associate Editor and two anonymous referees for their suggestions for improving this paper.
References
Anderson, D. F. (2008). Incorporating postleap checks in tau-leaping. The Journal of Chemical Physics, 128:054103.

Andersson, H. and Britton, T. (2000). Stochastic epidemic models and their statistical analysis, volume 151 of Lecture Notes in Statistics. Springer-Verlag, New York.

Andrieu, C., Doucet, A., and Holenstein, R. (2010). Particle Markov chain Monte Carlo methods (with discussion). Journal of the Royal Statistical Society, Series B, 72(3):269–342.

Andrieu, C. and Roberts, G. O. (2009). The pseudo-marginal approach for efficient computation. Annals of Statistics, 37:697–725.

Beaumont, M. A. (2003). Estimation of population growth or decline in genetically monitored populations. Genetics, 164:1139–1160.

Bérard, J., Del Moral, P., and Doucet, A. (2014). A lognormal central limit theorem for particle approximations of normalizing constants. Electronic Journal of Probability, 19:1–28.

BMJ News and Notes (1978). Influenza in a boarding school. British Medical Journal, page 587.

Boys, R. J. and Giles, P. R. (2007). Bayesian inference for stochastic epidemic models with time-inhomogeneous removal rates. Journal of Mathematical Biology, 55:223–247.

Boys, R. J., Wilkinson, D. J., and Kirkwood, T. B. L. (2008). Bayesian inference for a discretely observed stochastic kinetic model. Statistics and Computing, 18:125–135.

Choppala, P., Gunawan, D., Chen, J., Tran, M.-N., and Kohn, R. (2016). Bayesian inference for state space models using block and correlated pseudo marginal methods. Available from http://arxiv.org/abs/1612.07072.

Dahlin, J., Lindsten, F., Kronander, J., and Schön, T. B. (2015). Accelerating pseudo-marginal Metropolis–Hastings by correlating auxiliary variables. Available from https://arxiv.org/abs/1511.05483v1.

Del Moral, P. (2004). Feynman–Kac Formulae: Genealogical and Interacting Particle Systems with Applications. Springer, New York.

Del Moral, P. and Murray, L. M. (2015). Sequential Monte Carlo with highly informative observations. SIAM/ASA Journal on Uncertainty Quantification, 3(1):969–997.

Deligiannidis, G., Doucet, A., and Pitt, M. K. (2017). The correlated pseudo-marginal method. Available from https://arxiv.org/abs/1511.04992v4.

Doucet, A., Pitt, M. K., and Kohn, R. (2015). Efficient implementation of Markov chain Monte Carlo when using an unbiased likelihood estimator. Biometrika, 102:295–313.

Durham, G. B. and Gallant, R. A. (2002). Numerical techniques for maximum likelihood estimation of continuous-time diffusion processes. Journal of Business and Economic Statistics, 20:279–316.

Ferm, L., Lötstedt, P., and Hellander, A. (2008). A hierarchy of approximations of the master equation scaled by a size parameter. Journal of Scientific Computing, 34(2):127–151.

Fuchs, C. (2013). Inference for diffusion processes with applications in Life Sciences. Springer, Heidelberg.

Gerber, M. and Chopin, N. (2015). Sequential quasi Monte Carlo (with discussion). Journal of the Royal Statistical Society, Series B, 77:509–579.

Gillespie, C. S. and Golightly, A. (2010). Bayesian inference for generalized stochastic population growth models with application to aphids. Journal of the Royal Statistical Society, Series C, 52:341–357.

Gillespie, D. T. (1976). A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. Journal of Computational Physics, 22(4):403–434.

Gillespie, D. T. (1977). Exact stochastic simulation of coupled chemical reactions. Journal of Physical Chemistry, 81:2340–2361.

Gillespie, D. T. (1992). A rigorous derivation of the chemical master equation. Physica A, 188:404–425.

Gillespie, D. T. (2000). The chemical Langevin equation. Journal of Chemical Physics, 113(1):297–306.

Gillespie, D. T. (2001). Approximate accelerated stochastic simulation of chemically reacting systems. Journal of Chemical Physics, 115(4):1716–1732.

Golightly, A. and Wilkinson, D. J. (2005). Bayesian inference for stochastic kinetic models using a diffusion approximation. Biometrics, 61(3):781–788.

Golightly, A. and Wilkinson, D. J. (2008). Bayesian inference for nonlinear multivariate diffusion models observed with error. Computational Statistics and Data Analysis, 52(3):1674–1693.

Golightly, A. and Wilkinson, D. J. (2011). Bayesian parameter inference for stochastic biochemical network models using particle Markov chain Monte Carlo. Interface Focus, 1(6):807–820.

Golightly, A. and Wilkinson, D. J. (2015). Bayesian inference for Markov jump processes with informative observations. Statistical Applications in Genetics and Molecular Biology, 14(2):169–188.

Gordon, N. J., Salmond, D. J., and Smith, A. F. M. (1993). Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings-F, 140:107–113.

Hey, K. L., Momiji, H., Featherstone, K., Davis, J. R. E., White, M. R. H., Rand, D. A., and Finkenstädt, B. (2015). A stochastic transcriptional switch model for single cell imaging data. Biostatistics, 16:655–669.

Koblents, E. and Miguez, J. (2015). A population Monte Carlo scheme with transformed weights and its application to stochastic kinetic models. Statistics and Computing, 25:407–425.

Künsch, H. R. (2013). Particle filters. Bernoulli, 19:1391–1403.

Lin, J. and Ludkovski, M. (2013). Sequential Bayesian inference in hidden Markov stochastic kinetic models with application to detection and response to seasonal epidemics. Statistics and Computing, 24:1047–1062.

McKinley, T. J., Ross, J. V., Deardon, R., and Cook, A. R. (2014). Simulation-based Bayesian inference for epidemic models. Computational Statistics and Data Analysis, 71:434–447.

O'Neill, P. D. and Roberts, G. O. (1999). Bayesian inference for partially observed stochastic epidemics. Journal of the Royal Statistical Society, Series A, 162:121–129.

Owen, J., Wilkinson, D. J., and Gillespie, C. S. (2015). Likelihood free inference for Markov processes: a comparison. Statistical Applications in Genetics and Molecular Biology, 14(2):189–209.

Pitt, M. K., dos Santos Silva, R., Giordani, P., and Kohn, R. (2012). On some properties of Markov chain Monte Carlo simulation methods based on the particle filter. Journal of Econometrics, 171(2):134–151.

Pitt, M. K. and Shephard, N. (1999). Filtering via simulation: Auxiliary particle filters. Journal of the American Statistical Association, 94(446):590–599.

Plummer, M., Best, N., Cowles, K., and Vines, K. (2006). CODA: convergence diagnosis and output analysis for MCMC. R News, 6(1):7–11.

Roberts, G. O. and Stramer, O. (2001). On inference for non-linear diffusion models using Metropolis–Hastings algorithms. Biometrika, 88(3):603–621.

Sherlock, C., Golightly, A., and Gillespie, C. S. (2014). Bayesian inference for hybrid discrete-continuous systems biology models. Inverse Problems, 30:114005.

Sherlock, C., Thiery, A., Roberts, G. O., and Rosenthal, J. S. (2015). On the efficiency of pseudo-marginal random walk Metropolis algorithms. The Annals of Statistics, 43(1):238–275.

Stramer, O. and Bognar, M. (2011). Bayesian inference for irreducible diffusion processes using the pseudo-marginal approach. Bayesian Analysis, 6:231–258.

Tran, M.-N., Kohn, R., Quiroz, M., and Villani, M. (2016). Block-wise pseudo-marginal Metropolis–Hastings. Available from http://arxiv.org/abs/1603.02485.

Wang, Y., Christley, S., Mjolsness, E., and Xie, X. (2010). Parameter inference for discretely observed stochastic kinetic models using stochastic gradient descent. BMC Systems Biology, 4:99.

Whitaker, G. A., Golightly, A., Boys, R. J., and Sherlock, C. (2017a). Bayesian inference for diffusion driven mixed-effects models. Bayesian Analysis, 12:435–463.

Whitaker, G. A., Golightly, A., Boys, R. J., and Sherlock, C. (2017b). Improved bridge constructs for stochastic differential equations. Statistics and Computing, 27:885–900.

Wilkinson, D. J. (2009). Stochastic modelling for quantitative description of heterogeneous biological systems. Nature Reviews Genetics, 10:122–133.

Wilkinson, D. J. (2012). Stochastic Modelling for Systems Biology. Chapman & Hall/CRC Press, Boca Raton, Florida, 2nd edition.
Modified innovation scheme
We give a brief description of the modified innovation scheme (MIS) and refer the reader to Whitaker et al. (2017a) and the references therein for further details.

Consider the joint posterior of c and the latent process x under the CLE, given by
\[
\pi(c, x) \propto \pi(c)\, p(x \mid c)\, p(y \mid x),
\]
where p(x|c) and p(y|x) can be found in (6) and (7). A Gibbs sampler can be used to generate draws from π(c, x) by alternately sampling from the full conditionals

1. p(x | c, y),
2. p(c | x).

It is straightforward to sample p(x|c, y) using Metropolis within Gibbs, coupled with a suitable blocking approach. For example, the latent process can be updated over each interval [t − 1, t + 1], t = 1, 2, ..., n − 1. The full conditional p(c|x) can also be sampled via Metropolis within Gibbs; however, for small values of Δτ, dependence between the parameters and the latent process can render this approach impractical. This well-known problem is discussed at length in Roberts and Stramer (2001). The issue is circumvented by the MIS via a reparameterisation. The basic idea is to draw parameter values conditional on a process whose quadratic variation does not determine c. For example, for a time interval [0, T], conditioning on the innovations that drive the modified diffusion bridge construct leads to the continuous-time innovation process
\[
dZ_t = \beta(X_t, c)^{-1/2}\left(dX_t - \frac{x_T - X_t}{T - t}\,dt\right) \tag{16}
\]
\[
\phantom{dZ_t} = \beta(X_t, c)^{-1/2}\left\{\alpha(X_t, c) - \frac{x_T - X_t}{T - t}\right\}dt + dW_t,
\]
where α(X_t, c) = S h(X_t, c) and β(X_t, c) = S diag{h(X_t, c)} S^T. A justification for conditioning on realisations of this process in a Gibbs sampler can be found in Fuchs (2013). In practice, we work with a discretisation of (16), that is, the modified diffusion bridge construct.
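For intuition, the discretised analogue of (16) and its inverse, which together define the mapping between the interior values of a latent path and its innovations z (with the end points held fixed), can be sketched as follows for a univariate diffusion. This is an illustrative Python sketch with hypothetical function names and values; the paper's implementation is in R.

```python
import numpy as np

def path_to_innovations(x, T, beta, c):
    # x holds m + 1 values on an equally spaced grid over [0, T]; the end
    # points x[0] and x[-1] are fixed.  Each innovation is a (scaled)
    # Brownian increment under the modified diffusion bridge drift.
    m = len(x) - 1
    dtau = T / m
    dz = []
    for k in range(m - 1):
        t_k = k * dtau
        drift = (x[-1] - x[k]) / (T - t_k)   # bridge drift towards x_T
        dz.append((x[k + 1] - x[k] - drift * dtau) / np.sqrt(beta(x[k], c)))
    return np.array(dz)

def innovations_to_path(x0, xT, dz, T, beta, c):
    # Inverse map: rebuild the interior path values from the innovations
    m = len(dz) + 1
    dtau = T / m
    x = [x0]
    for k, z in enumerate(dz):
        t_k = k * dtau
        drift = (xT - x[k]) / (T - t_k)
        x.append(x[k] + drift * dtau + np.sqrt(beta(x[k], c)) * z)
    x.append(xT)
    return np.array(x)

# Round trip on a toy immigration-death-style path (illustrative values)
beta_fn = lambda x, c: c[0] + c[1] * x
x_grid = np.array([500.0, 495.0, 492.0, 490.0])
z = path_to_innovations(x_grid, 1.0, beta_fn, (4.0, 0.7))
x_rec = innovations_to_path(500.0, 490.0, z, 1.0, beta_fn, (4.0, 0.7))
```

Because the map is one-to-one in the interior path values, drawing parameters conditional on z rather than x sidesteps the dependence problem described above, at the cost of a Jacobian term in the resulting full conditional.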
For the induced invertible mapping x = f(z) (where we have suppressed the dependence of f(·) on c and on the values of the latent process at the observation times), the full conditional density required in step 2 is easily shown to be
\[
p(c \mid z) \propto \pi(c)\, p\{f(z) \mid c\}\, J\{f(z) \mid c\}, \tag{17}
\]
where p{f(z)|c} is given by (6) and
\[
J\{f(z) \mid c\} \propto \prod_{t=1}^{n-1} \prod_{k=1}^{m-1} \left|\beta(x_{\tau_{t,k-1}}, c)\right|^{-1/2}
\]
is the Jacobian determinant of f. Naturally, the full conditional in (17) will typically be intractable, requiring the use of Metropolis-within-Gibbs updates. We propose to update log c