Stochastic Epidemic Models inference and diagnosis with Poisson Random Measure Data Augmentation
Benjamin Nguyen-Van-Yen a,b,∗, Pierre Del Moral c, Bernard Cazelles b,d,e

a Institut Pasteur, Unité de Génétique Fonctionnelle des Maladies Infectieuses, UMR 2000 CNRS, Paris, France
b Institut de Biologie de l'ENS (IBENS), École Normale Supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
c INRIA, Bordeaux Research Center, France
d International Center for Mathematical and Computational Modeling of Complex Systems (UMMISCO), UMI 209, UPMC/IRD, France
e iGLOBE, UMI CNRS 3157, University of Arizona, Tucson, Arizona, United States of America

Abstract
We present a new Bayesian inference method for compartmental models that takes into account the intrinsic stochasticity of the process. We show how to formulate a SIR-type Markov jump process as the solution of a stochastic differential equation with respect to a Poisson Random Measure (PRM), and how to simulate the process trajectory deterministically from a parameter value and a PRM realisation. This forms the basis of our Data Augmented MCMC, which consists in augmenting parameter space with the unobserved PRM value. The resulting simple Metropolis-Hastings sampler acts as an efficient simulation-based inference method that can easily be transferred from model to model. Compared with a recent Data Augmentation method based on Gibbs sampling of individual infection histories, PRM-augmented MCMC scales much better with epidemic size and is far more flexible.

PRM-augmented MCMC also yields a posteriori estimates of the PRM, which represent process stochasticity and can be used to validate the model. If the model is good, the posterior distribution should exhibit no pattern and be close to the PRM prior distribution. We illustrate this by fitting a non-seasonal model to some simulated seasonal case count data. Applied to the Zika epidemic of 2013 in French Polynesia, our approach shows that a simple SEIR model cannot correctly reproduce both the initial sharp increase in the number of cases and the final proportion of seropositives.

∗ Corresponding author
PRM-augmentation thus provides a coherent story for Stochastic Epidemic Model inference, where explicitly inferring process stochasticity helps with model validation.

Preprint submitted to Elsevier, April 23, 2020
1. Introduction
Stochasticity plays an important role in infectious disease dynamics, but statistical inference of stochastic models is difficult. When trying to estimate the parameters θ of a complex stochastic epidemic model from some data D, the data D are typically an incomplete observation of the process, and the observed data likelihood P(D | θ) is intractable [1]. To work around this problem, one can decide not to use the likelihood (ABC, deep learning), to approximate the model (linear noise approximations), to estimate the likelihood (PMCMC [2, 3]), or to rephrase the problem.

Data Augmentation (DA) introduces the alternative problem of estimating the joint posterior P(θ, ν | D), instead of P(θ | D), where ν is some latent variable chosen so that the joint likelihood P(D | θ, ν) is tractable. The original problem then finds its solution by marginalizing over ν, P(θ | D) = ∫ P(θ, ν | D) dν. Data Augmentation has proven effective with MCMC [4, 5, 6] for dealing with granular, subject-level data and complex models.

An important difficulty with DA is to explore the high dimensional augmented state space effectively with the MCMC proposal. Each class of models requires designing specific proposals, and typically it is necessary to manually tune the proposal for the specific problem at hand. For instance in BEAST [5, 7], for which the latent state is the reconstructed phylogeny, many different moves are implemented to explore tree space effectively, and one is chosen at every iteration following certain probabilities. For a given model and dataset, one should manually tune those probabilities to achieve good mixing. Additionally, it is challenging to design a proposal such that mixing speed scales well with population size, which becomes necessary when dealing with even moderately large epidemics.

We propose a new data augmentation scheme to fit simple models to low granularity data – like incidence or prevalence data.
This scheme is directly applicable to the large class of Markov jump processes. Pure jump Markov processes are widely used for modelling infectious diseases, ecological and evolutionary systems. It is possible to represent them as solutions of Poisson-driven stochastic differential equations, which are integral equations with respect to a Poisson Random Measure (PRM). This indicates a natural data augmentation parametrization (θ, ν), with θ the parameters of the process, and ν the discrete measure that the process is integrated against. A value of θ and a realisation ν of the PRM are enough to deterministically simulate the trajectory X of the process, and so the joint likelihood of θ and ν is easily computed as the probability of observing D given the trajectory X. Throughout the article, we will call this method PRM augmentation.

Different methods have already been proposed for the inference of Markov jump processes, some specific to epidemic models [4, 8, 9, 10, 11], and some more general [12, 13, 14]. In this article, we present PRM augmentation, used with Metropolis-Hastings sampling [15, 16], discuss its advantages and drawbacks in terms of ease of use, speed, and insights, and illustrate this on some synthetic and real datasets. PRM augmentation relies on a generalization of uniformization [17, 12, 14] that is directly applicable to any Markov jump process, with writing the Stochastic Differential Equation (SDE) being the only mathematical work needed.
As such, it is a simulation-based method, which can easily be used to compare different models. We show how PRM augmentation compares to a recent data augmentation method [11], and exhibits better complexity with respect to epidemic size. Additionally, the natural distinction that is made between the mechanistic part, estimated through θ, and the stochastic part, estimated through ν, provides a way to evaluate model fit and to propose model improvements, similarly to Bayesian latent residuals [18], via the posterior ν estimates. To demonstrate this, we apply the method to simulated data, and to 2013-2014 Zika data from French Polynesia.
2. Theory
Continuous time Markov chains, or Markov pure jump processes, are stochastic processes that have the Markov property, and whose state is constant in between jumps, which happen at exponentially distributed times. The trajectories of such processes are right continuous with left limits, and piecewise constant. A complete presentation can be found in [19]. Such processes are widely used in stochastic modelling, notably for instance in pomp [20] or BEAST [5].

Poisson random measures (PRM) are a generalization of Poisson processes to more general spaces, as random variables that take discrete measure values. For a PRM, the number of points found in any measurable subset A follows a Poisson distribution, and the numbers of points found in two disjoint measurable subsets are independent.

A Markov pure jump process can be formulated as the solution to a Poisson-driven stochastic differential equation (SDE). This is an established result [21], and can be seen as a generalization of uniformization [17]. We provide a proof in appendix B that solutions to the equation that interests us here are càdlàg and Markov, and have the right infinitesimal generator.

The SDE hints at a natural deterministic algorithm to simulate a trajectory X of the process exactly, given a realisation ν of the PRM. Examine the points of ν ordered by time; for each successive point, compute the rate of the event corresponding to that point; if the value u of the point is below the rate, then accept the event and update the state accordingly, and if not, reject the event (the state remains the same). This iteratively draws a trajectory, with jumps that can only be positioned at points of ν, as illustrated in Fig. 1, and following Algorithm 1.

A value of θ and a value of ν uniquely determine the trajectory of the process, and thus are enough to compute the likelihood of observing the data P(D | θ, ν). The a priori independence of θ and ν helps mixing in the case of weakly informative data [22], provided we are able to simulate the process efficiently.

We consider a Markov Jump Process (MJP) X with a state space E = Z^d, and a finite number of events K. The k-th event happens with a rate r_k : E → R_+, and when it happens, the state changes from x to x + μ_k. Let (ν_k)_{k ≤ K} be discrete measures on R_+ × R_+. Equation (1) is the Poisson-driven SDE defining the process:

X_t = X_0 + Σ_{k ≤ K} ∫_0^t ∫_{R_+} 1{u ≤ r_k(X_{s−})} μ_k ν_k(ds, du)    (1)

This equation specifies a way to simulate the process. Each point of the (ν_k) is a potential event point. Consider all points of the (ν_k) ordered by time, and for each one, compare the event rate to the u coordinate of the point. If the point is below the rate, then the event happens, and we update the state of the system and the rates. Otherwise nothing happens, and we continue on to the next point. This yields the simplified Algorithm 1.

input: The discrete measures (ν_k)_{k ≤ K}, the rates (r_k), the increments (μ_k), and the initial condition X_0 ∈ E
output: Trajectory (X_t)_t
Initialize t = 0, X_t = X_0;
Iterate over the points of (ν) ordered by time, (t_n, u_n, k_n)_n;
for n = 1 to ∞ do
    if u_n ≤ r_{k_n}(X_{t_n−}) then X_{t_n} = X_{t_n−} + μ_{k_n};
    else X_{t_n} = X_{t_n−};
end
Algorithm 1: Simplified algorithm for exact simulation of a MJP

There is an obvious problem with Algorithm 1, however. The (ν_k) have an infinite number of points on every time interval, so we cannot order all of their points by time. Instead we split R_+ × R_+ into rectangles A_{i,j}, and define each time column C_i = ∪_j A_{i,j} = [t_i, t_{i+1}] × R_+.
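The thinning loop of Algorithm 1 can be sketched in a few lines of Python (our own illustrative code, not the paper's OCaml implementation); it assumes the candidate points of the (ν_k) have already been merged and time-ordered into a finite list:

```python
def simulate_mjp(x0, points, rates, increments):
    """Thin the candidate points of the PRMs into a trajectory.

    x0         : initial state (tuple of ints)
    points     : finite list of (t, u, k) candidate points, sorted by time
    rates      : list of functions, rates[k](x) -> rate of event k at state x
    increments : list of state increments, increments[k]
    """
    x = x0
    traj = [(0.0, x0)]
    for t, u, k in points:
        if u <= rates[k](x):  # point below the rate curve: event k happens
            x = tuple(xi + di for xi, di in zip(x, increments[k]))
            traj.append((t, x))
        # otherwise the point is rejected and the state is unchanged
    return traj

# Toy SIR example with state (S, I), beta = 2, gamma = 1, N = 10:
rates = [lambda x: 2.0 * x[0] * x[1] / 10.0,  # infection
         lambda x: 1.0 * x[1]]                # recovery
incs = [(-1, 1), (0, -1)]
pts = [(0.1, 1.0, 0), (0.2, 25.0, 0), (0.3, 1.5, 1)]
traj = simulate_mjp((9, 1), pts, rates, incs)
# the middle point (u = 25) lies above the infection rate and is rejected
```

Note that the simulation is entirely deterministic given the point list, which is exactly what makes the augmentation (θ, ν) usable in an MCMC.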
In each rectangle A_{i,j}, the number of points N^k_{i,j} of ν_k is finite almost surely, and ν_k can be written ν_k = Σ_{n ≤ N^k_{i,j}} δ(t^{i,j,k}_n, u^{i,j,k}_n), with the points (t^{i,j,k}_n, u^{i,j,k}_n) ordered by time. Equation (2) is equivalent to (1):

X_t = X_0 + Σ_{k=1}^{K} Σ_{i=1}^{∞} Σ_{j=1}^{∞} Σ_{n=1}^{N^k_{i,j}} 1{t^{i,j,k}_n ≤ t} 1{u^{i,j,k}_n ≤ r_k(X_{t^{i,j,k}_n −})} μ_k    (2)

We never need the rectangles that are above the rate. We can use this fact to simulate the process with finite memory, by lazily drawing points for the rectangles only when they are needed. This exact algorithm is given in appendix D, as algorithm D2.

This exact algorithm is O(n) in the number of points that are considered. As a consequence, an important factor in its efficiency is the proportion of points that are considered but rejected. This is also true of classical uniformization, for which a constant upper bound must be chosen in advance. In contrast, here, the algorithm automatically adapts the upper bound to the variations of the rates, so that the rejection rate can be kept small.

What's more, we can further reduce the complexity by using an approximate simulation algorithm X̃ = f̃(θ, ν). To do that, we take inspiration from tau-leaping algorithms, and treat the rates as if they were constant on each time interval [t_i, t_{i+1}], r^i_k = r_k(X_{t_i}). We then count the points below the rate r^i_k in each interval. We store the number of points present in each A_{i,j} for each event to make this faster. The only points left to consider one by one are the ones from the single A_{i,j} such that r^i_k is in [u_j, u_{j+1}]. This approximate algorithm is written out in appendix D, as algorithm D3.

Once equipped with efficient simulations, we can use this PRM augmentation scheme for parameter inference with MCMC.

Figure 1: Exact simulation of a SIR process with respect to an infection and a recovery measure. The black crosses are the points of the infection and recovery measures respectively.
The orange curves are the rates of infection and recovery respectively. When a point is under the rate curve, the event happens. When an infection happens, I (and the rates) increase, and when a recovery happens, I (and the rates) decrease. Initially there is one infected and the whole population is susceptible.

2.3. PRM-augmented MCMC

We want to estimate the posterior distribution of the parameter values θ and of the discrete measure ν of the model (X_t) = f(θ, ν), given the data D, that is

P(θ, ν | D) ∝ P(D | (X_t)) P(θ) P(ν)

P(θ) is the prior distribution placed on θ, and P(ν) is the PRM distribution. We can use the Metropolis-Hastings algorithm [15, 16], in the same way as for a deterministic system. At every iteration of the MCMC, one draws new values θ′ and ν′ from the proposal, simulates the corresponding trajectory with f or f̃, (X_t)′ = f(θ′, ν′), computes the likelihood to observe the data P(D | (X_t)′), then accepts or rejects the move according to the Metropolis-Hastings ratio.

For a PRM, points in disjoint subsets are independent. Thus, if ν is a PRM sample, we can choose a subset of ν, erase its points and redraw new points from the PRM process, and obtain a new (correlated) PRM sample. This proposal Q_ν is reversible with respect to the PRM prior, and thus can be used for MCMC. More details are given in appendix C. Q_ν can be used with any proposal Q_θ for the parameters, with density q_θ, to form a Metropolis-Hastings proposal, as shown in Algorithm 2.

input: θ, ν
output: θ̄, ν̄
θ′ ∼ Q_θ(· | θ);
ν′ ∼ Q_ν(· | ν);
X′ = f(θ′, ν′) (or X′ = f̃(θ′, ν′));
α = [P(D | X′) / P(D | X)] · [P(θ′) / P(θ)] · [q(θ | θ′) / q(θ′ | θ)];
u ∼ U([0, 1]);
if u < α then return (θ′, ν′); else return (θ, ν); end
Algorithm 2: PRM Metropolis-Hastings proposal
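One iteration of such a sampler might look as follows in Python (an illustrative sketch with our own names, not the paper's implementation; ν is stored as a list of (t, u, k) points on a window [0, T] × [0, u_max], the θ proposal is assumed symmetric so its q-ratio cancels, and Q_ν redraws the points of a randomly chosen time sub-interval from the unit-intensity prior):

```python
import math
import random

def poisson(rng, lam):
    # Knuth's method; adequate for the moderate intensities of a sketch
    l, n, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= l:
            return n
        n += 1

def redraw_window(nu, t0, t1, u_max, n_events, rng):
    """Q_nu: erase the points of nu with t in [t0, t1) and redraw them
    from the unit-intensity PRM prior, independently for each event k."""
    kept = [p for p in nu if not (t0 <= p[0] < t1)]
    new = []
    for k in range(n_events):
        n = poisson(rng, (t1 - t0) * u_max)  # Poisson number of points
        new += [(rng.uniform(t0, t1), rng.uniform(0.0, u_max), k)
                for _ in range(n)]
    return sorted(kept + new)

def mh_step(theta, nu, loglik, propose_theta, log_prior, rng,
            T, u_max, n_events):
    """One PRM Metropolis-Hastings step; Q_nu is reversible with respect
    to the PRM prior, so only the likelihood and theta-prior ratios remain."""
    theta_p = propose_theta(theta, rng)
    t0 = rng.uniform(0.0, T / 2)
    nu_p = redraw_window(nu, t0, t0 + T / 2, u_max, n_events, rng)
    log_alpha = (loglik(theta_p, nu_p) + log_prior(theta_p)
                 - loglik(theta, nu) - log_prior(theta))
    if math.log(rng.random()) < log_alpha:
        return theta_p, nu_p
    return theta, nu
```

Here `loglik(theta, nu)` would internally simulate the trajectory f(θ, ν) and score the data, exactly as in Algorithm 2.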
To illustrate our method, we show how to apply it to the classical SIR model with case count data, and give both the deterministic (3) and stochastic (4) formulations of the model here, to show the correspondence between them. S is the number of susceptible hosts, I the number of infected hosts, R the number of removed hosts, and C(t) the total number of cases, observed upon infection, up to time t. β is the effective contact rate, and γ is the rate of recovery (inverse duration of infection), while ρ is the reporting probability, that is the probability that when an infection happens, we observe it.

dS/dt = − β (S/N) I
dI/dt = β (S/N) I − γ I
dR/dt = γ I
dC/dt = ρ β (S/N) I
N = S + I + R    (3)

To make the notations lighter in the SDE, we define the total rate of infection λ(s) = β (S_s / N_s) I_s. Also let ν_I and ν_R be independent PRMs on R_+ × R_+ with intensity the Lebesgue measure, for the events of infections and recoveries respectively. The equations in (3) and (4) and their terms are written in the same order so as to show the correspondence between them.

S_t = S_0 − ∫_0^t ∫_{R_+} 1{u ≤ λ(s−)} ν_I(ds, du)
I_t = I_0 + ∫_0^t ∫_{R_+} ( 1{u ≤ λ(s−)} ν_I(ds, du) − 1{u ≤ γ I_{s−}} ν_R(ds, du) )
R_t = R_0 + ∫_0^t ∫_{R_+} 1{u ≤ γ I_{s−}} ν_R(ds, du)
C_t = ∫_0^t ∫_{R_+} 1{u ≤ ρ λ(s−)} ν_I(ds, du)
N_t = S_t + I_t + R_t    (4)

We also use more complex versions of the SIR model, by making the effective contact rate seasonal, by adding host demography, immigration, immunity loss (SIRS) or a latent class (SEIR). To obtain the corresponding SDE, we only need to add a new PRM for every event, and adapt the event rates. For example, to add host demography, we add the four events of birth of a susceptible, death of a susceptible, death of an infectious, and death of a removed, with rates B N∗, D S, D I, and D R, and the four independent PRMs ν_B, ν_DS, ν_DI, and ν_DR respectively. To make the model seasonal, we would replace β by β(t) = β_m + β_v sin(2π t + φ).
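The "one PRM and one rate per event" recipe means that extending the model amounts to extending an event table. As an illustrative Python sketch (our own naming; the demographic rates B, D and the reference population N∗ are our reading of the paper's notation), the SIR model with demography reduces to:

```python
# Each event is a (rate function, state increment) pair; state = (S, I, R).
# Adding an event to the model = adding a row here, plus one PRM.
beta, gamma = 2.0, 1.0   # effective contact and recovery rates
B, D = 0.02, 0.02        # per-capita birth and death rates (assumed notation)
N_star = 1000.0          # reference population size for births

def n(x):
    return x[0] + x[1] + x[2]

events = {
    "infection": (lambda x: beta * x[0] * x[1] / n(x), (-1, 1, 0)),
    "recovery":  (lambda x: gamma * x[1],              (0, -1, 1)),
    "birth_S":   (lambda x: B * N_star,                (1, 0, 0)),
    "death_S":   (lambda x: D * x[0],                  (-1, 0, 0)),
    "death_I":   (lambda x: D * x[1],                  (0, -1, 0)),
    "death_R":   (lambda x: D * x[2],                  (0, 0, -1)),
}

# Evaluate all event rates at a given state:
x = (990, 10, 0)
rates = {name: r(x) for name, (r, _) in events.items()}
```

The simulator and the sampler never need to change; only this table does, which is what makes the method easy to transfer from model to model.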
The equations for these different models are given in appendix E.

To perform inference on the model, we need an observation model to define the likelihood. For incidence data, the observations are the numbers of cases in each time interval, C(t_i) − C(t_{i−1}), and for inference we consider that the case count in each interval is negative binomial. The additional degree of freedom compared to the more natural Poisson distribution helps the fitting on real data. For prevalence data, we consider that the prevalence observed at time t_i follows the binomial distribution B(I(t_i), ρ). In both cases, the observations are independent conditionally on the trajectory, and the full likelihood is simply the product of the likelihoods.

We will also apply PRM-augmented MCMC to the simple SIR model with no seasonality, host vitality, or immigration, with prevalence data, in which case the observations follow the binomial distribution B(I(t_i), ρ).
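The observation likelihoods above can be sketched with pure-Python log-pmfs (our own helper names; the negative binomial is parametrized here by its mean and a dispersion r, one plausible choice among several, since the section does not spell out the parametrization used):

```python
from math import lgamma, log

def nbinom_logpmf(k, mean, r):
    """log P(K = k) for a negative binomial with the given mean and
    dispersion r (variance = mean + mean**2 / r)."""
    p = r / (r + mean)  # success probability
    return (lgamma(k + r) - lgamma(r) - lgamma(k + 1)
            + r * log(p) + k * log(1.0 - p))

def binom_logpmf(k, n, p):
    """log P(K = k) for a binomial B(n, p)."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1.0 - p))

def incidence_loglik(case_counts, expected, r):
    # observations independent given the trajectory: sum the log-pmfs
    return sum(nbinom_logpmf(k, m, r) for k, m in zip(case_counts, expected))
```

As r → ∞ the negative binomial collapses to a Poisson with the same mean, recovering the "more natural" but less flexible choice.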
3. Materials and Methods
The algorithm is implemented in the OCaml language [23], and the project repository is at https://gitlab.com/bnguyenvanyen/ocamlecoevo
For the comparisons, we simulate daily prevalence data with the simple SIR model over a one month duration, with a population size N taking values 500, 1000, 2000, 4000 and 8000. We estimate the base effective contact rate β, and the initial conditions S_0, I_0 and R_0. We start the MCMC from the target parameter values, so as not to take into account differences in convergence speed between methods.

The different methods we compare are the PRM-augmented MCMC with exact simulations, the PRM-augmented MCMC with approximate simulations, and a subject-level data augmentation MCMC with Gibbs sampling presented in [11]. For the PRM-augmented MCMC, we first sample ν from its prior for 100 iterations, then continue with our custom proposal for 2,000,000 iterations. For the Gibbs sampler we perform 10,000 iterations. Its moves in process space consist of redrawing the infectious history of one subject, conditional on the data and on the histories of the other subjects. To maintain mixing as N increases, we keep the number of histories redrawn per iteration proportional to N – 20 for 500, and 320 for 8000.

As a measure of computational complexity for the different estimation procedures, we compute the amount of computational time it takes to obtain one effective sample from the posterior. The multivariate effective sample size [24] is defined as

mESS(n) = n (|Λ| / |Σ|)^{1/p}

where n is the number of samples, p the number of dimensions estimated, |·| the determinant, Λ the covariance structure of the posterior, and Σ the asymptotic covariance matrix of the Markov chain. Λ can be estimated with the sample covariance matrix, and Σ with the batch means covariance matrix.

The space we explore is infinite dimensional, so this definition is not directly applicable.
We use the values the process takes at the datapoints, X_i, instead. The trajectories of the simple SIR model live in the 2-simplex, so for m datapoints, the total dimension is then p = 3 + 2m, for β, the initial conditions, and the (X_i).

For the constant model, β(t) is constant (β_v = 0). The data is then inferred with the stochastic seasonal model, and with the stochastic constant model, by PRM-augmented MCMC.

The French Polynesia Zika dataset is composed of weekly case counts for Moorea island, and of seroprevalence data from a cohort of 196 people from the archipelago. The case count data has been made available in [25, 26], and the seroprevalence data in [27, 26]. We fit a SEIR model with immigration to the data with PRM-augmented MCMC. The seroprevalence likelihood is binomial, with probability the removed proportion. The details of the model and the prior distributions are given in appendix E.3. The PRM-augmented MCMC is run for 200,000 iterations for convergence and adaptation of the covariance matrix [28], then for 10 iterations, from which 1000 samples are kept.
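The multivariate ESS used above can be sketched with NumPy (our own helper names; Λ is estimated by the sample covariance and Σ by the batch-means covariance, as described in the text):

```python
import numpy as np

def mess(chain, n_batches=20):
    """Multivariate effective sample size,
    mESS = n * (|Lambda| / |Sigma|) ** (1 / p).

    chain : (n, p) array of MCMC samples.
    """
    n, p = chain.shape
    lam = np.cov(chain, rowvar=False)   # sample covariance -> Lambda
    b = n // n_batches                  # batch length
    means = chain[: b * n_batches].reshape(n_batches, b, p).mean(axis=1)
    # batch-means estimate of the asymptotic covariance Sigma
    sigma = b * np.cov(means, rowvar=False)
    return n * (np.linalg.det(lam) / np.linalg.det(sigma)) ** (1.0 / p)
```

For an i.i.d. chain, Σ ≈ Λ and mESS ≈ n; autocorrelation inflates Σ and drives the mESS down.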
4. Results
The comparison between exact and approximate simulations in Fig. 2 shows that both simulation schemes scale in the same order with population size, but that the approximate scheme is much more economical. In this example, for a population of 8000, a million iterations took a bit less than 4 hours with the approximate method, but close to 11 hours with the exact method. As population size increases, stochasticity plays a less important role, and it becomes increasingly advantageous to use the approximate simulations. For small populations and with high time resolution, however, it might be better to use exact simulations to avoid the bias caused by the approximation.

Figure 2: Comparison of execution time for exact and approximate simulations. Execution time (in minutes) as a function of host population size, on a log-log scale. In blue, inference with the approximate simulations, and in orange, with the exact simulations.

Figure 3: Comparison of complexity as a function of population size of PRM-augmentation and subject level augmentation. In blue, PRM-augmentation, and in orange, subject level augmentation. (a)
Complexity measured as the number of iterations n_iter per effective sample. (b)
Complexity measured as CPU time in minutes per effective sample
As both schemes rely on the same data structure for ν, it is easy to switch from one to the other, and it would be straightforward to switch between them when required during estimation.

The PRM-augmented MCMC also proved much more efficient than the subject level data augmentation method described in [11], as shown in Fig. 3. The Gibbs sampler from [11] mixes much better than PRM-augmented MCMC when only looking at the mESS (section 3.2) per iteration, yielding around 1 effective sample every 100 iterations, against 1 every 6000 iterations. However, every one of its iterations is much more costly. In theory, the complexity of an iteration of subject level augmentation scales with the square of the population size: resampling a subject's history grows linearly with the number of events, and the number of histories to redraw per iteration also grows linearly with population size to maintain mixing. This quickly becomes problematic, even with moderate population sizes. In contrast, the mixing in the case of PRM-augmentation doesn't vary much with population size, only the cost of an iteration does. In practice, a linear regression indicates an exponent of 2.

To see the Markov jump process as the solution of an SDE with respect to a PRM is to create a separation between the noise driving the process, contained in the PRM, and the mechanism of the process, described by the parameters of the process. By inferring θ and ν jointly, then, we are hoping to capture the mechanistic part in θ, and the noise part in ν, provided that the process explains the data well.

The ν act as Bayesian latent residuals [18]. If the posterior distribution of ν is very different from its prior distribution of a standard PRM (intensity one), then our hypotheses about the noise in the model are wrong. A pattern found in the posterior for ν means that the pattern in the data is not entirely captured by the mechanism proposed, or said another way, that the model is underfitting the data.
The nature of the pattern in the posterior for ν can then provide clues to improve the model to capture the pattern. The posterior samples for ν can thus be used both to evaluate model fit, and to improve the model.

We illustrate this with an example on the SEIRS model, in a seasonal and a constant version (see section 3.3). We generate some data with the seasonal model, then compare the fit of this true seasonal model to the data with the fit of the constant model to the data. We can see in Fig.
4(a) that we are able to fit the constant model to the seasonal data. To evaluate the fit, we can look at the posterior samples of ν. We define the point density of ν for event k and for a time slice i as the number of points per unit of volume, d_{ki} = ν(C_{ki}) / |C_{ki}|.

Figure 4: Comparison of the fit of the seasonal and the constant model to simulated seasonal data. In black, the simulated daily case count data. In blue, in the first column, simulations with the true θ value, and ν sampled from its prior distribution. In orange, in the second column, posterior samples for the seasonal model. In green, in the last column, posterior samples for the constant model. (a) Daily case counts in the data (black), and from MCMC samples. (b) Number of points per unit volume by slice of time for infection events in the posterior samples for ν. The solid line is the mean, and the envelope the square deviation of the distribution. (c) Auto-correlation function of the function in (b) (mean, and square deviation envelope). The average effective contact rate β_m, the initial conditions (S_0, I_0, R_0), and the discrete measure ν are estimated. For both models, 250 ν samples are kept for estimation, out of 500,000 iterations after convergence.

The posterior ν estimates (Fig. 4(b)) for the true seasonal model are very similar to the prior. However, for the constant model, the infection point density rises at the start of every epidemic season, and falls at the end. For the constant model to reproduce the data, it is necessary to have more infections than expected at the start of the epidemic season, and fewer than expected at the end. That is, the seasonal signal (Fig. 4(c)) is captured in the posterior for ν, for us to notice and then include into the model.

This example is artificial, since the seasonal signal is already very obvious in the data and would be included in the model from the start. However, it leads to two interesting remarks. First, even though the constant model explains the seasonal data very badly, we are still able to fit it to the seasonal data. This shows that the MCMC is able to reach a priori very unlikely discrete measure values, and so that we are able to explore the discrete measure space well. Second, if we look at the estimates for the other events, we can see that recovery is nearly symmetric to infection: the model doesn't really distinguish between more infections and fewer recoveries, that is, between variation in the infection rate and variation in the recovery rate. This shows that this procedure is not a replacement for actually including time-varying parameters [29], with a prior chosen to reflect how likely we expect those variations to be.
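The latent-residual diagnostic just described is straightforward to compute from posterior ν samples. A minimal Python sketch (our own function names), computing d_{ki} = ν(C_{ki}) / |C_{ki}| for one event and one sample:

```python
def point_density(points, t_max, u_max, n_slices):
    """Points per unit volume in each time slice
    C_i = [t_i, t_{i+1}) x [0, u_max).
    Under the unit-intensity PRM prior, each density has expectation 1."""
    dt = t_max / n_slices
    counts = [0] * n_slices
    for t, u in points:
        if 0.0 <= t < t_max and 0.0 <= u < u_max:
            counts[int(t / dt)] += 1
    volume = dt * u_max
    return [c / volume for c in counts]

# A nu sample whose first half is denser than its prior expectation:
pts = [(0.1, 0.5), (0.2, 1.0), (0.4, 0.2), (0.6, 1.5)]
dens = point_density(pts, t_max=1.0, u_max=2.0, n_slices=2)
# dens == [3.0, 1.0]: excess density in the first slice
```

Averaging these densities over posterior samples, and inspecting how far they drift from 1, is exactly the diagnostic plotted in Fig. 4(b).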
During the Zika epidemic in French Polynesia, in 2013-2014, the case counts indicate that the epidemic progressed very fast, but seroprevalence studies show that only around 50% of the population got infected.

The fit of a simple SEIR model with PRM-augmented MCMC shows that the model cannot correctly capture the data if we restrict ourselves to realistic incubation and infectious periods, as shown in Fig. 5. The posterior ν distribution contains peaks of point density at the beginning of the epidemic, most notably for the event of becoming infectious, and for the event of infection. This means that trajectories of the model that fit the data have more infections and shorter incubation periods than expected from intrinsic noise, only at the start of the epidemic. There is no reason to expect that this could happen. Said another way, in the initial period, cases happen faster than the model can follow. More realistically, this could also be explained by an increase in the reporting probability as the epidemic becomes noticed, or by population structure [26, 30].

Figure 5: Fit of the SEIR model to the Zika data from Moorea island, 2013-2014. In black, the data. In blue, posterior samples for the stochastic model. The first panel represents the weekly number of cases, in the data and in the posterior samples. The following panels represent the average and standard deviation of the density of points per unit of volume, in the ν posterior samples, for the events of infection, S → E, leaving the exposed class, E → I, and recovery, I → R. 1000 samples were kept from 10 iterations after convergence.

5. Discussion

Epidemiological data is ever more abundant and complex, and can help us better understand the dynamics of infectious diseases, answer important theoretical questions, and address public health problems.
Existing inference methods, however, are quickly becoming a limiting factor, because of a prohibitive computational cost, a difficulty in applying the method to arbitrary models, or both. The search for new methods progresses in different directions, to build methods that are both faster and more easily transferable to different models. For MCMC, one such advance has been the development of adaptive methods [31, 32, 28]. They can relieve the user from the burden of designing proposals that take into account the particular structure of the model, as the adaptive method aims to discover that structure automatically. The adoption of these methods for complex models takes time. In the case of stochastic processes, the problem is complicated by the very large (infinite dimensional) latent variable space to explore.

As a consequence, a large part of the data augmentation methods for stochastic processes tend to concern themselves with Gibbs sampling, or Metropolis-Hastings component-wise updates when Gibbs sampling cannot be achieved. For Markov jump processes, an efficient sampler based on uniformization [17] was proposed in [12, 14]. These works make clever use of uniformization to apply a method from Markov chains to MJPs. This method does not scale well to large state spaces, however, and so it is not practical in the case of stochastic epidemic models. Instead, data augmentation methods targeted specifically at stochastic epidemic models have also been developed over the last 20 years [4, 8, 6, 10, 11]. They typically use component-wise updates, the effectiveness of which depends crucially on the degree of posterior independence of the different components. In these methods, the latent variables represent the subject-level disease histories. This is a centered parametrization [22], as the latent variables ν are a sufficient statistic for θ.
By opposition, one can also use a non-centered parametrization, in which the latent variables ν and θ are a priori independent, or, phrased another way, ν is an ancillary statistic for θ. In the centered case, when the data brings little information, θ and ν are very correlated, and separate updates for θ and ν must be very conservative. In that case, non-centering can be much more effective, as θ and ν will be nearly independent. If instead the data is very informative, then the reverse is true, and the centered parametrization will mix better than the non-centered one. This question of centering and non-centering is crucial for mixing speed in the case of component-wise updates, and has already been discussed in the literature around stochastic epidemic models [8].

For typical stochastic epidemic models, it is often possible to re-parametrize the model in terms of uniform random variates, for instance, and thus to obtain a non-centered parametrization. This can be the basis for non-centered simulation-based MCMC [10]. However, the parametrization is ad hoc and must be found by the modeller. In contrast, we believe PRM-augmentation provides a canonical non-centered parametrization, and thus it is a good choice for low-granularity data like we have shown here. For the case when the data is more informative, it should be possible to design joint adaptive proposals that could move reasonably far in trajectory space. Indeed, the increased efficiency compared with the subject-level data augmentation method from [11] can in part be attributed to this. The subject-level augmentation is a centered parametrization, and the cost of using Gibbs sampling to obtain new histories is that only one history can be resampled at a time.
Another method of conditional resimulation, which should scale better with population size, is however presented in [33].

PRM-augmented MCMC presents an interesting tradeoff for moderately large population sizes, and a possible alternative to linear noise approximations [34] or PMCMC [2]. At the same time, the method is easily used in practice, as it is akin to a simulation-based method that one can directly transpose to new models, with no lengthy and error-prone mathematical derivations or implementation. It would be interesting to make a quantitative comparison with PMCMC, but it is difficult to do this in a fair manner, as the efficiency of PMCMC crucially depends on the number of particles used.

However, we would like to argue that the main advantage of PRM-augmentation is that it is in a sense a natural parametrization of the model, which makes our assumptions about the nature of the stochasticity very clear. It leads to a clear separation between the process mechanism, described by θ, and the process noise, described by ν, that facilitates interpretation and is very useful for model diagnosis (section 4.2). The procedure is identical to the one presented in [18, 35], with two differences. First, the ν samples are obtained as part of the inference, and not after; that way, we don't need to be able to compute ν = f^{-1}(θ, X) (and in our case, f is not injective). Second, and most importantly, we have a clear meaning for what the latent residuals ν represent, so that it is easy to interpret deviations from the prior.

In conclusion, PRM-augmented MCMC is a practical method for the inference of moderate epidemics where stochasticity cannot be ignored. As a simulation-based method, one can easily apply it to many different models, and the PRM samples obtained make it possible to diagnose, compare, and propose better models.

Declaration of Competing Interest
The authors declare no conflict of interest.
Acknowledgements
BN and BC have been supported by a grant from Agence Nationale de la Recherche for the PANIC project (Targeting PAthogen's NIChe: a new approach for infectious diseases control in low-income countries: ANR-14-CE02-0015-01). BN is a doctoral student at the Centre de Recherche Interdisciplinaire, in the Université de Paris.
References

[1] Philip D. O'Neill. A tutorial introduction to Bayesian inference for stochastic epidemic models using Markov chain Monte Carlo methods. Mathematical Biosciences, 180(1-2):103–114, November 2002.
[2] Christophe Andrieu, Arnaud Doucet, and Roman Holenstein. Particle Markov chain Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(3):269–342, June 2010.
[3] Pierre Del Moral, Ajay Jasra, Anthony Lee, Christopher Yau, and Xiaole Zhang. The Alive Particle Filter and Its Use in Particle Markov Chain Monte Carlo. Stochastic Analysis and Applications, 33(6):943–974, November 2015.
[4] S. Cauchemez, F. Carrat, C. Viboud, A. J. Valleron, and P. Y. Boëlle. A Bayesian MCMC approach to study transmission of influenza: application to household longitudinal data. Statistics in Medicine, 23(22):3469–3487, November 2004.
[5] Alexei J. Drummond and Andrew Rambaut. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evolutionary Biology, 7(1):214, 2007.
[6] Chris P. Jewell, Theodore Kypraios, Peter Neal, and Gareth O. Roberts. Bayesian analysis for emerging infectious diseases. Bayesian Analysis, 4(3):465–496, September 2009.
[7] Remco Bouckaert, Joseph Heled, Denise Kühnert, Tim Vaughan, Chieh-Hsi Wu, Dong Xie, Marc A. Suchard, Andrew Rambaut, and Alexei J. Drummond. BEAST 2: a software platform for Bayesian evolutionary analysis. PLoS Computational Biology, 10(4):e1003537, 2014.
[8] Peter Neal and Gareth Roberts. A case study in non-centering for data augmentation: stochastic epidemics. Statistics and Computing, 15(4):315–327, October 2005.
[9] Fei Xiang and Peter Neal. Efficient MCMC for temporal epidemics via parameter reduction. Computational Statistics & Data Analysis, 80:240–250, December 2014.
[10] Peter Neal and Chien Lin Terry Huang. Forward Simulation Markov Chain Monte Carlo with Applications to Stochastic Epidemic Models. Scandinavian Journal of Statistics, 42(2):378–396, 2015.
[11] Jonathan Fintzi, Xiang Cui, Jon Wakefield, and Vladimir N. Minin. Efficient Data Augmentation for Fitting Stochastic Epidemic Models to Prevalence Data. Journal of Computational and Graphical Statistics, 26(4):918–929, October 2017.
[12] Vinayak Rao and Yee Whye Teh. Fast MCMC Sampling for Markov Jump Processes and Extensions. The Journal of Machine Learning Research, 14:26, 2013.
[13] Andrew Golightly and Darren J. Wilkinson. Bayesian inference for Markov jump processes with informative observations. Statistical Applications in Genetics and Molecular Biology, 14(2):169–188, 2015.
[14] Boqian Zhang and Vinayak Rao. Efficient parameter sampling for Markov jump processes. arXiv:1704.02369 [stat], April 2017.
[15] Nicholas Metropolis, Arianna W. Rosenbluth, Marshall N. Rosenbluth, Augusta H. Teller, and Edward Teller. Equation of State Calculations by Fast Computing Machines. The Journal of Chemical Physics, 21(6):1087–1092, June 1953.
[16] W. K. Hastings. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1):97–109, April 1970.
[17] Arne Jensen. Markoff chains as an aid in the study of Markoff processes. Scandinavian Actuarial Journal, 1953(sup1):87–91, January 1953.
[18] M. S. Y. Lau, G. Marion, G. Streftaris, and G. J. Gibson. New model diagnostics for spatio-temporal systems in epidemiology and ecology. Journal of The Royal Society Interface, 11(93):20131093, February 2014.
[19] Stewart N. Ethier and Thomas G. Kurtz. Markov Processes: Characterization and Convergence. John Wiley & Sons, September 2009.
[20] Aaron A. King, Dao Nguyen, and Edward L. Ionides. Statistical Inference for Partially Observed Markov Processes via the R Package pomp. Journal of Statistical Software, 69(1):1–43, March 2016.
[21] Nobuyuki Ikeda and Shinzo Watanabe. Stochastic Differential Equations and Diffusion Processes. Number 24 in North-Holland Mathematical Library. North-Holland, Amsterdam, 2nd edition, 1989.
[22] Omiros Papaspiliopoulos, Gareth O. Roberts, and Martin Sköld. A General Framework for the Parametrization of Hierarchical Models. Statistical Science, 22(1):59–73, February 2007.
[23] OCaml, an industrial strength programming language.
[24] Dootika Vats, James M. Flegal, and Galin L. Jones. Multivariate Output Analysis for Markov chain Monte Carlo. arXiv:1512.07713 [math, stat], December 2015.
[25] H. P. Mallet, Anne-Laure Vial, and Didier Musso. Bilan de l'épidémie à virus Zika en Polynésie Française, 2013–2014. Bulletin d'information sanitaires, épidémiologiques et statistiques, pages 20–21, 2015.
[26] Clara Champagne, David Georges Salthouse, Richard Paul, Van-Mai Cao-Lormeau, Benjamin Roche, and Bernard Cazelles. Structure in the variability of the basic reproductive number (R0) for Zika epidemics in the Pacific islands. eLife, 5:e19874, November 2016.
[27] Maite Aubry, Anita Teissier, Michael Huart, Sébastien Merceron, Jessica Vanhomwegen, Claudine Roche, Anne-Laure Vial, Sylvianne Teururai, Sébastien Sicard, Sylvie Paulous, Philippe Després, Jean-Claude Manuguerra, Henri-Pierre Mallet, Didier Musso, Xavier Deparis, and Van-Mai Cao-Lormeau. Zika Virus Seroprevalence, French Polynesia, 2014–2015. Emerging Infectious Diseases, 23(4):669–672, April 2017.
[28] Matti Vihola. Robust adaptive Metropolis algorithm with coerced acceptance rate. Statistics and Computing, 22(5):997–1008, September 2012.
[29] Bernard Cazelles, Clara Champagne, and Joseph Dureau. Accounting for non-stationarity in epidemiology by embedding time-varying parameters in stochastic models. PLOS Computational Biology, 14(8):e1006211, August 2018.
[30] Adam J. Kucharski, Sebastian Funk, Rosalind M. Eggo, Henri-Pierre Mallet, W. John Edmunds, and Eric J. Nilles. Transmission Dynamics of Zika Virus in Island Populations: A Modelling Analysis of the 2013–14 French Polynesia Outbreak. PLOS Neglected Tropical Diseases, 10(5):e0004726, May 2016.
[31] Heikki Haario, Eero Saksman, and Johanna Tamminen. An adaptive Metropolis algorithm. Bernoulli, 7(2):223–242, April 2001.
[32] Gareth O. Roberts and Jeffrey S. Rosenthal. Examples of Adaptive MCMC. Journal of Computational and Graphical Statistics, 18(2):349–367, January 2009.
[33] C. M. Pooley, S. C. Bishop, and G. Marion. Using model-based proposals for fast parameter inference on discrete state space, continuous-time Markov processes. Journal of The Royal Society Interface, 12(107):20150225, June 2015.
[34] Jonathan Fintzi, Jon Wakefield, and Vladimir N. Minin. A linear noise approximation for stochastic epidemic models fit to partially observed incidence counts. arXiv:2001.05099 [q-bio, stat], January 2020.
[35] Max S. Y. Lau, Bryan T. Grenfell, Colin J. Worby, and Gavin J. Gibson. Model diagnostics and refinement for phylodynamic models.