[PDF] A Bayesian Nonparametric Analysis of the 2003 Outbreak of Highly Pathogenic Avian Influenza in the Netherlands

Abstract

Infectious diseases on farms pose both public and animal health risks, so understanding how they spread between farms is crucial for developing disease control strategies to prevent future outbreaks. We develop novel Bayesian nonparametric methodology to fit spatial stochastic transmission models in which the infection rate between any two farms is a function that depends on the distance between them, but without assuming a specified parametric form. Making nonparametric inference in this context is challenging since the likelihood function of the observed data is intractable because the underlying transmission process is unobserved. We adopt a fully Bayesian approach by assigning a transformed Gaussian Process prior distribution to the infection rate function, and then develop an efficient data augmentation Markov Chain Monte Carlo algorithm to perform Bayesian inference. We use the posterior predictive distribution to simulate the effect of different disease control methods and their economic impact. We analyse a large outbreak of Avian Influenza in the Netherlands and infer the between-farm infection rate, as well as the unknown infection status of farms which were pre-emptively culled. We use our results to analyse ring-culling strategies, and conclude that although effective, ring-culling has limited impact in high density areas.

Full PDF

AA Bayesian Nonparametric Analysis of the 2003 Outbreakof Highly Pathogenic Avian Inﬂuenza in the Netherlands

Rowland G. Seymour ∗ , Theodore Kypraios , Philip D. O’Neill , Thomas J. Hagenaars School of Mathematical Sciences, University of Nottingham, UK Wageningen Bioveterinary Research (WBVR), Lelystad, The Netherlands

Abstract

Infectious diseases on farms pose both public and animal health risks, so under-standing how they spread between farms is crucial for developing disease controlstrategies to prevent future outbreaks. We develop novel Bayesian nonparametricmethodology to ﬁt spatial stochastic transmission models in which the infection ratebetween any two farms is a function that depends on the distance between them,but without assuming a speciﬁed parametric form. Making nonparametric inferencein this context is challenging since the likelihood function of the observed data isintractable because the underlying transmission process is unobserved. We adopt afully Bayesian approach by assigning a transformed Gaussian Process prior distribu-tion to the infection rate function, and then develop an eﬃcient data augmentationMarkov Chain Monte Carlo algorithm to perform Bayesian inference. We use theposterior predictive distribution to simulate the eﬀect of diﬀerent disease controlmethods and their economic impact. We analyse a large outbreak of Avian In-ﬂuenza in the Netherlands and infer the between-farm infection rate, as well asthe unknown infection status of farms which were pre-emptively culled. We useour results to analyse ring-culling strategies, and conclude that although eﬀective,ring-culling has limited impact in high density areas.

Keywords:

Avian Inﬂuenza, Bayesian Nonparametrics, Disease Control, Epidemic Mod-els, Gaussian Processes

Diseases of livestock and farmed poultry, such as Avian Inﬂuenza or Foot and MouthDisease, pose serious public and animal health risks, as well as having a considerableimpact on both the domestic and international farming economy. Authorities are keen tocontrol the spread of such diseases as quickly as possible to reduce the health risks, butmust also consider other stakeholders, such as farmers, and the economic consequencesof intervention. ∗ To whom correspondence should be addressed. a r X i v : . [ s t a t . A P ] S e p n 2003 a serious outbreak of a highly pathogenic Avian Inﬂuenza A/H7N7 virustook place among poultry farms in the Netherlands. Over the course of three months,more than 30 millions birds were culled, 90 people developed inﬂuenza-like symptoms,with six conﬁrmed cases, and one fatality occurred (Koopmans et al., 2004). The Dutchauthorities implemented a culling strategy to control the disease, whereby animals wereculled on farms where the pathogen was detected, and pre-emptively culled on farmswithin a certain distance from the site of detection. For convenience we shall referto farms as naturally culled or pre-emptively culled in the obvious manner. The cullingstrategy took place alongside strict biosecurity measures and a ban on the transportationof poultry goods (Directorate-General for Health and Consumers, 2003). In the data setwe use there is a total of 5,397 Dutch poultry farms, including 241 infected farms and1,232 pre-emptively culled farms. The approximate locations of the farms are shown inFigure 1.There is a clear spatial element to the spread of the disease; for example, thereare two distinctive clusters of infected farms, within which there appears to be localtransmission. However, analysing the disease spread is challenging due to the fact thatthe infection process is not observed. For farms which were conﬁrmed to be infected,the date of poultry culling was recorded, but the date on which poultry on the farmwere ﬁrst infected was unobserved. The infection status of pre-emptively culled farms isconsidered uncertain, since absence of clinical suspicion at the time of culling would notnecessarily rule out the presence of the pathogen.Various data were collected during the outbreak. The particular data set that we shallfocus on consists of the spatial coordinates of all poultry farms in the Netherlands, plusthe culling dates and identities of all farms that were either naturally or pre-emptivelyculled. There are several previous approaches to modelling data from this outbreak.In Stegeman et al. (2004), the authors construct a model based on a generalised linearmodel proposed in Becker (1989), where the number of new infections per day is as-sumed to follow a Binomial distribution. However, the infection rate is assumed to beconstant between all farms, which is a questionable assumption given the clear spatialelement to the spread of the disease. In Boender et al. (2007), the authors use a typeof generalised linear model which allows for spatial variation in spread of the disease,and propose several plausible forms for the infection rate as a function of distance. It isassumed that pre-emptively culled farms are never infected, and that unobserved eventssuch as infections occur at known times, obtained by simple assumptions motivated byexpert opinion. Models are ﬁtted using maximum likelihood methods and the AkaikeInformation Criterion is used to choose between them. In Backer et al. (2015), the au-thors take a diﬀerent approach allowing for both within- and between-farm infections.The authors use a spatial model described in Boender et al. (2007) and choose valuesfor the model parameters based on those in van der Goot et al. (2005) among others.The outbreak has also been studied from a public health and veterinary perspective,analysing the symptoms both humans and poultry display, see, for example, Fouchieret al. (2004); Koopmans et al. (2004); Elbers et al. (2004).Although some previous modelling approaches attempt to capture the spatial varia-2 llllllllllllllllllllll ll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllll llllll llllll llllllllll llllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllll lllllllllllllllllllllllll ll llllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllll lll l llllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll l llllllllllllllllllllll l lllllllllllllllllllllllllllllll llllll llllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllll llllllllllll llllllllllll lllll llll ll ll lllllllll l lllllllllll llllllllllllllll l lllllllll lll llllllllllllll lll lllllllllllllllllllll llllllllllllll l llllllllllllll lllllllllllllllllll ll llllllll lll ll lllllllll llllllll l llllllll llllllllllllll ll lll llll lllll lll l llllll ll l l lllllllll l l ll lll lllllllllllllllll lll lllll l l l lll ll llll llllll l l lllll ll ll llllllllllll l lllllll llllllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllll l l ll llllllllll lllllllllllll ll ll lllll llllllll lll ll lllllllllll lllll lllllllllllllllllllllllllllllllllllllllllllll lll l lllll l llllllllllllllllllllllllllllllllllllll llllllllllllll llll ll l llll llllllllll llll llllllllllllllllllllllllllllllllll l ll ll ll ll l lllllllllllllllllllllll lll ll llll llllll llllllllll lll llll ll llll llllllll l ll l llll l ll ll ll lll ll lll lllll ll lll llll lllll llllll llllllllll llllllllllll lllll llll l lll lllll lllll ll l lll lllll llll ll ll ll llllll lllll ll l llll llllllllll ll ll l l ll ll ll llll lll lll l lll SusceptiblePre−emptively CulledInfected

Figure 1: A map of the poultry farms in the Netherlands with their status at the end ofthe outbreak. 3ion in the infection rate, they rely on making strict parametric assumptions about theinfection rate as a function of the distance between farms; such functions are commonlycalled distance kernels. The choice of a particular distance kernel may not accuratelyrepresent the underlying process and can lead to incorrect predictions which, in conse-quence, can have a signiﬁcant impact on formulating policy decisions with regards tooptimal disease control measures such as culling. Our approach removes the need tomake such assumptions by modelling the infection rate nonparametrically. We do thisby treating the infection rate as an unknown function with a transformed Gaussian Pro-cess (GP) prior distribution. This allows us to make more general assumptions aboutthe type of function, for example how smooth it is, whether it is continuous, or if it ismonotonic, rather than its exact shape. Furthermore, previous modelling approaches as-sume that the times at which farms were infected are known. In this paper we relax thisassumption, by adopting a data-augmentation approach within a Bayesian frameworkin which we treat infection times as additional parameters. We make inference for theinfection rate function using a Markov Chain Monte Carlo (MCMC) algorithm, whichalso allows us to infer the unobserved infection times, and to estimate the probability ofany pre-emptively culled farm having been infected.The paper is structured as follows. In Section 2 we describe the available data,deﬁne our stochastic transmission model in detail and derive an augmented likelihoodfunction assuming that the epidemic process is fully observed. In Section 3 we presentour Bayesian nonparametric approach by specifying a transformed GP prior distributionfor the infection rate function and the prior distributions for the other model parameters.We also describe an eﬃcient MCMC algorithm to sample from the posterior distributionof the parameters given the observed data. In Section 4 we demonstrate the proposedmodels and methods via an application to simulated data and the Avian Inﬂuenza dataset. We also illustrate how our methods can be used to assess control strategies. Weﬁnish in Section 5 with brief conclusions and a discussion of our methods.

The data set contains the geographical locations of 5,397 poultry farms in the Nether-lands at the time of the outbreak. We discarded 931 hobby farms from the data set,where a hobby farm is deﬁned as having fewer than 500 poultry. For each farm, thedata speciﬁes its status at the end of the outbreak, describing whether or not it hadcontracted the virus, had been culled due to conﬁrmed infection, or had been culled pre-emptively. For farms which were culled we have the date on which this occurred. Afterthe removal of hobby farms, the data set contains 4,466 farms. Of these, 233 farms wereconﬁrmed to be infected and consequently culled, while 1,232 farms were pre-emptivelyculled. Table 1 illustrates the available information for each farm in the data set.4able 1: An example of the available information on each farm. Farms 1 and 2 wereconﬁrmed to be infected and culled in consequence. Farm 3 was culled pre-emptivelyand farm 4 was not culled. Farm geographical locations are provided in terms of x and y coordinates.Farm ID x y Culling Date Pre-emptively Culled1 5.25 52.13 5 th May × th April × nd May (cid:88)

We construct our model based on the standard susceptible-infective-removed epidemicmodel in continuous time; see, for example, Bailey (1975); Andersson and Britton (2000).Consider a population consisting of N farms. We assume that initially all farms aredisease free apart from one which contains animals infected via some external source.At any time t , a farm is either susceptible to the disease, infected with the disease andinfectious, or removed as the animals on the farm have been culled. The model dynamicscan be separated into two processes: the infection process and the removal process. Theinfection process is governed by a rate function β ( d ), where d denotes the Euclideandistance between two farms.We assume an infectious farm infects a given susceptible farm that is d km awayaccording to a Poisson process with rate β ( d ). The processes governing diﬀerent pairs offarms are assumed to be independent. For the removal process, once a farm is infectedit is assumed to be infectious for a time which follows a Gamma distribution, Γ( λ, γ ),which has mean λ/γ and variance λ/γ . The infectious periods of diﬀerent farms areassumed to be independent. Note that the infectious period of a farm is the time betweeninfection and culling as a result of infection being detected, rather than the time periodduring which animals would be infectious in the absence of any intervention.To account for the fact that some farms are pre-emptively culled by the authorities asa disease control measure, we introduce pre-emptive culling times. We make no attemptto explicitly model the culling strategy, since in practice such strategies may change overtime or not always be carried out as originally intended. Instead, we assume that pre-emptive cullings are deterministic events. If, under the disease control strategy, a farm ispre-emptively culled at time t , then the farm becomes removed at time t irrespective ofwhether it is currently susceptible or infectious. From this time, it can no longer infectother farms or be infected. We shall refer to culling events that are not pre-emptive asnatural cullings. The epidemic continues until there are no more infected farms.5able 2: The infectious status of each farm at the end of the outbreak.Set Infected Culled Pre-emptively Culled A × × ×B (cid:88) (cid:88) ×C (cid:88) (cid:88) (cid:88) D × (cid:88) (cid:88)

Recall that the observed data consist of culling times, which can be pre-emptive or not,and farm locations. To ﬁt our model to such data in a Bayesian framework requires thelikelihood of the observed data given the model parameters. However, such a likelihoodis intractable in practice since its computation involves integrating over all unknowninfection events; see, for example, O’Neill and Roberts (1999); Jewell et al. (2009). Wetherefore proceed by deriving a likelihood based on full observation of the epidemicprocess, and use a data-augmentation MCMC algorithm as described in Section 3.Let N denote the total number of poultry farms in the Netherlands and n the numberof ever-infected farms. We denote the infection and culling times for farm j by i j and r j respectively, where culling may be pre-emptive or natural. We label the infectedfarms 1 , . . . , n by their culling date (i.e. r ≤ r ≤ . . . ≤ r n ) and the remaining farms n + 1 , . . . , N arbitrarily. We denote by ω the label of the initially infected farm.We denote by i = { i , . . . , i ω − , i ω +1 , . . . , i N } the set of all infection times excludingthe initial infection time i ω . If farm j was not infected, its infection time is set to be i j = ∞ . We account for pre-emptive culling by deﬁning r j = min( r pj , r cj ), where r pj and r cj denote, respectively, the pre-emptive and natural culling time of farm j . We considerthe times r pj to be deterministic, and set r pj = ∞ if farm j was not pre-emptively culled.For farms which were not culled at all, we set r pj = r cj = ∞ , hence r j = ∞ . Thesets r c = { r c , . . . , r cN } and r p = { r p , . . . , r pN } denote the set of natural and pre-emptiveculling times, respectively.We require the following sets based on the infection status of the farms during theoutbreak. Set A consists of the farms that remained susceptible to the disease throughoutthe course of the epidemic and were not culled, set B is the set of farms that were infectedwith the virus and naturally culled in consequence, set C is the set of farms that wereinfected but were culled pre-emptively, and ﬁnally set D consists of the farms that werenot infected but still pre-emptively culled. These sets are shown in table 2. Note thatif a farm has been pre-emptively culled, we are unable to distinguish whether it belongsto set C or D unless its infection status is known.The likelihood function consists of three parts: a contribution from farms avoidinginfection, a contribution from farms being infected, and a contribution from farms re-maining infectious until culled. For a farm k in either set A , B or C , the probability itavoids infection from infectious farm j , until either j is removed or k is infected, is ψ j,k = exp {− β ( d j,k )(( r j ∧ i k ) − ( i j ∧ i k )) } , β ( d ) is the infection rate for a pair of farms that are d km apart, and a ∧ b =min { a, b } . The diﬀerence in minimum times is the amount of time during which farm j is able to infect k . If farm k is in set D we must take into account its pre-emptive cullingtime, r k = r pk , and the corresponding probability is given by ψ j,k = exp {− β ( d j,k )(( r j ∧ r k ) − ( i j ∧ r k )) } . When farm j is infected, the set of farms that are able to infect j is Y j = { k : i k < i j < r k } , so the event that j is infected contributes to the likelihood function through the overallhazard rate of infection given by φ j = (cid:88) k ∈Y j β ( d k,j ) . For the removal process, the likelihood contribution is given by (cid:89) j ∈B h ( r j − i j | λ, γ ) (cid:89) j ∈C S ( r j − i j | λ, γ ) , where h ( x | λ, γ ) is the probability density function of a Γ( λ, γ ) distribution evaluatedat x and S ( x | λ, γ ) is the survivor function S ( x | λ, γ ) = ∞ (cid:90) x h ( u | λ, γ ) du. Farms in set B , that were infected and culled at the end of their infectious period,contribute to the likelihood function through the total time during which they wereinfectious. For those in set C , which were infected but culled pre-emptively, we con-sider their removal time as a censoring time, and compute the probability they wouldhave remained infectious past their culling time. Combining the infection and removalprocesses gives the augmented likelihood function π ( i , r c , B , C , D | β, λ, γ, ω, i ω , r p )=  (cid:89) j ∈B∪C N (cid:89) k =1 ψ j,k   (cid:89) j ∈B∪C j (cid:54) = ω φ j  (cid:89) j ∈B h ( r j − i j | λ, γ ) (cid:89) j ∈C S ( r j − i j | λ, γ ) (1)= exp {− Ψ } (cid:89) j ∈B∪C j (cid:54) = ω  (cid:88) k ∈Y j β ( d k,j )  (cid:89) j ∈B h ( r j − i j | λ, γ ) (cid:89) j ∈C S ( r j − i j | λ, γ ) , (cid:88) j ∈B∪C (cid:34) (cid:88) k ∈A∪B∪C β ( d j,k ) (( r j ∧ i k ) − ( i j ∧ i k ))+ (cid:88) k ∈D β ( d j,k ) (( r j ∧ r k ) − ( i j ∧ r k )) (cid:35) . (2)Note that the set of culling times determines which farms belong to the set A , whichis why A does not appear explicitly in the left-hand side of equation (1). We wish to make Bayesian inference for the unknown model parameters given the ob-served data of farm locations and culling dates (see Table 1). If a farm was not culledby the end of the outbreak, we assume that it remained susceptible throughout the out-break. Hence, the observed culling dates determine which farms belong to set A . For afarm that has been pre-emptively culled, its infection status is unknown and thereforewe cannot determine from the observed data if such a farm belongs to set C or D . Also,the infection process deﬁned in our model is not observed directly. Hence, the label ofthe initially infected farm ω , its infection time i ω and the infection times of the farmsbelonging to sets B or C are unknown.We adopt a data augmentation framework (see, for example, Jewell et al. (2009)) inwhich we include the farms’ unknown infection event times and statuses as additionalmodel parameters to the ones which govern the transmission and removal processes.Combining the augmented data likelihood (1) with the joint prior distribution, by usingBayes’ theorem, the target posterior density is given by π ( β, γ, ω, i ω , i , C , D | B , λ, r p , r c ) ∝ π ( i , r c , B , C , D | β ( · ) , λ, γ, ω, i ω , r p ) × π ( β ) π ( λ ) π ( i ω | ω ) π ( ω ) , where we have assumed that β , λ and ( ω, i ω ) are independent a priori . We now discuss in detail the prior distributions for the infection rate function and theother model parameters.

We wish to infer the infection rate function β nonparametrically and to do so we willuse a transformed Gaussian Process (GP) as a prior distribution. We follow Rasmussenand Williams (2006) and deﬁne a GP as a collection of points, any ﬁnite subset of which8ollow a multivariate Normal distribution. Suppose we wish to model a function f , overa space χ , speciﬁcally being interested in the values of the function f ( x ) , . . . , f ( x n )evaluated at the points x = { x , . . . , x n } . We specify the GP prior distribution on f by f ∼ GP ( µ, Σ) , where µ is the mean function and Σ the covariance matrix, deﬁned using a covariancefunction k . We build our assumptions about f into the model through the covariancematrix, and to do so we use the squared exponential function. This is given byΣ i,j = k ( x i , x j ; α, l ) , k ( x i , x j ; α, l ) = α exp (cid:26) − ( x i − x j ) l (cid:27) . The function k has two hyperparameters, namely α , which controls the overall variance,and l , which controls the length scale. The value of l essentially determines how muchthe function can change as the input changes. We implicitly assume that f is smoothand diﬀerentiable. Many other choices for the kernel function are available (Rasmussenand Williams, 2006), but our choice appears suitable for the application at hand.The input space of the function β is the space of Euclidean distances. We speciﬁcallywish to evaluate β at d , the set of pair-wise distances between all farms. As the GP priordistribution gives non-zero probability to negative values and we are modelling a ratewhich is always positive, we introduce a dummy function g and use the transformation β = exp { g } . In other words, we are placing a GP prior distribution on log β by specifyingthat g ∼ GP (0 , Σ) , Σ ij = k ( d i , d j ; α, l ) , β = exp { g } where d i is the Euclidean distance between the i th pair of farms.A well-known problem arises with GPs when the size of the covariance matrix is large,since this creates computational diﬃculties with matrix inversion and decomposition; see,for example, Hensman et al. (2013); Csato and Opper (2002); Quinonero-Candela andRasmussen (2005). For the Avian Inﬂuenza data set, there are over 9 million unique pair-wise distances from which the covariance matrix is constructed. In the MCMC algorithmwe will develop, we will require the covariance matrix to be repeatedly decomposedand inverted, which is not feasible in practice with such a large matrix. We thereforeapproximate the GP prior distribution using a projection method ﬁrst described inQuinonero-Candela and Rasmussen (2005). We construct a pseudo set of distances, ¯ d ,that is much smaller than the original set d . While ¯ d need not be a subset of d it shouldprovide an adequate representation of d . We then place a GP prior distribution on thepseudo set and draw samples from this distribution. The joint prior distribution of thepseudo function, ¯ f and the full function f is (cid:18) ¯ ff (cid:19) ∼ GP (cid:18) , (cid:18) Σ ¯ d , ¯ d Σ d , ¯ d Σ ¯ d , d Σ d , d (cid:19)(cid:19) , where the subscripts on the Σ matrices denote the vectors used to construct them. Wecan then project the samples onto the full data set by considering the conditional distri-bution of f given the pseudo function ¯ f . We take f to be the mean of this distribution,9hich is given by f = Σ d , ¯ d Σ − d ¯ f , where Σ ¯ d is the covariance matrix of the approximating prior distribution. Some care isneeded to construct the pseudo set ¯ d , because we must ensure there are enough pointsto capture the features of β . This method assumes that the prior distribution over thepseudo data set has the same properties as that over the original data set, which is areasonable assumption as they are both sets of Euclidean distances.The value of the length scale parameter l can have a material impact on the results,and it is not obvious how to assign a suitable value. We therefore place a non-informativeprior distribution on this parameter, speciﬁcally l ∼ Exp(0 . a ) denotesan exponential distribution with mean a − . We assume the variance parameter α to beknown a priori , and we choose a value such that samples from the prior distribution areover a large enough range to capture the scale of the infection rate. Recall that the infectious period distribution is assumed to be a Γ( λ, γ ) distribution.We follow Jewell et al. (2009) and assume that λ is known and place an uninformativeprior distribution on γ , speciﬁcally γ ∼ Exp(0 . ω . We set a time axis by assuming the ﬁrst culling to be attime zero, so that r = 0, and set the prior distribution on the infection time of ω by( i ω | ω ) = − z, z ∼ Exp(0 . . The density of the full posterior distribution is given by π ( β, γ, ω, i ω , i , C , D | B , λ, r c , r p ) ∝ exp {− Ψ } n (cid:89) j =1 j (cid:54) = ω  (cid:88) k ∈Y j exp { g ( d k,j ) }  (cid:89) j ∈B h ( r j − i j | λ, γ ) (cid:89) j ∈C S ( r j − i j | λ, γ ) × GP ( g ; , Σ) exp {− . l } exp {− . γ } exp { . i ω } . The likelihood function is the same as in equation (1) and Ψ is given in equation (2)with β replaced by the inferred value exp { g } . The term GP ( g ; , Σ) refers to the ﬁnitedimension form of the GP, which is the probability density function of a multivariateGaussian distribution evaluated at g ( d ), with the corresponding mean vector andcovariance matrix Σ. We cannot sample from the posterior distribution directly soconstruct an MCMC algorithm, which is shown in Algorithm 1. There are ﬁve mainsteps to the algorithm and these are described in detail below.10 lgorithm 1 Structure of the MCMC algorithm Initialise the chain with values g (0) , γ (0) , l (0) , and i (0) ; Repeat the following steps: Sample g using a Metropolis-Hastings step; Sample l using a random walk Metropolis-Hastings step; Sample γ using a random walk Metropolis-Hastings step; Sample ω using a random walk Metropolis-Hastings step; Sample i ω using a random walk Metropolis-Hastings step; Choose one of the following steps with equal probability: • Update an infection time • Remove an infection time for a pre-emptively culled farm • Add an infection time for a pre-emptively culled farm

The ﬁrst step is concerned with sampling the dummy function g , which we do using anunderrelaxed proposal mechanism (Neal, 1995). This allows us to update the functionas a block while reducing computational complexity. Given the current function g , wepropose a new function g (cid:48) by g (cid:48) ( d ) = (cid:112) − δ g ( d ) + δν ( d ) , ν ∼ GP ( , Σ) , where δ ∈ (0 ,

1] is a tuning parameter and Σ is the covariance matrix used in the priordistribution for β . The computational advantage of this is that the prior ratio is theinverse of the proposal ratio so the Metropolis-Hastings acceptance probability reducesto the likelihood ratio p acc = π ( i , r c , B , C , D | g (cid:48) , λ, γ, ω, i ω , r p ) π ( i , r c , B , C , D | g, λ, γ, ω, i ω , r p ) ∧ . l To update the GP prior distribution length scale, l , we use a Gaussian random walkMetropolis algorithm by ﬁrst proposing a new length scale, l (cid:48) , from N ( l, σ l ) where γ isthe current value and σ l is a tuning parameter, and then accept l (cid:48) with probability p acc = GP ( g ; , Σ l (cid:48) ) GP ( g ; , Σ l (cid:48) ) π ( l (cid:48) ) π ( l ) ∧ . .2.3 Updating γ To update γ , we also use a Gaussian random walk Metropolis algorithm by proposing γ (cid:48) from the distribution N ( γ, σ γ ) and acceping this with probability p acc = (cid:81) j ∈B h ( r j − i j | λ, γ (cid:48) ) (cid:81) j ∈C S ( r j − i j | λ, γ (cid:48) ) (cid:81) j ∈B h ( r j − i j | λ, γ ) (cid:81) j ∈C S ( r j − i j | λ, γ ) π ( γ (cid:48) ) π ( γ ) ∧ . The ﬁnal step in the algorithm concerns the unobserved infection times. We use amethod proposed in O’Neill and Roberts (1999) and then further developed in Jewellet al. (2009). We choose one of three actions with equal probability: (i) propose to movean existing infection time; (ii) propose to add a new infection time; (iii) propose to deletea previously-added infection time.(i)

Updating an infection time of a farm in sets B or C is the simplest of thethree procedures. To do this, we randomly choose a farm j that is currently infectedand propose a new infection time by i (cid:48) j = r j − t j , where t j ∼ Γ( λ, γ ) and γ denotes thecurrent value of the parameter in the chain. We accept i (cid:48) j with probability p acc = h ( r j − i j | λ, γ ) h ( r j − i (cid:48) j | λ, γ ) π ( i − i j + i (cid:48) j , r c | g, λ, γ, i ω , r p ) , B , C , D ) π ( i , r c | g, λ, γ, i ω , r p , B , C , D ) ∧ , where i − i j + i (cid:48) j is the set i with i j removed and i (cid:48) j included.(ii) When adding an infection time , ﬁrst deﬁne m to be the number of pre-emptively culled farms. We suppose that at the current iteration of the algorithm, ˜ m of the farms which were pre-emptively culled have had infection times added by thealgorithm; that is farms belonging in set C . We randomly choose one of the m − ˜ m pre-emptively culled farms with no infection time and propose an infection time for it.If m = ˜ m , we abandon this step. We generate an infection time as above and accept itwith probability p acc = 1 / ( ˜ m + 1)(1 / ( m − ˜ m )) h ( r j − i (cid:48) j | λ, γ ) π ( i + i j , r c , C , D | g, λ, γ, i ω , r p ) π ( i , r c , C , D | g, λ, γ, i ω , r p ) ∧ m − ˜ m ( ˜ m + 1) h ( r j − i (cid:48) j | λ, γ ) π ( i + i j , r c , C , D | g, λ, γ, i ω , r p ) π ( i , r c , C , D | g, λ, γ, i ω , r p ) ∧ . (iii) Finally, if we choose to delete an infection time for a pre-emptively culledfarm, we randomly choose a pre-emptively culled farm j which at the current iterationhas an infection time added and we propose to remove their infection time. Should therebe no farms with an unknown infection status, which, at the current iteration of the12lgorithm, have had an infection time added, the step is abandoned. We accept thisproposal with probability p acc = 1 / ( m − ( ˜ m − h ( r j − i j | λ, γ )1 / ˜ m π ( i − i j , r c | g, λ, γ, i ω , r p , B , C , D ) π ( i , r c | g, λ, γ, i ω , r p , B , C , D ) ∧ h ( r j − i j | λ, γ ) ˜ mm − ( ˜ m − π ( i − i j , r c , B , C , D | g, λ, γ, i ω , r p ) π ( i , r c , B , C , D | g, λ, γ, i ω , r p ) ∧ . We now present the results of our method applied to two data sets. The ﬁrst is asimulated data set and the second is the Avian Inﬂuenza data set described in Section1. We then use the posterior predictive distribution to analyse the impact of variousculling strategies for the Avian Inﬂuenza outbreak.

We generated the position of 1,000 farms uniformly at random on a square with sidelength 30km. We then simulated 250 outbreaks of Avian Inﬂuenza using the infectionrate function β ( d ) = 0 . {− d } . The infectious period distribution parameters were λ = 4 and γ = 0 .

8. This givesa mean infectious period of λ/γ = 5 days, suitable for inﬂuenza-like diseases amonglivestock. The simulations also included a deterministic culling strategy similar to thatin the actual outbreak, where once a farm was culled following a positive test all farmswithin a 1km radius were pre-emptively culled.We discarded simulated data sets with less than 100 infected farms, since our focusis towards analysing sizeable outbreaks, and any nonparametric modelling approach willstruggle with a small data set. This left 175 data sets. Mimicking the data available inthe Avian Inﬂuenza outbreak, for each simulated data set we assume that in additionto the coordinates we only observe the culling times and whether a farm has been pre-emptively culled or not. The infectious period shape parameter is assumed to be λ = 4.We ﬁx the length scale parameter as l = 3 due to the computation time required toperform inference for both the length scale and infection times for pre-emptively culledfarms for each of the simulated data sets.We ﬁtted the model described in Section 2 by assuming that the infection rate isa function that depends only on the distance between farms and assigned a GaussianProcess prior distribution as described in Section 3.1.1. Due to the number of farms, weused the GP approximation method, constructing the pseudo set of distances by taking256 equally-spaced points from zero to the largest distance in the data set. We employedthe MCMC algorithm described in Section 3.2 to ﬁt the model to each of the 175 datasets.Figure 2 shows the median rate compared to the true rate and a 95% credible intervalconstructed from all 175 posterior medians. The results demonstrate that we can infer13able 3: Summary statistics for the 175 posterior median values obtained in the simu-lation study. The probability interval is from the 2.5% to 97.5% quantiles.Parameter True Value Median 95% Prob. Int. γ λ/γ i

0% 1.40% (-9.18%, 23.0%)the infection rate function well for all pairwise distances above 0.5km, but we slightlyunderestimate the rate between immediate neighbours. This underestimate is causedby there being few farms in each data set that are less than 0.5km apart. We estimatethe median infectious period to be 5.07 days, close to the true value of 5 days, andthe 95% credible interval of the 175 estimates contains the true value of 5. This slightoverestimation is likely to be caused by the combination of slight underestimation of β atlow distances, and the fact that we only considered data sets with at least 100 infections,in which infectious periods may be slightly larger than average.To assess the results for the infection times across all simulations, we use the relativepercentage error in the sum of the infection times for each simulated outbreak, which wedenote by ˜ i , deﬁned as follows. Consider a single simulated data set. Let S denote thesum of all infection times of farms culled either naturally or pre-emptively in the dataset. Let ˆ S denote the median estimate of S obtained from the MCMC output. Notethat ˆ S implicitly takes account of which pre-emptively culled farms are imputed to havebeen infected. Then we deﬁne ˜ i = S − ˆ SS × . The median relative percentage error across all data sets is 1.40%, which demonstratesthat our method for inferring infection times gives accurate results. The results for theparameters are shown in Table 3.

We now analyse the Avian Inﬂuenza data described in Section 1. Due to the size of thedata set, we split the inference into two parts. We ﬁrst inferred plausible values of theGP prior distribution length scale parameter, l , by ﬁtting our transmission model underthe assumptions of a constant infectious period of seven days and that pre-emptivelyculled farms were not infected, as in Boender et al. (2007). We obtained a posteriormedian for l of 2.75km (95% CI: (2.55, 3.01)). The reason for inferring plausible valuesfor l separately is that estimating l requires decomposing and inverting the covariancematrix inside the MCMC algorithm which is highly computationally intensive and leadsto prohibitively long run times. This issue is ampliﬁed when the infection times areunknown as well.We repeated the inference method without assuming that the infection times or thestatus of the pre-emptively culled farms are known. Based on the results of the method14igure 2: Top left: The results of the nonparametric method for the infection rate inthe simulated data sets. We report the median and 95% credible interval of all 175posterior medians. Top right: The distribution of the posterior median estimates for themean infectious period. Bottom: The distribution of the relative error in the sum of theinfection times. . . . . . . . distance I n f e c t i on R a t e Study MedianStudy CITrue Rate

Infectious Period (Days) D en s i t y DensityTrue Value − i~ Relative Error D en s i t y DensityTrue Value α = 3 and l = 3km. We employed the GPapproximation method for this data set. As we expect the infection rate function tovary considerably over short to medium distances, we included more such distances inthe pseudo data set. The pseudo data set was ¯ d = { , . , , . . . , . , , , . . . , } .We ran the MCMC algorithm for 20,000 iterations, including a burn-in period of 500iterations. In each iteration of the MCMC algorithm, we proposed updating, adding ordeleting 200 infection times.The results for the infection rate are shown in Figure 3, where we see a logistic-typefunction that decays to zero. From this, we estimate that the probability of a farminfecting another farm which is more than 6km apart is negligible. From the credibleinterval, we see that samples from the posterior distribution take a variety of shapes,with functions that have a high infection rate over short distance decaying quickly, andfunctions that have a lower rate over short distances taking a logistic function form.We compare our results to those in Boender et al. (2007), particularly with a view tocomparing estimation of the infection rate function. The authors propose ﬁve models,shown in Table 4, which we ﬁtted to the data assuming a ﬁxed infectious period ofseven days. Model 3 was the best of the proposed models according to the DevianceInformation Criterion (Spiegelhalter et al., 2002). We reﬁtted model 3 to the dataassuming that the infection times are unknown. The results are shown in Figure 3 andone clear diﬀerence between the parametric and nonparametric methods is the associateduncertainty. Although the nonparametric method allows for a greater degree of ﬂexibility,it also induces a greater degree of uncertainty. However, we argue that the parametricmethod may underestimate the uncertainty by imposing stricter assumptions. Despitethis, both estimates are of similar shape and scale, and our results broadly agree withexisting work. We see a slight diﬀerence in the forms of the infection rate function fordistances less than 400m, which is due to there being very few farms that are less than400m apart.Since we assume infection times to be unknown, we infer them via our MCMCalgorithm. We estimate the mean infectious period to be 6.4 days, and Figure 4 showsthe distribution of median infectious periods by culling status. For farms that weresubject to pre-emptive culling, the median infectious period is shorter than for thosewho were identiﬁed as infected. This is expected as pre-emptive culling of infected farmsintroduces censoring.We estimate the probability that each pre-emptively culled farm was actually in-fected, as shown in Figure 4. All of the farms with non-zero probability of infection arelocated in the two main infection clusters. Our results show that the transmission tothe southern cluster cannot be explained by a path of shorter distance infections thatwere censored by pre-emptive culling. This is consistent with the hypothesis proposedin Bataille et al. (2011) that this long distance transmission event of Avian Inﬂuenzawas the result of a single human-mediated transport of the virus.16able 4: The proposed parametric pair-wise infection rates for the Avian Inﬂuenza dataset in Boender et al. (2007).Rate Kernel1 β ( d ) = β β ( d ) = β (1 + d ) − β ( d ) = β (1 + d ) − β ( d ) = β (1 + d β ) − β ( d ) = β (1 + ( d/β ) β ) − Figure 3: The posterior mean for the nonparametric (solid) and parametric (dashed)infection rate functions for the Avian Inﬂuenza data set. The parametric function iskernel 3 in Table 4. 17igure 4: Top: Posterior distribution of median infectious periods for farms with con-ﬁrmed infections and those pre-emptivelly culled. Bottom: Estimates of probabilitiesthat pre-emptively culled farms were infected. Only farms which were pre-emptivelyculled are plotted. Each probability is the proportion of iterations in the MCMC algo-rithm that the pre-emptively culled farm was actually infected. I ≤

33 0 033 ≤ I <

54 3 < I We now investigate how to improve the disease control measures by analysing how theculling radius aﬀects the number of infected farms. Culling infected farms has the eﬀectof reducing the time a farm is infectious, and culling susceptible farms means there arefewer farms to be infected. Although this is an eﬀective measure for controlling thespread of the disease, it can be expensive as farmers are compensated for lost livestockand it can cause negative public attitudes.To simulate the eﬀect of culling, we sample from the posterior predictive distributionof the infection and culling times. Given the observed culling times, and the posteriordistributions of g , γ and ω , we wish to generate new infection times i ∗ for all farms, andcorresponding culling times r ∗ . We do this using the posterior predictive distribution,which is given by π ( i ∗ , r ∗ | r ) = (cid:90) (cid:90) (cid:90) π ( i ∗ , r ∗ | g, γ, ω, r ) π ( g, γ, ω | r )d g d γ dω. To generate samples from this distribution we initiate the outbreak by assuming theinitially infected farm ω is the farm that was initially culled in the observed outbreak.To consider the eﬀectiveness of culling strategies, we assume that once an infected farmreaches the end of its infectious period and enters the removed class all farms up to r km away are simultaneously culled and enter the removed class. Culling cannot startimmediately as it may take time for the authorities to be notiﬁed of the disease andput measures into place, and whereas previous work (Backer et al., 2015) uses a ﬁxeddelay after the ﬁrst detection to initiate the culling measures, we allow for stochasticityin the disease take-oﬀ and assume culling takes place once a certain number of farmshave been infected. As resources may not be immediately available to the authorities,it may not be possible to cull all farms within r km and we simulate this by ﬁxing amaximum number of farms that can be culled per day. We then increase this numberover the course of the outbreak as the authorities have more available resources. Thenumbers are given in Table 5 and are based on the number of farms we estimate to havebeen infected in the observed outbreak. Similarly, we assume the authorities will nothave suﬃcient resources to cull all farms within the chosen radius at the start of theoutbreak, and we model this by assuming they initially cull farms within a radius halfas large. 19able 6: Estimates of compensation per bird paid to farmers during the Avian Inﬂuenzaoutbreak from Backer et al. (2015).Poultry Type Compensation ( e per bird)Broiler 0.98Duck 2.09Turkey 10.63Layer 2.05Table 7: Posterior predictive medians for the number of infected and culled farms andthe amount of compensation paid.Radius (km) No. Infected Farms No. Culled Farms Compensation Paid ( e millions)0 443 (151, 644) 443 (151, 644) 24.8 (8.62, 35.9)1 297 (110, 535) 489 (215, 709) 27.2 (12.2, 38.9)2 283 (108, 608) 488 (217, 740) 27.5 (12.2, 41.7)3 283 (112, 582) 517 (242, 775) 29.0 (13.2, 43.1)4 274 (105, 564) 512 (228, 793) 28.5 (12.3, 43.9)5 280 (109, 549) 527 (226, 797) 39.2 (12.4, 41.9)To investigate the economic consequences of these strategies, we assume each farmeris compensated for their culled livestock. We use additional data from the outbreakwhich describes the type of poultry on each farm (broiler, duck, turkey and layer) andthe number of birds on each farm. The value of the compensation depends on the typeof bird culled, the number of birds culled, their age in weeks, and, for turkeys, theirgender. We follow Backer et al. (2015) who use the approximate rates shown in Table6. We acknowledge this method is crude and does not take into account any of thewider economic impacts. However, it allows us to simulate the number of farms that areinfected, the number of farms that are culled, and the compensation paid to farmers.These three values can be used to compare the risk to public health, the impact of thepoultry industry, and the cost to the authorities.Table 7 shows the results of the culling strategies for radii between 0km and 5km. Aculling radius of 0km denotes the authorities taking no action. It is clear that taking anycourse of action is better than taking none, but we also see that more ambitious strategiesshow little gain in reducing the median number of farms infected in an outbreak. Theeﬀect of culling at larger radii results in a larger number of culled farms and a higheramount of compensation, but does not result in a considerable reduction in the numberof infected farms. This is because the maximum number of farms culled per day isquickly reached, even for small culling radii. In the data set, the average density offarms was approximately 2 per km , whereas a culling radius of 2km covers over 12km .These results are broadly in line with those in Backer et al. (2015), which also suggestthat larger culling radii do not result in a considerable reduction in the number of infectedfarms. However, as we use a much smaller estimate for the maximum number of farmsculled per day, we do not observe a large diﬀerence between culling radii of 1km and20km. We have presented an analysis of an outbreak of Avian Inﬂuenza in poultry farms inthe Netherlands using a Bayesian nonparametric approach. Our approach demonstratesthat it is possible to model the spatially heterogeneous infection rate for infectious dis-eases nonparametrically, and that GPs provide a ﬂexible framework for doing so. Thisnonparametric methodology allows us to reduce the need for strict parametric assump-tions, which are often made for mathematical or practical convenience and may havelittle scientiﬁc basis. Our methods also allow us to account for missing data, speciﬁcallythe unobserved process of infection, without making unrealistic simplifying assumptions.The methods we have described require more time and computational power than thestandard parametric methods, especially when employed in conjunction with an MCMCapproach to sample from the desired posterior distribution. We have however somewhatalleviated these issues by using a GP approximation method which appears to work wellin our applications.With regards to the Avian Inﬂuenza data set, our methodology has allowed us toapproach the infection process in a more ﬂexible way than previous methods. Ourestimates are in line with previous work, and combining this method with previouslydeveloped MCMC techniques and data augmentation allows us to analyse this data setin more detail than has previously been possible, including determining whether pre-emptively culled farms had been infected. The uncertainty around our estimates islarger than that of previous parametric methods, but since we do not assume speciﬁcparametric models then our methods are, in some sense, giving a fairer quantiﬁcationof uncertainty. We are able to use the posterior predictive distribution to analyse theeﬀect of diﬀerent control strategies which can be used to inform policy in this area.In this paper, we have focussed on spatial heterogeneity as the key determinant ofthe infection rate. In reality, it is possible that the number and type of animals onthe farms was also important. Given appropriate data, it is natural to build a modelwhich contains such data as covariates. One way of doing this would be to consider eachcovariate as a separate dimension of the GP. Another possible extension is to considerdiﬀerent covariance functions beyond the squared exponential function, which could beappropriate in some applications.

We thank Wageningen Bioveterinary Research, The Netherlands Food and ConsumerProduct Authority and the Dutch Ministry of Agriculture, Nature and Food Quality forsharing anonymised outbreak, culling and denominator data of the Dutch 2003 HPAIepidemic with us.We are grateful for access to the University of Nottingham High Performance Com-puting Facility. 21his work was supported by the UK Engineering and Physical Sciences ResearchCouncil (EPSRC) grant EP/N50970X/1.

References

Andersson, H. and Britton, T. (2000).

Stochastic epidemic models and their statisticalanalysis . Lecture Notes in Statistics. Springer.Backer, J., van Roermund, H., Fischer, E., van Asseldonk, M., and Bergevoet, R. (2015).Controlling highly pathogenic avian inﬂuenza outbreaks: An epidemiological and eco-nomic model analysis.

Preventive Veterinary Medicine , 121(1-2):142–150.Bailey, N. T. J. (1975).

The mathematical theory of infectious diseases and its applica-tions . Griﬃn, London, 2nd edition.Bataille, A., van der Meer, F., Stegeman, A., and Koch, G. (2011). Evolutionary analysisof inter-farm transmission dynamics in a highly pathogenic avian inﬂuenza epidemic.

PLoS Pathogens , 7(6):e1002094.Becker, N. G. (1989).

Analysis of infectious disease data . Chapman and Hall, London.Boender, G. J., Hagenaars, T. J., Bouma, A., Nodelijk, G., Elbers, A. R. W., de Jong,M. C. M., and van Boven, M. (2007). Risk maps for the spread of highly pathogenicavian inﬂuenza in poultry.

PLoS Computational Biology , 3(4):e71.Csato, L. and Opper, M. (2002). Sparse online Gaussian processes.

Neural Computation ,14(3):641–668.Directorate-General for Health and Consumers (2003). Avian inﬂuenza (AI) in theNetherlands, Belgium and Germany – chronology of main events and list of decisionsadopted by the commission. Technical report, European Commission.Elbers, A. R. W., Fabri, T. H. F., de Vries, T. S., de Wit, J. J., Pijpers, A., and Koch,G. (2004). The highly pathogenic avian inﬂuenza A (H7N7) virus epidemic in thenetherlands in 2003—lessons learned from the ﬁrst ﬁve outbreaks.

Avian Diseases ,48(3):691–705.Fouchier, R. A. M., Schneeberger, P. M., Rozendaal, F. W., Broekman, J. M., Kemink,S. A. G., Munster, V., Kuiken, T., Rimmelzwaan, G. F., Schutten, M., van Doornum,G. J. J., Koch, G., Bosman, A., Koopmans, M., and Osterhaus, A. D. M. E. (2004).Avian inﬂuenza A virus (H7N7) associated with human conjunctivitis and a fatalcase of acute respiratory distress syndrome.

Proceedings of the National Academy ofSciences , 101(5):1356–1361.Hensman, J., Fusi, N., and Lawrence, N. D. (2013). Gaussian processes for big data. In

Conference on Uncertainty in Artiﬁcial Intelligence , pages 282–290.22ewell, C. P., Kypraios, T., Neal, P., and Roberts, G. O. (2009). Bayesian analysis foremerging infectious diseases.

Bayesian Analysis , 4(3):465–496.Koopmans, M., Wilbrink, B., Conyn, M., Natrop, G., van der Nat, H., Vennema, H.,Meijer, A., van Steenbergen, J., Fouchier, R., Osterhaus, A., and Bosman, A. (2004).Transmission of H7N7 avian inﬂuenza A virus to human beings during a large outbreakin commercial poultry farms in the netherlands.

The Lancet , 363(9409):587–593.Neal, R. M. (1995). Suppressing random walks in markov chain monte carlo using orderedoverrelaxation. Technical report, Department of Statistics, University of Toronto.O’Neill, P. D. and Roberts, G. (1999). Bayesian inference for partially observed stochasticepidemics.

Journal of the Royal Statistical Society: Series A , 162(1):121–129.Quinonero-Candela, J. and Rasmussen, C. E. (2005). A unifying view of sparse approx-imate Gaussian process regression.

Journal of Machine Learning Research , 6.Rasmussen, C. E. and Williams, C. (2006).

Gaussian Processes for Machine Learning .MIT Press.Spiegelhalter, D. J., Best, N. G., Carlin, B. P., and Van Der Linde, A. (2002). Bayesianmeasures of model complexity and ﬁt.

Journal of the Royal Statistical Society: SeriesB , 64(4):583–639.Stegeman, A., Bouma, A., Elbers, A. R. W., de Jong, M. C. M., Nodelijk, G., de Klerk,F., Koch, G., and van Boven, M. (2004). Avian inﬂuenza A virus (H7N7) epidemic inthe netherlands in 2003: Course of the epidemic and eﬀectiveness of control measures.

The Journal of Infectious Diseases , 190(12):2088–2095.van der Goot, J. A., Koch, G., de Jong, M. C. M., and van Boven, M. (2005). Quantiﬁca-tion of the eﬀect of vaccination on transmission of avian inﬂuenza (H7N7) in chickens.