[PDF] Bayesian Tracking of Emerging Epidemics Using Ensemble Optimal Statistical Interpolation (EnOSI)

Abstract

We explore the use of the optimal statistical interpolation (OSI) data assimilation method for the statistical tracking of emerging epidemics and to study the spatial dynamics of a disease. The epidemic models that we used for this study are spatial variants of the common susceptible-infectious-removed (S-I-R) compartmental model of epidemiology. The spatial S-I-R epidemic model is illustrated by application to simulated spatial dynamic epidemic data from the historic "Black Death" plague of 14th century Europe. Bayesian statistical tracking of emerging epidemic diseases using the OSI as it unfolds is illustrated for a simulated epidemic wave originating in Santa Fe, New Mexico.

Full PDF

BBayesian Tracking of Emerging Epidemics Using EnsembleOptimal Statistical Interpolation (EnOSI)

Ashok Krishnamurthy ∗ Loren Cobb ∗ Jan Mandel ∗ Jonathan Beezley ∗ Abstract

We explore the use of the optimal statistical interpolation (OSI) data assimilation methodfor the statistical tracking of emerging epidemics and to study the spatial dynamics ofa disease. The epidemic models that we used for this study are spatial variants of thecommon susceptible-infectious-removed (S-I-R) compartmental model of epidemiology. Thespatial S-I-R epidemic model is illustrated by application to simulated spatial dynamicepidemic data from the historic ”Black Death” plague of 14th century Europe. Bayesianstatistical tracking of emerging epidemic diseases using the OSI as it unfolds is illustratedfor a simulated epidemic wave originating in Santa Fe, New Mexico.

Key Words:

Bayesian statistical tracking, emerging epidemics, spatial S-I-R epidemicmodel, data assimilation, ensemble Kalman ﬁlter, optimal statistical interpolation

1. Introduction

Mathematical models have been used since 1927 to describe the rise and fall of in-fectious disease epidemics (Diekmann and Heesterbeek, 2000; Castillo-Ch´avez andBlower, 2002a,b; Ma and Xia, 2009). A majority of the models are based on thethree-compartment nonlinear Susceptible-Infected-Removed (S-I-R) model devel-oped by Kermack and McKendrick (1927). A person occupies the susceptible or infectious compartments if he or she can contract or transmit the disease, respec-tively. The removed compartment includes those who have died, have been quaran-tined, or have recovered from the disease and become immune. The state variablesare the number of susceptible ( S ), the infectious ( I ), and the removed ( R ) in aclosed population. S-I-R models often perform surprisingly well in modeling thetemporal evolution of real-world epidemics, and their spatial variants can produce atraveling-wave spatial dynamics similar to that displayed by some epidemics. Trav-eling waves are solutions to the spatial models such that the distribution of infectedat time ( t + 1) is approximately a translation of the distribution at time t .Tracking and forecasting the full spatio-temporal evolution of a new epidemicis notoriously diﬃcult. Often the model itself is incorrect in unknown ways, obser-vational data may be aﬀected by many sources of error, and new data arrives onan irregular schedule. This is a well-known problem in a variety of empirical areasof high importance, such as storm and wildﬁre forecasting. The general categoryof tracking techniques that incorporate error-prone data as they arrive by sequen-tial statistical estimation is known as data assimilation (Kalnay, 2003). Use of thestatistical methods of data assimilation can increase the accuracy and reliability ofepidemic tracking by incorporating data as it arrives, with weighting factors that ∗ [email protected] a r X i v : . [ s t a t . C O ] S e p eﬂect the observed reliability of the observations. A few applications of data assim-ilation in epidemiology already exist (Kalivianakis et al., 1994; Cazelles and Chau,1997; Costa et al., 2005; Bettencourt et al., 2007; Bettencourt and Ribeiro, 2008;J´egat et al., 2008; Rhodes and Hollingsworth, 2009; Dukic et al., 2009; Mandel et al.,2010a).The aim of this paper is to study the use of a spatial variant of the S-I-Rmodel to track newly emerging epidemics using a data assimilation technique calledoptimal statistical interpolation (OSI). This is already a popular data assimilationmethod in the literature of meteorology and oceanography. When coupled with aspatial dynamic model, the OSI method can be used to forecast the spatio-temporalevolution of an epidemic, and to adjust those forecasts appropriately as sparse anderror-prone data arrives.This paper is organized as follows. In Section 2, we present a stochastic spatialepidemic model and use it to reproduce the spatio-temporal disease spread map ofthe 14th century Black Death. In Section 3, we illustrate the Bayesian tracking ofemerging epidemics using OSI with a simulated epidemic wave originating in SantaFe, New Mexico. In Section 4, we provide computational results for the stochas-tic spatial epidemic model with epidemic tracking method presented in Section 3.Finally, in Section 5, we provide some concluding remarks and future directions.

2. A Stochastic Spatial Epidemic Model2.1 Epidemic Dynamics

For this study we use a discretized stochastic version of the Hoppensteadt (1975)spatial S-I-R epidemic model. As with almost all spatial epidemic models sinceBailey (1957, 1967); Kendall (1965), we assume that individuals are continuouslydistributed on a spatial domain. This model uses three variables to deﬁne the stateof the epidemic at each ( x, y ) coordinate: S ( x, y, t ) = density (per unit area) of the susceptible population, I ( x, y, t ) = density of the infected population, and R ( x, y, t ) = density of the removed population.Thus, each of these variables is a scalar ﬁeld that evolves with time.In continuous time the epidemic dynamics are deﬁned by a system of three partialdiﬀerential equations for the state variables. There are no vital dynamics in thismodel, meaning that there are no new births or non-disease related deaths in any ofthe three compartments. Following Hoppensteadt (1975), we assume that the rateof new infections at location ( x, y ) depends on the density of infection at that point,and in nearby locations that have been weighted with a kernel function that dropsoﬀ exponentially with Euclidean distance. The eﬀective (i.e. weighted) density ofinfection at ( x, y ) is given by J ( x, y, t ) = (cid:90) (cid:90) I ( x − φ, y − θ ) K ( φ, θ ) dφdθ,K ( φ, θ ) ∝ exp (cid:16) − α (cid:112) φ + θ (cid:17) , here the integral is taken over the entire surface area under study, and the pro-portionality constant is given by the condition that (cid:82) (cid:82) K ( φ, θ ) dφdθ = 1. Then,the three partial diﬀerential equations for the deterministic evolution of the spatialepidemic are: ∂ t S = − βSJ,∂ t I = βSJ − γI,∂ t R = γI. In this model, β is the rate of infection from infected to susceptibles, givenhomogeneous mixing, α is an intensity measure of infectiousness of the disease, givenby the product of mixing rate and the infection rate and γ is the rate of removal ofinfected persons through death, recovery with immunity, and quarantine. To makethis model stochastic, we assume that the quantities βSJ and γI are the intensitiesof two independent Poisson processes. For simulations in discrete time and space, weuse the following approximation: The number of newly infected and newly removedpersons over the time interval ( t, t + ∆ t ), within a box centered at position ( x, y )with ∆ x ∆ y units of area, are given byNumber newly infected ∼ P oisson ( βS ( x, y, t ) J ( x, y, t )∆ x ∆ y ∆ t ),Number newly removed ∼ P oisson ( γI ( x, y, t )∆ x ∆ y ∆ t ) . Thus a susceptible individual, at a particular location, may become infectedwhen he/she comes in contact with an infected individual from within a neighboringarea, with a monotonically decreasing weighting function that declines exponentiallywith distance. If this contact causes suﬃcient secondary infections then a newepidemic focus will develop at that new location. The simulation evolves on a two-dimensional discretized spatial domain with a total of n × m grid cells. The ”Black Death” bubonic plague epidemic that hit Europe in 1347 killed some-where between 30% and 60% of the population of Europe over the course of aboutfour years. The virulence of the disease back then was severe, which explains theextremely high number of deaths. Plague recurred in various regions of Europe foranother 300 years before gradually withdrawing from Europe. There have been sev-eral attempts to reconstruct the movement of the wavefront of the epidemic (Langer,1964; Noble, 1974; Christakos and Olea, 2005; Christakos et al., 2007; Gaudart et al.,2010), as it swept across the continent; one such attempt using our spatial S-I-Rmodel is shown in Figure 1b; others are similar. The disease arrived from Asiainto Europe by trading ships, appearing ﬁrst in Constantinople (modern Istanbul,Turkey) in 1347. From there it was carried by ship to Italy, France, Spain, andCroatia. Once ashore it moved inland (mainly in pneumonic form) at a speed thathas been estimated at between 100 and 400 miles per year.If it is to have credibility, a spatial epidemic model should be able to reproducethe principal historical features of the Black Death: its movement into the interior ofEurope from the coastal cities, especially Marseilles, and movement up the island of igure 1 : Figure 1a (left) Historical reconstruction. (Reproduced from Qedoc.org(2008) under Creative Commons license.) 1b (right) Simulated state of the epidemicin 1350.Britain after its arrival in Bristol and London. Figure 1b shows the population den-sity of infected people, using modern population densities in place of the unknownmedieval population pattern, at roughly the beginning of 1350.The R statistical computing language (R Development Core Team, 2010), freelyavailable from , was used to carry out the simulationsfor the spatial spread of the epidemic. Modern day population density data weredownloaded as GPW (Gridded Population of the World) data ﬁles from the Centerfor International Earth Science Information Network at the Columbia University andconverted to the array-oriented Network Common Data Form (NetCDF) format.These datasets were then loaded into R using the built-in package ncdf . In historical reality, the zone from Romania to Poland to Russia suﬀered only lightlyfrom the ﬁrst wave of the Black Death. It is quite plain that the S-I-R simulationdoes not conform to the historical reconstruction in that zone. The explanationmay lie with the very low population density and geographic mobility in EasternEurope in the 14th century. A smaller discrepancy occurs in the city of Milan, Italy,which brutally stopped the spread of the plague by boarding up infected people intheir homes. Despite diﬀerences in population distribution (France was then muchmore populous than Germany, for example), we believe that the simulation performsreasonably well in all areas except Eastern Europe.

3. Statistical Tracking Using Data Assimilation Techniques

We use data assimilation for the statistical tracking of emerging epidemics as theyare unfolding. This involves two basic components: a dynamic model to forecast thestate of the epidemic between arrivals of new data, and observations that are usedo update an ensemble of state estimates. Data assimilation requires estimating theuncertainty both for model and observations forecasts. Our goal in this paper isto incorporate sparse and noisy observational epidemic data over space and timeinto a dynamic statistical model so as to produce an optimal Bayesian estimate ofthe current state of the infected population, and to forecast the progress of the realepidemic.

The Kalman Filter (KF) was ﬁrst presented by Kalman (1960) and Kalman andBucy (1961) as a method for tracking the state of a linear dynamic system perturbedby Gaussian white noise. In mathematical terms, this means that the errors aredrawn from a zero-mean distribution with diagonal covariance matrix.The full state of a discretized spatial epidemic model is a grid of n × m cells, eachof which contains a characterization of the population currently within the limits ofthe cell. To apply the Kalman ﬁltering method, we represent the n × m values of the Infected variable on this grid as a single long vector x with p = n × m elements, whichfor the purposes of data assimilation is the dimensionality of the state space. If wecould observe this state vector without error, our observations would be anothervector y that satisﬁes y = Hx . Now consider the situation in which we have aforecast of the current state, x f , and a newly arrived vector of noisy observations y = Hx + ε , where ε ∼ N (0 , R ). We need to update the forecast by optimallyassimilating these new observations. The result will be called the analysis estimateof state, x a . The superscripts f and a are used to denote the forecast (prior) andanalysis (posterior) estimate of the current state, respectively.In the classical Kalman ﬁlter, the underlying dynamics are assumed to be linear,e.g. x t = F t − x t − + u t − ,u t ∼ N (0 , Σ) , and the analysis estimate of the state vector is calculated from x ft = F t − x ft − x at = x ft + K t ( y t − Hx ft ) ,Q at = ( I − K t H ) Q ft , where K t = Q ft H T ( HQ ft H T + R ) − , and Q ft = F t − Q ft − F (cid:48) t − + Σ . Here K t is the Kalman gain matrix at time t , and Q ft is the covariance matrix forthe forecast state vector.The extended Kalman ﬁlter (EKF) (Julier et al., 1995) was an early attempt toadapt the basic KF equations for nonlinear problems, through linearization. How-ever, the EKF has it’s own disadvantages: if the model is strongly nonlinear atthe time step of interest, linearization errors can turn out to be non-negligible,hich leads to ﬁlter divergence (Evensen, 1992). The EKF is not suitable forhigh-dimensional 2D and 3D data assimilation problems. Other commonly usedBayesian tracking techniques for nonlinear problems include the unscented Kalmanﬁlter (UKF) and the particle ﬁlter (PF) (Gordon et al., 1993). Particle ﬁltering isa versatile Monte Carlo technique that can handle nonlinearities in the model andrepresents the Bayesian posterior probability density function by a set of samplesdrawn at random with associated weights. The KF algorithm described above is based on assimilating only one initial stateignoring the uncertainties in the model. In the following, we account for this state-dependent uncertainty by taking an ensemble approach to data assimilation. Tosee the problem, consider a 2D simulation of a scalar ﬁeld that has been discretizedon a 10 × grid. The state vector of this system has 10 elements, and itscovariance matrix has 10 × = 10 elements, requiring eight terabytes just tostore. The EnKF solves this storage-and-retrieval problem by (in eﬀect) calculatingthe covariances from the members of the ensemble as they are needed. The result isin an elegant Bayesian update algorithm with dramatically improved eﬃciency andstorage requirements Mandel et al. (2010a).The ensemble Kalman ﬁlter (EnKF) was introduced by Evensen (1994), modiﬁedto provide correct covariance by Burgers et al. (1998), and improved by Houtekamerand Mitchell (1998). The EnKF is a popular sequential Bayesian data assimilationtechnique that uses a collection of almost-independent simulations (known as anensemble) to solve the covariance problem of Kalman ﬁltering for systems with veryhigh-dimensional state vectors. It does this using a two-step process: estimate ofthe covariance matrix, followed by an ensemble update. The covariance of a singlestate estimate in the KF is replaced by the sample covariance computed from theensemble members. This sample covariance of ensemble forecasts is then used tocalculate the Kalman gain matrix. There are two basic approaches to the EnKFupdate: stochastically perturbed observations (Monte Carlo), and “square-root”ﬁlters (deterministic). Both approaches adopt the same covariance estimate step,but diﬀer in the ensemble update step. Regardless of the speciﬁc approach employed,the goal is to obtain a Bayesian estimate of the state as eﬃciently as possible. Inmany real-world examples these two approaches perform quite similarly. A moredetailed description EnKF may be found in the book by Evensen (2009).The EnKF analysis update equations are analogous to the classical KF equations,except that they use the covariance of the forecast ensemble to substitute for thematrix Q , which in a high-dimensional system is too large to store. Let X be arandom ensemble matrix of dimension p × N whose columns are realizations sampledfrom the prior distribution of the system state of dimension p with ensemble size N .Then the EnKF update formula is: X a = X f + K e ( Y − HX f ) , where Y is the observed ensemble data matrix whose columns are the true stateperturbed by random Gaussian error. H is, as before, the linear operator that mapsthe state vector onto the observational space. The deviation Y − HX f is commonlyeferred to as the “observed-minus-forecast residual” or simply as the innovation.In the above equation K e is the ensemble Kalman gain matrix given by K e = Q f H T ( HQ f H T + R ) − , where Q f is the forecast-error covariance matrix of dimension p × p , and R is thesymmetric and positive-deﬁnite observational (measurement) error covariance ma-trix. The EnKF technique contains two sources of randomness: the random modelinput, and the measurement errors. Assuming that these two sources of randomnessare uncorrelated, the analysis-error covariance matrix of dimension p × N can becomputed from the equation Q a = ( I − K e H ) Q f . In the EnKF, the model error covariance matrix is evolved fully at each data assim-ilation step using an MCMC method. In contrast, Optimal Statistical Interpolation(OSI) is a data assimilation technique based on statistical estimation theory in whichthe model error covariance matrix is pre-determined empirically and is assumed tobe time-invariant. The model error covariance matrix is dependent only on thedistance between spatial grid cells. The correlation length is ad hoc and adjustedempirically.OSI was derived by Eliassen (1954). This method has been referred to as “Sta-tistical Interpolation”, “Optimal Interpolation”, or “Objective Analysis.” OSI iscalled univariate if the observations are of a single scalar ﬁeld, and multivariate ifthe observations of one or more scalar ﬁelds are used for estimating another scalarﬁeld (Talagrand, 2003). Multivariate OSI was developed independently by Gandin(1965) for the analysis of meteorological ﬁelds in the former Soviet Union. It re-quires the speciﬁcation of the cross-covariance matrix between the observed scalarﬁelds and the scalar ﬁeld to be estimated.The ensemble OSI (EnOSI), used here, requires much less computational eﬀortthan the EnKF, because the model error covariance matrix is ﬁxed. The EnOSIapproach may provide a practical and cost-eﬀective alternative to the EnKF fortracking epidemics. The stationary model error covariance matrix in our epidemicsimulation used a version with the correlation function having an exponential de-cay along the oﬀ-diagonal entries. The ensemble Kalman gain matrix was thencalculated using this time-invariant covariance matrix with a ﬁxed structure. Theaccuracy of the EnOSI process will be aﬀected if the approximate covariance matrixdiﬀers substantially from the true covariance matrix. Therefore, one disadvantageof the OSI is the need for a ﬁxed spatial covariance structure that can reasonablyrepresent the epidemic dynamics throughout the whole domain at all times.The EnOSI analysis update equation, using the stationary covariance, is givenby X a = X f + K OSI ( Y − HX f ) (1)where K OSI = Q fOSI H T ( HQ fOSI H T + R ) − . .4 Example: An Epidemic Wave Originating In New Mexico To test the performance of EnOSI and other tracking algorithms designed for high-dimensional state vectors, we constructed a spatial simulation of an epidemic thatoriginates in Santa Fe, New Mexico, and spreads outwards towards Albuquerqueand Denver, Colorado. In this simulation the epidemic moves smoothly towardsAlbuquerque, but jumps suddenly to to Denver as if carried by a traveler in anautomobile or airplane. Properly detecting and assimilating a new feature far froman existing focus is a serious challenge for EnKF algorithms (Beezley and Mandel,2008; Mandel et al., 2010a).To improve the realism of the test for the case in which an entirely new diseaseemerges for the ﬁrst time, we initialized all members of the tracking ensemble sothat they contain no disease whatsoever. New data in the form of an empiricalscalar ﬁeld arrives at time steps 10, 20, 30, 40, and 50. These data are complete inthe sense that in this case H is just the identity matrix, but they are substantiallymodiﬁed by independent Poisson-distributed random errors. The tracking algorithmforecasts the state up until the time when data is received, and then it assimilatesthis data into the forecast.

4. Results

We have applied the EnOSI for the New Mexico example mentioned above to theepidemic model described in Section 2 with an ensemble of size 25. For this examplethe

Infected state of the model is the output of the observation function. Syntheticdata were simulated from a model and initialized in exactly the same way as theensemble.In our epidemics application, the perturbed observations Y in (1) were obtainedby sampling from the Poisson distribution with the intensity equal to the data valueand rounding to integer, consistently with the stochastic character of the model,instead of Gaussian perturbations as in the EnKF. This is the key feature behindthe successful use of our method and it also guarantees that Y has nonnegativeentries and thus the columns of Y are meaningful as the Infected variable. Ingeneral, one may have to guarantee that the entries of the members of the analysisensemble X a are also nonnegative, e.g. by censoring. This, however, was not neededin the results reported here.The result for each member of the ensemble advanced in time by 10 modeltime units is a Bayesian update of the forecast scalar ﬁeld, which is referred to asthe analysis (i.e. the posterior estimate). We assume that data arrive only onceevery 10 time steps, with errors. A total of 5 assimilation cycles were performed inthis manner. The mean and standard deviation (not reported here) of the ensembleanalysis values in each cell of the scalar ﬁeld gives the EnOSI estimate of the state ofthe epidemic, with its uncertainty quantiﬁed. The following ﬁgures present a spatial“image” of the number of infected persons over the planar domain considered in theNew Mexico example.Figure 2a shows the epidemic position in Santa Fe, New Mexico, at time 10.The initial forecast (2b) is empty, as it should be for the ﬁrst appearance of anynewly emergent disease. The EnOSI analysis (2c) shows that the arriving datahave been partially assimilated, with a resulting picture that is indistinct and less igure 2 : The epidemic at time 10. (a): The true state of the epidemic.(b): The forecast state of the epidemic prior to arrival of the ﬁrst data. (c): The analysis state of the epidemic, after data assimilation. Figure 3 : The epidemic at time 20. A new focus of infection has appeared inDenver, and is successfully reﬂected in the analysis state. igure 4 : The epidemic at time 30. The infected zone is expanding rapidly.

Figure 5 : The epidemic at time 50. Both the forecast and analysis estimates ofstate now accurately reﬂect the true state of the epidemic.han fully accurate. Figure 3a (time 20) shows the growth of the epidemic towardsAlbuquerque, and a very small new focus of infection in Denver. The forecast handlesthe movement towards Albuquerque quite well, but is devoid of any infection inDenver. After assimilation of the data, the analysis now also reﬂects a small focusin Denver. Figure 4 (time 30) shows the epidemic gaining size, and beginning amajor expansion within both Albuquerque and Denver. Figure 5 (time 50) showsthe epidemic gaining deﬁnition within the most heavily populated urban regions.The analysis steps, after data assimilation, are now tracking the epidemic quite well.

5. Conclusion

The spread of newly emerging infectious diseases pose a serious challenge to publichealth in every country of the world. Tracking the spread of an epidemic in real-timecan help public health oﬃcials to plan their emergency response and health care.The purpose of this paper has been to present the some preliminary results on thestatistical tracking of emerging epidemics of infectious diseases using a Bayesian dataassimilation technique called ensemble optimal statistical interpolation (EnOSI).Our simulation results conﬁrms that EnOSI can be used to track the spatio-temporalpatterns of emerging epidemics. We found that EnOSI can eﬃciently adjust itsestimated spatial distribution of the number of infected, if and when the epidemicjumps from city to city, and with data that are sparse and error-ridden. The trackingaccuracy in our simulations provides evidence of the good performance of the EnOSIapproach, therefore the assumption that the model error covariance is time-invariantis reasonable. However, the eﬀect of this assumption needs more rigorous theoreticaljustiﬁcation.The EnOSI as presented here requires the manipulation of the state covariancematrix, which gets very demanding if stored as a full matrix - computational grid of200 by 200 points results in 40,000 variables and thus 40,000 by 40,000 covariancematrix, which requires a supercomputing cluster. Sparsiﬁcation of the covariancematrix can decrease the computational cost somewhat, but it is still signiﬁcantand the implementation grows complicated. For this reason, we plan to investigatea version of OSI by Mandel (2010) with the covariance implemented by the FastFourier Transform (FFT), similarly as in the FFT EnKF Mandel et al. (2010a,b,c).Our research has set the groundwork for further eﬀorts to incorporate the ideasof data assimilation to track diseases in real-time. Our future work includes ex-tending the R framework that we have developed for employing an ensemble ofspatial simulations to track diseases to test and compare a other variants of theEnKF (Anderson, 2001; Tippett et al., 2003; Beezley and Mandel, 2008; Mandelet al., 2009a; Ott et al., 2004; Hunt et al., 2007). These variants aim to enhancethe performance of the ensemble ﬁlters by representing the underlying model errorstatistics in an eﬃcient manner. However, since the ensemble size required can belarge (easily hundreds) Evensen (2009) for the approximation to converge (Mandelet al., 2009b), the amount of computations in the ensemble-based methods can besigniﬁcant, and so special localization techniques, such as tapering, need to be em-ployed to suppress spurious long-range correlations in the ensemble covariance ma-trix (Furrer and Bengtsson, 2007). Thus the EnKF (and its variants) may not workwell for problems with sharp coherent features, such as the traveling waves found insome epidemics. Choosing a small ensemble size, so small that it is not statisticallyepresentative of the state of a system, leads to underestimation of the analysis er-ror covariances. Choosing a really large ensemble size may not be computationallyfeasible and cost eﬃcient. Finally, methods for incorporating long-distance humanmovements to track the rapid geographical spread of infectious diseases have beenproposed in the literature (Brockmann, 2009; Belik et al., 2009; Merler and Ajelli,2010; Balcan et al., 2010; Belik et al., 2010). In the future, we plan to explore suchspatially extended epidemic models to track emerging epidemics.

References

Anderson, J. L. (2001), “An ensemble adjustment Kalman ﬁlter for data assimila-tion,”

Monthly Weather Review , 129, 2884–2903.Bailey, N. T. J. (1957),

Mathematical Theory of Epidemics , Griﬃn.— (1967), “The simulation of stochastic epidemics in two dimensions,” in

Proceed-ings of the ﬁfth Berkeley Symposium on mathematical statistics and probability:biology and problems of health , vol. 4, pp. 237–257.Balcan, D., Gon¸calves, B., Hu, H., Ramasco, J. J., Colizza, V., and Vespignani, A.(2010), “Modeling the spatial spread of infectious diseases: The Global Epidemicand Mobility computational model,”

Journal of Computational Science .Beezley, J. D. and Mandel, J. (2008), “Morphing ensemble Kalman ﬁlters,”

TellusA , 60, 131–140.Belik, V., Geisel, T., and Brockmann, D. (2009), “The impact of human mobilityon spatial disease dynamics,” in

Proceedings of the 2009 International Conferenceon Computational Science and Engineering-Volume 04 , IEEE Computer Society,pp. 932–935.— (2010), “Human movements and the spread of infectious diseases,”

NetMob 2010:Workshop on the Analysis of Mobile Phone Networks , 44–48.Bettencourt, L. M. A. and Ribeiro, R. M. (2008), “Real time bayesian estimationof the epidemic potential of emerging infectious diseases,”

PLoS One , 3, 1–9.Bettencourt, L. M. A., Ribeiro, R. M., Chowell, G., Lant, T., and Castillo-Chavez,C. (2007), “Towards real time epidemiology: data assimilation, modeling andanomaly detection of health surveillance data streams,” in

Intelligence and Secu-rity Informatics: Biosurveillance , Springer, vol. 4506 of

Lecture Notes in Com-puter Science , pp. 79–90.Brockmann, D. (2009),

Human mobility and spatial disease dynamics , Wiley-VCH,pp. 1–24.Burgers, G., van Leeuwen, P. J., and Evensen, G. (1998), “Analysis scheme in theensemble Kalman ﬁlter,”

Monthly Weather Review , 126, 1719–1724.Castillo-Ch´avez, C. and Blower, S. (2002a),

Mathematical Approaches for Emergingand Reemerging Infectious Diseases: An Introduction , Springer Verlag. (2002b),

Mathematical Approaches for Emerging and Reemerging Infectious Dis-eases: Models, Methods and Theory , Springer Verlag.Cazelles, B. and Chau, N. P. (1997), “Using the Kalman ﬁlter and dynamic modelsto assess the changing HIV/AIDS epidemic,”

Mathematical Biosciences , 140, 131–154.Christakos, G. and Olea, R. A. (2005), “New space-time perspectives on the prop-agation characteristics of the Black Death epidemic and its relation to bubonicplague,”

Stochastic Environmental Research and Risk Assessment , 19, 307–314.Christakos, G., Olea, R. A., and Yu, H. L. (2007), “Recent results on the spatiotem-poral modelling and comparative analysis of Black Death and bubonic plagueepidemics,”

Public Health , 121, 700–720.Costa, P. J., Dunyak, J. P., and Mohtashemi, M. (2005), “Models, prediction, andestimation of outbreaks of infectious disease,”

IEEE SoutheastCon, 2005. Pro-ceedings , 174–178.Diekmann, O. and Heesterbeek, J. (2000),

Mathematical epidemiology of infectiousdiseases. Model building, analysis and interpretation , Wiley Series in Mathemat-ical and Computational Biology.Dukic, V. M., Lopes, H. F., and Polson, N. (2009), “Tracking Flu Epidemics UsingGoogle Flu Trends and Particle Learning,”

SSRN eLibrary , http://ssrn.com/paper=1513705 .Eliassen, A. (1954), “Provisional Report on Calculation of Spatial Covariance andAutocorrelation of the Pressure Field.” Rept. No. 5, Institute of Weather andClimate Research, Academy of Sciences, Oslo, reprinted in Dynamic Meteorology:Data Assimilation Methods, L. Bengtsson, M. Ghil, and E. Kallen., Ed., Springer-Verlag, 319–330, 1981.Evensen, G. (1992), “Using the Extended Kalman Filter With a Multilayer Quasi-Geostrophic Ocean Model,” J. Geophys. Res , 97, 17905–17924.— (1994), “Sequential data assimilation with a nonlinear quasi-geostrophic modelusing Monte Carlo methods to forecast error statistics,”

J. Geophys. Res , 99,10143–10162.— (2009),

Data assimilation: The ensemble Kalman ﬁlter , Springer Verlag.Furrer, R. and Bengtsson, T. (2007), “Estimation of high-dimensional prior andposterior covariance matrices in Kalman ﬁlter variants,”

Journal of MultivariateAnalysis , 98, 227 – 255.Gandin, L. S. (1965),

Objective analysis of meteorological ﬁelds , English translationfrom the Russian by Israel Program for Scientiﬁc Translations.Gaudart, J., Ghassani, M., Mintsa, J., Waku, J., Rachdi, M., Doumbo, O. K.,and Demongeot, J. (2010), “Demographic and Spatial Factors as Causes of anEpidemic Spread, the Copule Approach: Application to the Retro-prediction ofhe Black Death Epidemy of 1346,” in , IEEE, pp.751–758.Gordon, N. J., Salmond, D. J., and Smith, A. F. M. (1993), “Novel approach tononlinear/non-Gaussian Bayesian state estimation,” in

IEE Proceedings , vol. 140,pp. 107–113.Hoppensteadt, F. (1975),

Mathematical Theories of Populations, Demographics,and Epidemics , CBMS-NSF Regional Conference Series in Applied Mathemat-ics, SIAM.Houtekamer, P. L. and Mitchell, H. L. (1998), “Data Assimilation Using an EnsembleKalman Filter Technique,”

Monthly Weather Review , 126, 796–811.Hunt, B. R., Kostelich, E. J., and Szunyogh, I. (2007), “Eﬃcient data assimilationfor spatiotemporal chaos: A local ensemble transform Kalman ﬁlter,”

Physica D:Nonlinear Phenomena , 230, 112–126.J´egat, C., Carrat, F., Lajaunie, C., and Wackernagel, H. (2008), “Early Detectionand Assessment of Epidemics by Particle Filtering,” in

GeoENV VI-Geostatisticsfor Environmental Applications: Proceedings of the Sixth European Conference onGeostatistics for Environmental Applications , Springer Verlag, pp. 23–35.Julier, S. J., Uhlmann, J. K., and Durrant-Whyte, H. F. (1995), “A new approachfor ﬁltering nonlinear systems,” in

Proc. American Control Conf , vol. 3, pp. 1628–1632.Kalivianakis, M., Mous, S. L. J., and Grasman, J. (1994), “Reconstruction of theseasonally varying contact rate for measles,”

Mathematical biosciences , 124, 225–234.Kalman, R. and Bucy, R. (1961), “New results in linear prediction and ﬁlteringtheory,”

Trans. AMSE J. Basic Eng , 95–108.Kalman, R. E. (1960), “A new approach to linear ﬁltering and prediction problems,”

Journal of basic Engineering , 82, 35–45.Kalnay, E. (2003),

Atmospheric modeling, data assimilation, and predictability ,Cambridge Univ Pr.Kendall, D. G. (1965), “Mathematical models of the spread of infection,”

Mathe-matics and Computer Science in Biology and Medicine , 213.Kermack, W. O. and McKendrick, A. G. (1927), “A Contribution to the Mathemat-ical Theory of Epidemics,”

Royal Society of London Proceedings Series A , 115,700–721.Langer, W. L. (1964), “The Black Death,”

Scientiﬁc American , 210, 114.Ma, S. and Xia, Y. (2009),

Mathematical Understanding of Infectious Disease Dy-namics , World Scientiﬁc.andel, J. (2010), “Osimorph,” http://ccm.ucdenver.edu/wiki/Osimorph .Mandel, J., Beezley, J. D., Cobb, L., and Krishnamurthy, A. (2010a), “Data drivencomputing by the morphing fast Fourier transform ensemble Kalman ﬁlter inepidemic spread simulations,”

Procedia Computer Science , 1, 1215 – 1223, ICCS2010.Mandel, J., Beezley, J. D., Coen, J. L., and Kim, M. (2009a), “Data Assimilation forWildland Fires: Ensemble Kalman ﬁlters in coupled atmosphere-surface models,”

IEEE control systems , 29, 47–65.Mandel, J., Beezley, J. D., Eben, K., Jurus, P., Kondratenko, V. Y., and Resler, J.(2010b), “Data assimilation by morphing fast Fourier transform ensemble Kalmanﬁlter for precipitation forecasts using radar images,” CCM UCD Report 290, http://ccm.ucdenver.edu/reports .Mandel, J., Beezley, J. D., and Kondratenko, V. Y. (2010c), “Fast Fourier TransformEnsemble Kalman Filter with Application to a Coupled Atmosphere-WildlandFire Model,” in

Computational Intelligence in Business and Economics, Proceed-ings of MS10 , ed. A. M. Gil-Lafuente, J. M. M., World Scientiﬁc, pp. 777–784,also available as preprint arXiv:1001.1588.Mandel, J., Cobb, L., and Beezley, J. (2009b), “On the convergence of the en-semble Kalman ﬁlter,” Applications of Mathematics, in print, also available asarXiv:0901.2951.Merler, S. and Ajelli, M. (2010), “The role of population heterogeneity and humanmobility in the spread of pandemic inﬂuenza,”

Proceedings of the Royal SocietyB: Biological Sciences , 277, 557–565.Noble, J. V. (1974), “Geographic and temporal development of plagues,”

Nature ,250, 726–729.Ott, E., Hunt, B., Szunyogh, I., Zimin, A., Kostelich, E., Corazza, M., Kalnay, E.,Patil, D., and Yorke, J. (2004), “A local ensemble Kalman ﬁlter for atmosphericdata assimilation,”

Tellus A , 56, 415–428.Qedoc.org (2008), “Diﬀusion de la peste noire en europe 1347-1351,” , visited September 2010.R Development Core Team (2010),

R: A Language and Environment for StatisticalComputing , R Foundation for Statistical Computing, Vienna, Austria.Rhodes, C. J. and Hollingsworth, T. D. (2009), “Variational data assimilation withepidemic models,”

Journal of Theoretical Biology , 258, 591–602.Talagrand, O. (2003), “Bayesian estimation. Optimal interpolation. Statistical lin-ear estimation,” in

Data assimilation for the Earth system , eds. Swinbank, R.,Shutaev, V. P., and Lahoz, W. A., Springer Netherlands, pp. 21–35.Tippett, M. K., Anderson, J. L., Bishop, C. H., Hamill, T. M., and Whitaker, J. S.(2003), “Ensemble Square Root Filters,”