[PDF] A stochastic geospatial epidemic model and simulation using an event modulated Gillespie algorithm

Abstract

We developed a model and a software package for stochastic simulations of transmission of COVID-19 and other similar infectious diseases, that takes into account contact network structures and geographical distribution of population density, detailed up to a level of location of individuals. Our analysis framework includes a surrogate model optimization process for quick fitting of the model's parameters to the observed epidemic curves for cases, hospitalizations and deaths. This set of instruments (the model, the simulation code, and the optimizer) is a useful tool for policymakers and epidemic response teams who can use it to forecast epidemic development scenarios in local environments (on the scale from towns to large countries) and design optimal response strategies. The simulation code also includes a geospatial visualization subsystem, presenting detailed views of epidemic scenarios directly on population density maps. We used the developed framework to draw predictions for COVID-19 spreading in the canton of Geneva, Switzerland.

Full PDF

AA stochastic geospatial epidemic model andsimulation using an event modulated Gillespiealgorithm

Alexander Temerev, Liudmila Rozanova

Institute of Global Health, University of Geneva

ABSTRACT

We developed a model and an instrument for stochastic simulations of spreading of COVID-19and other similar infectious diseases, that takes into account both contact network structures andgeographical distribution of population density, detailed up to a level of location of individuals.Our analysis framework includes the surrogate model (SuMo) optimization process for quickﬁtting of the model’s parameters to the observed epidemic curves for cases, hospitalizations anddeaths. This set of instruments (the model, the simulation code, and the optimizer) can be auseful tool for policymakers and epidemic response teams who can use it to forecast epidemicdevelopment scenarios in local environments (on the scale from towns to large countries) anddesign optimal response strategies. The simulation code also includes a geospatial visualizationsubsystem, presenting detailed views of epidemic scenarios directly on population density maps.We used the developed framework to draw predictions for COVID-19 spreading in Switzerland,on the level of individual cantons; their diﬀerence in population density distribution accounts tosigniﬁcant variety in epidemic curves and consequently the choice of optimal response strategy.

KEYWORDS

COVID-19, stochastic epidemic modeling, Gillespie algorithm, complex network, contact matri-ces, population density, epidemic simulation

INTRODUCTION

There are numerous epidemic modeling and simulation toolkits available as of 2020, in the rangefrom educational toy models to global-scale comprehensive frameworks (e.g. GLEAMviz [1],which accounts for global air transport networks, commuting patterns, temporal changes ofepidemic parameters, etc). While working with rough-scale global epidemic dynamics tends tobe somewhat less diﬃcult than detailed local simulations, as numerous diﬀerences in epidemicspreading patterns and response options tend to be averaged away, the global models are lessuseful for policymakers as there is no global pandemic response organization that can work ontruly international scale.As epidemic response organizations do not work with unlimited resources (and in fact, availableresources can be quite limited in e.g. developing countries), optimal resource allocation is crucialfor developing the most eﬃcient counter-epidemic strategy. It includes determining the most a r X i v : . [ q - b i o . P E ] J a n ulnerable population segments, the most probable epidemic hotspots, and the most impactfuldates and locations for introducing non-pharmaceutical interventions. Our framework employsage-based and context-based contact matrices to account for diﬀerences in social networks andcontact patterns of people within diﬀerent age groups; it uses detailed population density mapsto predict (and visualize) geographical locations of epidemic hotspot clusters, and it employssurrogate modeling optimization system to adapt itself for real observations and medical statisticsavailable for the chosen simulation area, enabling operators to choose most impactful responseoptions. EPIDEMIC PROCESS DESCRIPTION

General assumptions

We use a standard representation the population as a network consists of N nodes (individuals) { i } and a set of links { e ij } , representing a contact between each nodes i and j . We assume thatthe number of individuals in the network remains stable during the simulation [2].In our case the network is weighted – all links e ij have the weights depending of the spatialdistance between each pair of nodes i and j ) and complete i.e. we assume that all nodes areconnected each others.For the description of the infectious process we construct a stochastic SEIR model on the network,so all nodes are in one of the four states: S (susceptible), E (exposed in the latent period), I (infected) or R (recovered/removed).Stochastic infection process S → E occurs as a result of contact between a susceptible ( i ∈ S )and infectious ( j ∈ I ) individual. This transition generally depends on the frequency of contactsand proportional to the infection rate β [2]. Simultaneously, two additional random processestake place in the network: the transition of individuals from the exposed to infectious state E → I and the removal process I → R .The model assumes two types of contact events, which we call as “local” and “remote” contacts andare distinguished using the distance matrix M = { m ij } . Thus, we take into account the spreadof infection in some local area/region and the movement of individuals between the regions.In a local area, we assume that each node can meet any other located in this area with the sameprobability. Thus, we introduce a cutoﬀ r on the distance function d ( m ij ) , which is the samefor all nodes in the network. For each node i in the circle of the radius r with the center in i alldistances are assumed to be the same length: d ( m ij ) = r .Remote spreading is modelled as a separate process where each infectious individual can spon-taneously transmit the infection to a random target selected from the entire population (whichautomatically normalizes it for population density, as the target will be located with more prob-ability in densely populated areas), and ﬂips it to the exposed state.Thus, in our model, the following parameters aﬀect the contact probability:ransition Type Mean rate State change S → E l Local contagion β lsi ( s t − , e t + 1 , i t ) S → E d Remote contagion β dsi ( s t − , e t + 1 , i t ) E → I Spontaneous (cid:15) i ( s t , e t − , i t + 1) I → R Spontaneous γ i ( s t , e t , i t − Table 1.

Transition rates between Markov chain states

1. The node assignment into one of the age groups (in our model we use 16 age groups, thefrequency of contacts between all groups forms an asymmetric 16x16 matrix A = { a ij } ).2. The distance between contacting nodes. As only relative distances matter in our model, weuse the Euclidean distance between square locations on the population density map. Closernodes have a higher probability to participate in a contact and infection transmission.3. The population density in a particular area (the contact probability for distant contactsdirectly depends on the population density in the area of contact – this is achieved withrejection sampling of remote contacts over the entire simulated region). Model construction

We represent the stochastic

SEIR model as a continuous time 3-dimensional Markov chain X = { ( S ( t ) , E ( t ) , I ( t )) : t ≥ } that tracks the number of susceptible, exposed and infec-tious individuals at any time point. The number of removed individuals can be calculated as R ( t ) = N − S ( t ) − E ( t ) − I ( t ) .The epidemic starts from one or several infectious individuals n ; the population is assumedto consist of fully susceptible individuals; that is, the initial state of X is ( S (0) , E (0) , I (0)) =( N − n , , n ) . In each moment of time the state space of X is described by changing the stateof individuals according to the rules shown in the table 1. All other types of contacts do notchange the state of the system [3]. SIMULATION ALGORITHM

We employ Poisson processes for link activation. When an exposure event occurs at node i ∈ I ,we ﬁrst randomly select a node j with the Euclidean distance d ( m ij ) less than r , and activate link e ij with the probability depending on the age-contact structure of the population: p ij = a ij /K ,where K is the normalization factor: total contact rate summed for all age groups. If j is in thesusceptible state, the disease is transmitted with the probability p ij and the susceptible nodebecomes infected (exposed).For the remote contact, we select a random node j over the entire simulated geographical re-gion with rejection sampling, so the target node is selected according to the population densitydistribution. This accounts for all non-local contacts (random encounters, travel, etc.), allowingto specify their relative proportion with a single parameter: the frequency of remote contactsrelative to local contacts.n infected node stays in the exposed state for a time interval (exponentially distributed) cal-culated from the mean rate (cid:15) , and after it goes to an infectious state, where it can infect othernodes. Transitions to the recovered state occur according to another Poisson process with theexponentially distributed time intervals calculated from the rate γ . A recovered node neitherinfects nor becomes infected by other nodes.The infection rate of a susceptible node depends on how many infected neighbours it has, but therecovery rate and the time until the transition from the exposed to infectious class is independentof the network conﬁguration and status of neighbours.The mean time to node activation, which enables infection, is given by (cid:104) τ (cid:105) = (cid:82) ∞ τ ψ ( τ ) dτ . Themean time for moving an exposed node to infectious is /(cid:15) and infected node to recover is equalto /γ . A modiﬁed Gillespie algorithm

The continuous time evolution of stochastic processes with known transition rates (including epi-demiological applications) can be numerically simulated with Gillespie algorithms family, whichare statistically accurate. For our needs we used a particular variation: the event-modulatedGillespie algorithm [4].In the direct Gillespie algorithm the instantaneous event rates are updated for all processes,following the occurrence of each event, even if the probability density of the inter-event timesfor the process is not perturbed by an event that has occurred elsewhere. This procedure istime-consuming with large N . In the event-modulated Gillespie algorithm, we use a priorityqueue to keep all future events sorted by their occurrence time; after processing the nearestelement in the queue, the future events that cannot modify the global state (e.g. spreading theinfection to people who are already not susceptible) can be safely discarded. Their occurrencetimes are drawn from the corresponding exponential distributions governing the transition rates.Consequently, the event-modulated Poisson process is a mixture of Poisson processes of diﬀerentrates.Consider N Poisson processes with the rate of λ i , i ∈ [1 , N ] running in parallel. Denote thedensity of the event rate for the i th process by ρ i ( λ i ) . The renewal process is fully characterisedby the probability density of inter-event times, denote this function as ψ i ( τ ) for i th process.For an event-modulated Poisson process with probability density of the event rate ρ i ( λ i ) , wehave ψ i ( τ ) = (cid:90) ∞ ρ i ( λ i ) e − λ i τ dλ. A Poisson process with rate λ , i.e., ψ i ( τ ) = λ e − λ τ is generated by ρ i ( λ i ) = δ ( λ i − λ ) , where δ is the delta function.As there is a global priority queue already, it makes sense to calculate the contact network ofeach individual in-place, during the processing of each transition event.he algorithm itself is outlined in (Algorithm 1), adapted from István Z. Kiss et al. [5]. Itscore capabilities (infection spreading depending on contact matrices and geographical locationof individuals) are abstracted into the ﬁnd_contact function (line 17). input : N: array of nodes with 2D coordinates (all in S state); β i , (cid:15) i , γ i : SEIRparameters; t i : regime change dates; n start : initial infected output: W : array of nodes with 2D coordinates and states at time t . Q ← ∅ /* priority queue */ I ← random_sample( N , n start ) for n ← I do e ← Infect(node = n , time = 0) Q .enqueue( e ) end while Q not empty do U ← random(0, 1) e ← Q .dequeue() β t , (cid:15) t , γ t ← regime_params( e .time) if e is Expose then e (cid:48) I ← Infect(node = e .node, time = e .time + − log U(cid:15) t ) Q .enqueue( e (cid:48) I ) e .node.state ← E end if e is Infect then e (cid:48) E ← Expose(node = ﬁnd_contact( e .node), time = e .time + − log Uβ t ) e (cid:48) R ← Remove(node = e .node, time = e .time + − log Uγ t ) if e (cid:48) E .time < e (cid:48) R .time then Q .enqueue( e (cid:48) E ) end Q .enqueue( e (cid:48) R ) e .node.state ← I end if e is Remove then e .node.state ← R end endAlgorithm 1: Event-modulated Gillespie algorithm for the simulation

SIMULATIONS AND RESULTS

Simulation running

The main simulation code is written in C++ (C++17), using

GDAL library for geospatialtransformations and manipulations, and nanoﬂann k-d tree implementation for radius queries. igure 1.

Simulation results for Canton of Geneva, Switzerland: A) time +39 days; B) time +69 days; C) time+187 days; D) time + 406 days.

The population density map from Global Human Settlement Layer dataset is loaded as a 2Darray, with areas inside and outside border marked separately. The state of the simulation isexported each day of the simulated time as current SEIR counts, and, more importantly, asPNG ﬁle representing the current state of population on the density map (where susceptibleindividuals are represented by grey pixels, exposed — by orange pixels, infected — by red pixels,and removed — by green pixels. With this, localized infection clusters in the simulation areclearly visible, as well as directions of infection spreading.In every moment in time, the simulation software is capable of presenting population densitymaps, showing the exact state of each individual in the simulation. The examples of suchsnapshots are given in the Figure 1. It shows the appearance of new infections in localizedclusters, which are quickly moving to the wide geographical distribution of cases.We run the set of simulations, each simulation starts from the same initial condition, in whicha set of randomly chosen nodes is infected and all the other N − nodes are susceptible. Wemeasure the number of recovered nodes at the end of the simulation normalised by N , called theﬁnal size, averaged over all simulations. For each simulation run, the current counts of nodesin diﬀerent states is dumped in the output log at some periodic intervals of simulation time(e.g. every day in the simulation). These states are aggregated and displayed simultaneously asstochastic epidemic curves.The proximity matrix A is constructed using information about population density for the eachsimulated region, obtained from the ESM2015 dataset [6]. The age-dependent contact matrix M is obtained from the work of Kiesha Prem et al [7], which extended the results of the POLYMODproject [8] to 152 countries. pr2020 May Jun Jul Aug Sep Oct Nov Dec Jan2021 Feb Mar020406080100 Figure 2.

Simulation: daily hospitalizations in Geneva canton, by age groups

Surrogate model optimization

To provide the initial epidemic parameters for the full simulation, we use surrogate model opti-mization, ﬁtting the (surrogate) model (a stochastic SEIR model assuming homogeneous contactnetwork) to match the observed data (a FOPH dataset for COVID-19 cases in Switzerland,provided by ETH Zurich [9]).We wrote an adapter function for running the model and exporting the simulation results, andcomparing it with actual observed number of cases and the number of hospitalizations for theselected geographical region. Then we run a nonlinear least squares optimization routine curve_-fit from scikit-optimize

Python package, which works with arbitrary model functions. Then,we chart the best ﬁtted results together with boundary lines for assumed testing rates.After the ﬁtting, the determined parameter values are used as inputs to the simulation of themain model, leaving only local infection distance and near/far rate of contacts as remaining freeparameters.In the ﬁrst stage of the ﬁtting procedure, we ﬁx the (cid:15) and γ values of the SEIR model (inverseincubation period and inverse infectiousness period, as these are not changing), and assume that β (infectiousness rate) is changing with time due to the diﬀerent regimes. We do not know whenexactly the ﬁrst infections started (as the ﬁrst registered cases appeared while the epidemicswas already in progress for some time), but we have limits on when the ﬁrst lockdown wasintroduced and when it was lifted. We also assume that there was a regime change sometimeafter the lockdown was lifted (people returning to vacations and/or the end of school holidays). pr2020 May Jun Jul Aug Sep Oct Nov Dec Jan2021 Feb Mar050100150200 h o s p i t a li z a t i o n s / d a y Simulated0-55-1010-1515-2020-2525-3030-3535-4040-4545-5050-5555-6060-6565-7070-7575+

Figure 3.

Daily hospitalizations in Geneva canton, by age groups, compared to the simulation results

This leaves the following parameters to be ﬁtted:1. t k — the regime change dates (actual start of the epidemics, lockdown started, lockdownended, new wave started), with bounds. Some of the regime change points (the date ofintroduction of the new pandemic response policies) are known, and can be therefore ﬁxed,but it still occasionally makes sense to consider them as free parameters: if the optimizerﬁnds a regime change date completely on its own, from data only, it is a part of veriﬁcationthat the model and the optimizer work correctly.2. β k — infection spreading rates for these time intervals (for β it is bounded from initialestimations of R ).3. q k — the assumed percentage(s) of people tested, deﬁned for the each regime interval. It isquite diﬃcult both to estimate how much it has changed through time (e.g. in Switzerland,testing criteria changed nearly every week during the peak of the epidemic), and its presentvalue (there is much uncertainty in e.g. estimating the rate of asymptomatic cases in generalpopulation, the probability of somebody having the symptoms going to do the testing, etc.— all these estimates aﬀect the value of q ). In a sense, q acts as a hyperparameter here —some ranges of its values allows for the more eﬃcient ﬁtting than the others.The ﬁtting routine consists of solving the optimisation problem of ﬁnding the minimum of thefunction F ( θ ) = N (cid:88) i =1 ρ ( f i ( θ ) ) , where θ = ( θ , ..., θ r ) is a set of parameters which we want to estimate, N is the number ofavailable data points, ρ is a loss function to reduce the inﬂuence of outliers, f i ( θ ) is the i -thcomponent of the vector of residuals.ame Description Typical values β k Infection rates (for diﬀerent regimes) 0.1–2.5 (cid:15)

Incubation rate 0.15–0.3 γ Removal rate 0.1–0.3 t k Epidemic regimes change times, days from the oﬃ-cial start of the epidemic location dependent

Table 2.

Free parameters to be ﬁtted in the surrogate model

Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan2021 Feb Mar0%5%10%15%20%25% % r e c o v e r e d Figure 4.

Simulation: percentage of recovered population in the canton of Geneva

Given a model function m ( t ; θ ) and some data points D = { ( t I , D I | i = 1 , .., N } one normallydeﬁnes the vector of residuals as the diﬀerence between the model prediction and the data, thatis F i ( θ ) = m ( t i ; θ ) − d i .To compare the simulation outputs against real world observations, we are using the COVID-19 clinical dataset, provided by the Swiss Federal Oﬃce of Public Health [9]. The epidemicparameters obtained in the ﬁtting step were submitted to the main simulation code, which ranfor 100 independent iterations. We used the number of hospitalizations as a reference output forcomparison purposes (we consider hospitalizations to be a subclass of R (removed) compartment,as hospitalized patients normally do not infect anybody else). Additionally, we compare thecumulative percentage of recovered patients in the simulation with the numbers obtained fromseroprevalence studies performed in the canton of Geneva [10]. The simulation output as thenumber of the daily hospitalized cases is plotted in the Figure 2. The real hospitalizations datain the canton of Geneva for the same time period is given in the Figure 3, compared to thesimulation values. The simulated percentage of recovered population (for which seroprevalenceis a proxy) is plotted in Figure 4. ESULTS AND DISCUSSION

In this paper we presented a developed framework for stochastic simulation of epidemic processesin populations of any size taking into account population density and age structure in the regionunder consideration.In the most models currently used, the entire population is considered to be homogeneous interms of contact rates (either constant or independent and identically distributed). With thissimpliﬁcation, everyone has an equal chance of contacting anyone else. The number of infectionsis proportional to the product of the number of infected and the number of susceptible individuals,and depends on the initial rate of transmission of infection (that is, on the properties of the virusitself) and the density of contacts between diﬀerent groups.In our model, we take into account the contact heterogeneity of the population and diﬀerencesin population density in diﬀerent regions, which signiﬁcantly changes the prognosis of the devel-opment of the epidemic. We also take into account that at the diﬀerent stages of the epidemic,the infectiousness rate is changing, due to diﬀerent quarantine regimes and response policies.To simulate the infection process, we have developed a new variant of the event modulated Gille-spie algorithm. Our implementation supports multiple regime intervals and arbitrary functionsand distributions of epidemic parameters. It can be used to validate the theoretical models andit has the practical application for simulation the epidemic dynamics in larger regions and theentire countries, allowing for diﬀerent levels of detail.The simulation reproduces faithfully the initial wave of COVID-19 infections which has happenedin March-April 2020. However, even when normalized by the incidence of hospitalizations, theage structure of simulation output does not seem to match the real data observed in the cantonof Geneva; further research is necessary to discriminate between competing hypotheses (e.g.misdetection of cases in the younger population, age-relative diﬀerences in travel patterns, caseclusters in retirement homes, etc.)The simulated number of recovered cases can be compared with the result of seroprevalencestudies at a particular point in time. E.g. at May 1st, 2020, the seroprevalence in the cantonwas estimated to be 6.6% (95% conﬁdence interval: 4.3%–9.4%; the simulation gives 2.9% ofrecovered cases at this date. It is lower than the conﬁdence interval estimate, which can beexplained by a number of factors outside the scope of the model (most importantly with thenumber of antibodies diminishing with time, but also geographical clustering, age structureinﬂuence, etc). However, these results can serve as a limiting point for seroprevalence estimates.Our system allows us not only to track and predict the epidemic progression, but also to simulateand evaluate various measures introduced to contain the spread of infection, such as limitingmobility, partial or complete quarantine, etc. The simulation software also makes it possible tosubsequently model the eﬀectiveness of vaccination and determine the sequence and number ofvaccinated persons required to develop population immunity or to keep the number of infectedat a certain level.

EFERENCES [1] Wouter Van den Broeck et al. “The GLEaMviz computational tool, a publicly availablesoftware to explore realistic epidemic spreading scenarios at the global scale”. In:

BMCinfectious diseases

Reviews ofmodern physics

Applied Mathematics andComputation

265 (2015), pp. 1026–1043.[4] Naoki Masuda and Luis EC Rocha. “A Gillespie algorithm for non-Markovian stochasticprocesses”. In:

SIAM Review

Cham: Springer

598 (2017).[6] Christina Corbane and Filip Sabo.

European Settlement Map from Copernicus Very HighResolution data for reference year 2015 . European Commission, Joint Research Centre(JRC), 2019. doi : .[7] Kiesha Prem, Alex R Cook, and Mark Jit. “Projecting social contact matrices in 152countries using contact surveys and demographic data”. In: PLoS computational biology

PLoS Med

Epidemiological situation in Switzerland and Liechtenstein . . Accessed: 2020-01-18.[10] Silvia Stringhini et al. “Seroprevalence of anti-SARS-CoV-2 IgG antibodies in Geneva,Switzerland (SEROCoV-POP): a population-based study”. In: The Lancet (2020).

PPENDIX A

Age 0-5 5-10 10-15 15-20 20-25 25-30 30-35 35-40 40-45 45-50 50-55 55-60 60-65 65-70 70-75 75+0-5 1.35 0.56 0.25 0.14 0.21 0.39 0.69 0.73 0.49 0.22 0.22 0.18 0.14 0.13 0.07 0.045-10 0.48 5.67 0.89 0.22 0.15 0.33 0.62 0.80 0.88 0.37 0.21 0.16 0.14 0.12 0.05 0.0510-15 0.18 1.74 9.11 0.81 0.29 0.25 0.43 0.68 1.11 0.64 0.34 0.15 0.09 0.10 0.07 0.0715-20 0.09 0.29 3.11 11.68 1.50 0.77 0.64 0.79 1.09 1.20 0.67 0.24 0.08 0.06 0.03 0.0320-25 0.13 0.17 0.31 2.35 3.95 1.77 1.23 1.15 1.01 1.32 0.98 0.49 0.12 0.06 0.06 0.0625-30 0.35 0.24 0.24 0.89 2.07 3.66 1.91 1.52 1.36 1.17 1.21 0.68 0.21 0.08 0.04 0.0430-35 0.59 0.80 0.64 0.52 1.06 1.82 3.13 1.98 1.58 1.26 0.96 0.67 0.29 0.14 0.06 0.0635-40 0.61 0.95 0.77 0.74 0.77 1.45 1.81 3.14 2.22 1.46 1.06 0.53 0.29 0.21 0.11 0.0540-45 0.35 0.80 1.03 1.20 0.97 1.34 1.73 1.99 3.19 1.89 1.34 0.47 0.22 0.16 0.11 0.0645-50 0.19 0.58 0.79 1.86 1.00 1.12 1.39 1.59 1.79 2.57 1.37 0.60 0.20 0.13 0.11 0.1250-55 0.17 0.64 1.11 1.53 1.15 1.48 1.38 1.38 1.91 2.15 2.37 1.00 0.30 0.16 0.10 0.1255-60 0.30 0.66 0.79 0.94 0.75 1.27 1.35 1.07 1.29 1.13 1.33 1.68 0.52 0.27 0.12 0.1160-65 0.31 0.33 0.26 0.44 0.38 0.61 0.82 0.79 0.63 0.56 0.51 0.71 1.23 0.49 0.26 0.1265-70 0.23 0.35 0.28 0.17 0.27 0.42 0.73 0.68 0.65 0.43 0.44 0.58 0.63 1.29 0.33 0.1770-75 0.10 0.27 0.31 0.31 0.17 0.29 0.33 0.53 0.66 0.52 0.41 0.35 0.67 0.72 1.00 0.3275+ 0.20 0.27 0.40 0.33 0.15 0.18 0.33 0.38 0.48 0.57 0.56 0.35 0.27 0.40 0.33 0.56