[PDF] A model-based approach to assess epidemic risk

Abstract

We study how international flights can facilitate the spread of an epidemic to a worldwide scale. We combine an infrastructure network of flight connections with a population density dataset to derive the mobility network, and then define an epidemic framework to model the spread of disease. Our approach combines a compartmental SEIRS model with a graph diffusion model to capture the clusteredness of the distribution of the population. The resulting model is characterised by the dynamics of a metapopulation SEIRS, with amplification or reduction of the infection rate which is determined by the mobility of individuals. Then, we use simulations to characterise and study a variety of realistic scenarios that resemble the recent spread of COVID-19. Crucially, we define a formal framework that can be used to design epidemic mitigation strategies: we propose an optimisation approach based on genetic algorithms that can be used to identify an optimal airport closure strategy, and that can be employed to aid decision making for the mitigation of the epidemic, in a timely manner.

Full PDF

NNoname manuscript No. (will be inserted by the editor)

A Model-Based Approach To Assess Epidemic Risk

Hugo Dolan · Riccardo Rastelli

Received: date / Accepted: date

Abstract

We study how international ﬂights can facilitate the spread of anepidemic to a worldwide scale. We combine an infrastructure network of ﬂightconnections with a population density dataset to derive the mobility network,and then deﬁne an epidemic framework to model the spread of disease. Our ap-proach combines a compartmental SEIRS model with a graph diﬀusion modelto capture the clusteredness of the distribution of the population. The result-ing model is characterised by the dynamics of a metapopulation SEIRS, withampliﬁcation or reduction of the infection rate which is determined by themobility of individuals. Then, we use simulations to characterise and studya variety of realistic scenarios that resemble the recent spread of COVID-19.Crucially, we deﬁne a formal framework that can be used to design epidemicmitigation strategies: we propose an optimisation approach based on geneticalgorithms that can be used to identify an optimal airport closure strategy,and that can be employed to aid decision making for the mitigation of theepidemic, in a timely manner.

Keywords

COVID-19 · SEIRS compartmental model · Genetic Algorithm · Network Analysis · Human Mobility

In recent years, the extensive development of the transportation infrastructurehas radically changed how connected our world is. International ﬂights allowindividuals to travel around the globe in just a few hours or days. This has

Hugo DolanUniversity College DublinE-mail: [email protected] RastelliUniversity College DublinE-mail: [email protected] a r X i v : . [ phy s i c s . s o c - ph ] S e p Hugo Dolan, Riccardo Rastelli important negative implications on the potential spread of pandemic diseases,whereby epidemics can reach a worldwide scale before eﬀective responses areset in place. The recent COVID-19 outbreak [17] has clearly raised and em-phasised this problem. As a response to the emergency, many countries havetaken drastic measures to contain and slow down the spread of the virus byimposing lockdowns and airport closures [5]. While these measures have beensuccessful in conﬁning the epidemic, the immediate and chaotic response hasblurred the actual role played by the topology of the infrastructure networkon the spread of the virus.The main goal of this paper is to create a model-based framework thatcan inform decision making regarding airport closures as a means to slowingdown a potential pandemic without causing excessive economic damage. Inparticular, we introduce a new framework to study networks of internationalﬂights as potential vehicles for the spread of pandemic diseases.In this paper, we ﬁrst propose an in-depth analysis of the Open Flightsnetwork dataset [13], which describes a large number of ﬂight connections be-tween more than 3 ,

000 airports. We calculate a number of descriptive statisticsfrom the data, in order to study the underlying topology of this infrastructurenetwork, and essentially to understand how individuals can move between dis-tant locations. We use various centrality measures to identify key airports, andwe test the resilience of the network when these key airports are removed.Then, we ﬁt a Stochastic BlockModel (SBM) to partition the airports intohomogeneous groups. The SBM, originally introduced and studied by [20], isa fundamental model and tool for statistical network analysis, since it canhighlight groups of nodes that exhibit similar connectivity patterns. Inferencefor the SBM can be performed using both classical and Bayesian approaches[12]. One fundamental aspect of this model is that it can be interpreted asa ﬁnite mixture model for networks [6], and thus it borrows many conceptsand tools from this related research area. In this context, a useful by-productof the SBM framework is that it allows us to compare the partition on theairports with their actual geographical location.After the exploratory data analysis, we use the infrastructure network tocreate a model for the simulation of epidemics. An essential aspect of this taskis the development of a statistical network model that can combine these ﬂightroutes data with the geographical distribution of the population. Our aim isto give a model-based quantiﬁcation of the pandemic risk which is ampliﬁedby the travelling of individuals, and to possibly identify eﬀective interventionsthat can mitigate this risk. [5] propose an approach similar to ours, in thatthey combine the infrastructure network with the gridded population datato study the eﬀects of the airport closure interventions that were actionedat the beginning of this year. [5] use a tool called GLEAM [4, 3] which cancombine data from diﬀerent sources to predict the behaviour of the epidemicusing an individual-based compartmental SEIR model. While they focus onthe eﬀects following the actual airport closures, in this paper we aim at deﬁninga framework to take new decisions that can lead to optimal airport closures,or potential future airport reopenings.

Model-Based Approach To Assess Epidemic Risk 3

Our approach relies on epidemic compartmental models [8], and in par-ticular on a SEIRS model. This framework postulates that the populationis divided in 4 ordered compartments (susceptible, exposed, infectious, re-covered) and that diﬀerent rates determine the ﬂow of individuals from onecompartment into the next, and eventually back into the ﬁrst compartment.This family of models has been largely employed in various research ﬁelds tomodel the evolution of epidemics, and it has been also successfully used withinthe context of COVID-19 [9, 5, 15, 16, 10].One fundamental aspect of our epidemic model is that, similarly to [5, 19,2], we consider a metapopulation where each subpopulation is centred at anairport location, and whereby the local epidemic is determined by a SEIRSmodel. We use a graph diﬀusion process to describe the ﬂows between thevarious subpopulations, which in turn aﬀects the local dynamics. Not only thisallows us to observe the epidemic locally for each subpopulation, and globally,but it also allows us to appreciate the temporal-geographical progression ofthe virus.We calibrate the parameters of our model using the recent COVID-19 pan-demic as an example and as a meter of comparison. We do not claim thatour results are speciﬁc to the COVID-19 epidemic nor that they should beused within this context; rather, we provide a general methodology that couldbe employed in any epidemic setting, and, for example purposes, we recre-ate realistic situations that resemble the recent COVID-19 epidemic. In fact,we test the sensitivity of our model by running a number of simulations thatencompass a variety of possible real scenarios.The fundamental contribution of this work regards the study of optimalepidemic mitigation strategies. Once we possess a model which is calibrated toa realistic setting, we explore an optimisation approach to identify what couldbe the optimal airport closure strategies that should be implemented. We usethe predictions from our epidemic model to construct an objective functionthat takes into account measures for the spread of the disease as well as eco-nomic losses. We perform the optimisation using Genetic Algorithms (GAs)[18]. GAs are heuristic stochastic optimisation algorithms that explore newcandidate solutions by selecting and transforming a set of current solutionsusing some basic principles of evolution and natural selection. In our context,GAs are especially convenient since the problem that we address is a combi-natorial one, where we want to ﬁnd the optimal subset of airports that shouldbe closed to minimise our objective function.The software that we have used in this paper is maintained by one of theauthors and is available from the GitHub repository: [1].

In this section we propose an exploratory data analysis and basic statisticalmodelling of the Open Flights dataset [13]. The Open Flights dataset containsinformation on 3 ,

425 airports globally, including a database of 37 ,

594 commer-

Hugo Dolan, Riccardo Rastelli cial routes between these airports collected in 2014. The dataset is transformedinto an adjacency matrix with nodes representing airports and directed edgesrepresenting whether there exists a direct route between any two airports. InFigure 1 we present the network visually, and on initial inspection it is clearthat the network exhibits extremely high degree of connectivity, with the plotof degree distribution indicating that over 20% exhibit a degree greater than10.Fig. 1: Open Flights Network Visualisation and its in-degree distribution.Identifying airports which are important to the overall connectivity of thenetwork is crucial in gaining a better understanding of the network’s topol-ogy. We consider several metrics to measure the importance of nodes. Theseinclude the Page-Rank centrality, betweenness centrality, coreness ranks, aswell as the in-degrees and out-degrees of nodes [11]. We present a table of the20 most important airports according to Page-Rank in Table 1. Airports withhigh Page-Rank are clearly major international destinations, and they forman extremely well connected subnetwork with a coreness of over 60, meaningevery airport in this subnetwork has in-degree + out-degree higher than 60,or, equivalently, since the graph is approximately symmetric (the majority ofair routes run return ﬂights), we can say that every airport in this subnet-work has an out-degree close to or exceeding 30. It is interesting to note thatairports with high betweenness centrality (Charles De Gaulle, Dubai, Beijing,Amsterdam, Los Angeles, Toronto, Frankfurt) are also major connecting ﬂighthubs, exhibiting out-degrees of over 200.We identify homogeneous subgroups of airports within the network by em-ploying a SBM framework: we utilise a Python implementation of an eﬃcientMarkov chain Monte Carlo method, which is suitable for inferring SBMs inlarge networks, as described by [14]. The optimal SBM partition and the corre-sponding block matrix are shown in Figures 2a and 2b, respectively. It is clearthat the communities found are very strongly associated with the geograph-

Model-Based Approach To Assess Epidemic Risk 5

Table 1: Summary Statistics for the top 20 airports, sorted according to Page-Rank centrality values.ical location of the airports, and with their region or province. This is quitesurprising as this information is not encoded explicitly in the data providedto the algorithm. This would strongly suggest a high degree of connectivity ofairports not only globally but also within regions or geographical areas. Wealso note from the block matrix in Figure 2b that the majority of connectionsare within relatively large communities representing the geographic clusteringobserved in Figure 2a, but also towards the lower right corner there is signif-icant disassortative behaviour, likely these nodes are large international hubssuch as the small community of London, Frankfurt, Amsterdam, Charles DeGaulle which share connections to many cities across the world.We continue our exploratory analysis by studying the percolation proper-ties of the network [11]. We percolate the network by sequentially removing thenodes and their connections, and observing how the connectivity of the graphchanges. We remove the nodes following both a random order, and following adecreasing order of the out-degree and other centrality measures. The resultsare shown in Figure 3. The network is highly resilient to random attacks, sincethe removal of almost all nodes is required in order to disrupt network connec-tivity. However, we note that the network is moderately more vulnerable totargeted (degree based) attacks, yet this would still require more than half ofall airports to be removed for the single giant component to disappear. Sim-

Hugo Dolan, Riccardo Rastelli(a) Map of the SBM results, with colours indicating the cluster memberships.(b) Block matrix visualisation of the adjacency matrix, where the black dots indi-cate routes between airports and the blue boxes indicate the clusters.

Fig. 2: Fitted SBM results.ilarly, the network is moderately vulnerable to targeted attacks according toother ranking factors (Page-Rank, betweenness, coreness). We note that theseprocedures are also averaged over many trials to account for the removal ofvertices of equal rankings in diﬀerent orders, however, we ﬁnd these resultsvery quite similar to percolation by degree.In conclusion, the Open Flights network summary statistics show that air-ports which are large regional destinations or hubs for connecting ﬂights tend

Model-Based Approach To Assess Epidemic Risk 7

Fig. 3: Percolation on the Open Flights dataset via a variety of ranking criteria(results are averaged over 100 independent simulations).to have high importance to network connectivity. Furthermore it is observedthat some nodes in the network are extremely well connected, both at regionaland global level, with signiﬁcant geographical community structure. The net-work is also highly resilient to random or deterministic damage.In the context of epidemics, these initial ﬁndings provide a very solid evi-dence that the ﬂight connections can sadly be a very eﬃcient vehicle to facil-itate the spread of diseases, and, more importantly, that substantial network“damage” (e.g. airport closures) is required to ensure that the disease doesnot spread to a worldwide scale. This evidence motivates our work, in thatwe aim at ﬁnding optimal mitigation strategies that can reduce the pace ofthe epidemic to a much smaller scale, without causing excessive disruption toeconomies and to this particular infrastructure network. dψdt = c ( A − D ) ψ Hugo Dolan, Riccardo Rastelli

We use the notation ψ to represent the vector of ﬂuid volumes at every node, A to denote the adjacency matrix of the network and D to denote a diagonalmatrix of containing the degrees of every node. A full derivation of this canbe found in [11].Additionally, we introduce the SEIRS compartmental epidemiology model.Each letter of the model name denotes a compartment of the system (Suscepti-ble ( S ), Exposed ( E ), Infectious ( I ) and Recovered ( R )), in which some num-ber of individuals from the total population ( M ) reside. Figure 4 illustratesthe direction of progression from state to state, whilst Equation 2 indicatesthe exact rates at which the population in each compartment changes. TheFig. 4: SEIRS compartmental ﬂow diagram for the equation set: dSdt = δR − SβIM , dEdt = SβIM − (cid:15)E , dIdt = (cid:15)E − γI , dRdt = γI − δR greek letters are parameters of the system which can be ﬁtted to match thecharacteristics of some observed epidemic. Whilst the system cannot be solvedanalytically, we can ﬁnd a numerical approximation to the solution, which issuﬃcient for our simulation purposes.3.2 Model DeﬁnitionIn order to model the transmission of disease through the international ﬂightsnetwork, we opt to use the SEIRS model combined with a graph diﬀusionmodel as described in the previous section. We will refer to the airports’adjacency matrix as A and denote airport nodes as v j ; j = 1 , . . . , N. The total number of nodes in the network is N and the associated popu-lation at each node is M j . Let us deﬁne the local epidemic state vector as θ j ( t ) = ( S j E j I j R j ) T , which represent the compartments of the SEIRSmodel for airport population at any given time. A condition of the SEIRSmodel constrains the total population of all compartments to equal the totalpopulation: S j ( t ) + E j ( t ) + I j ( t ) + R j ( t ) = M j We assume that the local population is fully mixed (i.e. everyone has equalchance of being infected), as this is a standard assumption of compartmentalepidemic models, however we assume this only to be true at the individual air-port level, and not for the entire global system. Additionally we introduce α j as the proportion of the population which can travel, and c as the probabilitythat an individual departs from an airport on any given day. The proportions Model-Based Approach To Assess Epidemic Risk 9 allow us to deﬁne an additional variable ψ j ( t ) = α j θ j ( t ) which correspondsto the mobile epidemic state.We can deﬁne at high level our simulation procedure:1. At each time-step we must ﬁrst initiate community spread of the diseasethrough each airport’s local population. This is represented by the com-partments of its SEIRS model denoted by the local epidemic state vector θ j . Using a fourth-order Runge-Kutta approximation technique we updateeach compartments value in accordance with the dynamics of the SEIRSmodel. The new state of the compartments after community spread is de-noted θ ∗ as opposed to the previous state θ , notice that we drop theairport subscript j as this process generalises to a matrix θ whose columnsare vectors θ j .2. Now take this new local epidemic state θ ∗ and split it into two types ofpopulation members: a base population denoted θ B , who are permanentresidents to the local area; and a second group θ T , i.e. the transient pop-ulation who are temporary visitors, who have arrived at the airport onbusiness or holidays and will return home after a short period. This dif-ferentiation is necessary to ensure that the system does not arrive at anunrealistic equilibrium, where airports have drastically diﬀerent popula-tions relative to their initial values (we assume no permanent migration inour model).3. Next, we compute the proportions of θ B and θ T who can ﬂy (in eachcompartment), this is simply achieved by multiplying the vectors by α j which was deﬁned at population level θ j , such that ψ + = αθ B describesthe number of outbound passengers and ψ − = αθ T describes the numberof returning travellers.4. Using our diﬀusion model compute the changes to θ B and θ T at each air-port. The exact value of these changes is weighted based on several factorsincluding: the diﬀerences between outbound and returning passengers atevery connected airport (corresponding to the “ﬂuid”, in the interpreta-tion of Section 3.1), the relative importance of the airports in the networkand the node’s degree (controlling how the travellers get distributed acrossroutes).5. Recombine updated values of θ B and θ T into the aggregate populations θ and loop for as many iterations of simulation as required.For completeness we include both a diagrammatic form of the algorithm(Figure 5) as well as a the full algorithm (Table 2) which reﬂects the high leveloverview above. A full derivation is provided in Appendix 7.1. The algorithmdescribed above has been vectorised so that θ is now a × N matrix contain-ing the states for all airports θ ( t ) = ( θ ( t ) ...θ N ( t )) , and similarly for ψ + , ψ − which represent the states of outbound and returning travellers respectively.The matrix /D is a diagonal matrix with entries that are /deg ( v j ) , thisnormalises outﬂows from airports preventing more passengers from leaving anode when there are not individuals in that location. The matrix B is an op-erator encoding the diﬀerential equations of the SEIRS model for vectorised Fig. 5: Diagram of the epidemic diﬀusion model steps.Table 2: Simulation pseudocode.application to many airports simultaneously, whilst ˜ θ is a modiﬁed version of θ to enable B be applied as a linear operator. Finally C is a weighted versionof adjacency matrix A , so that the outﬂows from vertices reﬂect the rela-tive importance of adjacent airports. The above should provide an intuition,however we reiterate that there is a far more comprehensive derivation of themathematics required in Appendix 7.1.3.3 Data Sources, Parameters, EstimationIn this section we provide further information on the parameters of our model,on their interpretation, and on how we have used diﬀerent data sources to inferor calibrate these parameters. Our framework requires information on airports, Model-Based Approach To Assess Epidemic Risk 11 routes, demographics, wealth, which is diﬃcult to obtain and use. A summaryof the model’s parameters is provided in Table 3, for convenience.Table 3: Model parameters [*] necessary conditions for the escalation of theepidemic.One fundamental transformation that we apply is the following: let P bethe vector of Page-Rank values obtained from earlier analysis and A be theadjacency matrix of ﬂight connections. Then, we deﬁne and work with thematrix C , which is the relative centrality matrix with elements deﬁned by C ij = A ij P j (cid:80) k A ik P k . The interpretation of this new matrix is that its values areidentical to those of A , except now the edges have been assigned weights basedon the relative importance of adjacent nodes. This transformation ensures thatthe ﬂows of individuals between airports are scaled so that they reﬂect highertraﬃc to major airports and less to airports of little signiﬁcance in the networkthus preventing the system from obtaining unrealistic conﬁgurations as thesimulation progresses.With regard to the metapopulation, we must deﬁne the number of in-dividuals that have access to, and are served by, each of the airports. Thiscorresponds to estimating the initial population parameters θ . For this task,we utilise the gridded population of the world dataset [7], and we assign avalue for total population at each airport location. In order to perform thisassignment there is an immediate problem as many airports are often in closeproximity to each other (for example some cities are served by multiple air-ports). We assume that the maximal distance anyone will travel to reach an airport is 240km (60 kph * 4 hours = 240km). Using this assumption, for eachcell of the grid of the population dataset, we search all the airports that arewithin this radius. Then, we assign a population contribution to each, pro-portionally to their Page-Rank values. This metric is chosen as it provides aproxy for the likelihood that the airport will be preferred by travellers in thelocal region. Note that this approach will automatically exclude populationgrid cells which are not within 240km of any airport. These populations areexcluded from the simulation as they are unlikely to be ﬂying regularly, thusthey will not play a relevant role in the spread of the disease.Finally, we estimate the percentage of each population who can ﬂy α + .It is obvious that this percentage will vary across countries, depending on anumber of factors, primarily wealth. To ﬁnd a reasonable value for this modelparameter, we acquire passenger estimates by country as supplied by the WorldBank and divide these through by the total airport populations for the givencountry. We then assume the proportion by airport is the same as at countrylevel .The SEIRS model, as well as the graph diﬀusion model, rely on severalstrong assumptions that inevitably impact the results. For clarity, we providea list of our modelling assumptions here below, to highlight the speciﬁc featuresthat our model will exhibit.1. Fully Mixed Local Populations:

Within any given node every memberof the population has equal chance of contact and thus passing on disease.2.

Fully Mixed Wealth:

The proportion of population which may ﬂy isdistributed the same at node / airport level as at country level.3.

Maximal Travel Distance:

We assume the maximum distance someonewill travel is 240km to get to an airport, and thus anyone who exists outsideof all airport radius is assumed to be isolated and may be excluded fromthe model .4. Air Transit Only:

We assume the only way for the disease to spreadbetween nodes is via air routes and that spread via other means eg. boat& road links are negligible.5.

No Permanent Immigration:

Assume all individuals will return to theirhome country by the end of the business week.6.

No Seasonality:

We assume that all parameters of the model are constantthroughout the duration of simulation.7.

Universal Rates:

We assume that the parameters of the SEIRS modelare universal and do not vary signiﬁcantly between countries.While it is obvious that each of these assumptions can have a non-negligibleeﬀect, we do argue that our simulations can show a remarkable resemblancewith a realistic epidemic, such as the recent COVID-19 case. Sometimes this proportion is greater than 1 particularly for major hubs, where annualtraﬃc through the airport may exceed the size of the local population assigned. In thesecases we replace our estimate by the average global proportion, clearly this is imperfectbut is likely as accurate as we will be able to obtain given this information is not readilyavailable. The percentage of total population excluded by the model is less than 3% Model-Based Approach To Assess Epidemic Risk 13 (cid:15) = 0 . , δ = , γ = 0 . and β = 0 . . These values where obtained from astudy by [9], with the additional parameter δ which is set conservatively as wedo not know the duration of which people will remain immune to COVID-19.We also setup the model such that the ﬁrst cases occur in Wuhan for resultswith similarity to the COVID-19 outbreak [17].We proceed to aggregate the time series data for the S , E , I , R compart-ments to community levels (here, a community is a geographic region whichwe have labelled according to SBM resulting partition), and we consider theprogression of the epidemic with regards to the most relevant compartments:Exposed (E) and Infected (I) . The community labelling choices are somewhatarbitrary but the main purpose of these plots in Figure 6 is to illustrate thenetwork eﬀect results in a somewhat staggered epidemic across diﬀerent ge-ographic communities. There is evidence of a gradual dispersion of the virusFig. 6: Exposed and Infectious local epidemic states for key communities de-rived from SBM. (Y-axis scales are not comparable) across the world starting in China, moving onward into other areas includingSouth East Asia, Japan, Russia, India, South Africa, Middle East. Some ofthe last places to be infected are the Americas, Nordic Countries, Alaska andTurkey. Figure 7 is an illustrative sketch of one possible spread through thenetwork which could be derived from the spread sequence in Figure 6.Fig. 7: Illustrative sketch of the epidemic based on the sequencing of the epi-demic spread provided by Figure 6.4.2 Sensitivity Analysis of Key ParametersWhere possible, we have utilised data to calibrate the model parameters, andwhere not available, we have elected to make some simplifying assumptionsregarding travel behaviour. However, we have not yet addressed the estimationor calibration of the SEIRS parameters in a more formal way. In the previoussection, we have obtained the results for the unmitigated scenario using thevalues estimated by [9]. In this section. in order to obtain a better under-standing of the impact of these SEIRS parameters on our results, we proceedto conduct a sensitivity analysis.Whilst we can be reasonably conﬁdent that reported rates of recoveryare reliable, this is perhaps less true for infection and exposure rates. Thuswe let β = kγ (infection rate) and (cid:15) = sγ (exposed to infectious rate)where we assume γ (recovery rate) is known. It is clear that k > other-wise β < γ and the epidemic quickly vanishes, similarly let k > s > to ensure valid parametrisation of the model. We will not consider varying δ (loss of immunity rate) as we will primarily focus our analysis exclusivelyto the ﬁrst wave of the epidemic. Let k ∈ { . , . , , . , , , } , and s ∈ { . , . , . , , . , } with γ = . We choose a smaller rangefor s as medical research suggests that people remain in the exposed state forup to days for COVID-19. We see from Figure 8 that the model behaves Model-Based Approach To Assess Epidemic Risk 15

Fig. 8: Sensitivity analysis conducted on major airports: ( − ) indicates β <γ (+) indicates time exceeded days.as expected. As s increases, the maximum number of infections decreases andthe time until peak infections increases. This makes sense in the context ofFigure 4, because, when s approaches k , the rate of change in the exposurecompartment approaches zero. Similarly, as k increases, the maximum numberof infections increases, and the time to reach peak infections decreases, whichmakes sense in the context of the converse of the previous argument. It is alsointeresting to note that the time for the epidemic to return below infec-tions is often greater than days, except when k is very large. Similarly,a larger value of s will also prolong the epidemic due to the slower rate ofchange in the exposed compartment. It is clear from Figure 8. that the valuesof s and k can vary the peak number of infections quite drastically, often onthe order of millions of cases, even with only a . change in parameter val-ues. Thus, we suggest that in the absence of reliable estimates of β, γ, (cid:15) the reader should treat any numerical conclusions presented as stylised versions ofreality, which convey general trends but not precise predictions.Finally we have included a benchmark case in which the world average citypopulation is located in a single airport - essentially a global SEIRS model. Thebenchmark is provided for comparison but also to highlight how this globalapproximation is grossly inadequate for modelling a metapopulation whichexhibits strong geographical clusteredness. The benchmark clearly underesti-mates the time until peak infections and the peak number of infections. This section examines potential mitigation strategies in the context of epi-demics on international ﬂight networks. In order to evaluate these strategieswe must ﬁrst select some performance metrics. We decide to select metricswhich are easily interpretable by policy makers and the general public, whilstalso being useful in the context of managing hospital intensive care unit capac-ity and overall impact of the epidemic. Speciﬁcally we will measure the peaknumber of infections and the total number of cases of the disease, as these aretransparent and can easily be measured from our simulations.Fig. 9: Impact of worldwide permanent airport closures from nth day sinceﬁrst infection.

Model-Based Approach To Assess Epidemic Risk 17 s = 2 and k = 5 ). Our simulations in Figure 9 demonstrate that closing routesearlier reduces the peak number of infections, but does not signiﬁcantly reducethe total number of cases, unless all airports are closed by the end of day . This demonstrates the high level of connectivity within the network withcases of infections proliferated across every continent within the ﬁrst daysof the outbreak. It is quite remarkable that the impact of these infectionsonly becomes noticeable after around days, which would explain how therecent COVID-19 could spread across the world unnoticed for months before apandemic was declared. We note that closing airports immediately after day 1would have a signiﬁcant impact, reducing the total number of cases by almost . Our interpretation of these results is that the graph diﬀusion modelspreads tiny portions of the epidemic in each of the seed’s neighbours, thusseeding also these new locations. Then, since the SEIRS parameters guaranteea local escalation of the epidemic, we do observe an increasing number ofcases even a long time after the airport closures. This is in fact a very positivemessage, since in these cases the epidemic escalates at a very slow pace, thuspermitting the introduction of new measure that can control it.Extending on the previous result we consider whether the impact of socialdistancing, mask wearing and other measures which collectively aid in reduc-ing the rate of infection can have a joint eﬀect on our metrics when employedalongside the earlier airport closures. Since we already observed in Figure 8that reducing the rate of infection reduces the speed at which the diseasespreads a joint eﬀect on our metrics seem plausible. After running the sameexperiment as previous, but with diﬀerent values for the infection rate multi-plier ( k = 5 and k = 3 ), we ﬁnd that the reductions in both peak and totalcases due to the earlier airport closures is 4 - 5% larger when the infection rateis lower ( k = 3 ) as seen in Figure. 10. However the joint reductions due toFig. 10: Comparison of k = 5 and k = 3 on relative reduction. * Cell valueis computed as: − max(Infections—Day N Closure)max(Infections—Day 500 Closure) and similarly for total cases/ recoveries ** Interaction Eﬀect = k Row − k Row the combination both measures becomes trivial if we close airports swiftly inDay 1 or Day 2, suggesting that early closures will simply stop the virus fromspreading in the ﬁrst place. Additionally an interaction may still be presentat days greater than Day 5, however it is not useful to consider this as theabsolute reductions in cases by any measure after this period is small.5.2 Threshold Infected RuleA further modiﬁcation to the previous results involves dynamically closingairports whenever the the total number of cases exceeds in every X peoplewithin the local population. This diﬀers from the previous method in whichwe implemented blanket global closures. The results of this new experiment,which are shown in Figure 11, indicate that this ‘wait and see’ strategy istotally ineﬀective under our modelling framework. Even considering a highlyFig. 11: Figure 14 Comparison of k = 5 versus k = 3 on relative reduction.* Cell value is computed as: − max(Infections—1 in 10N threshold)max(Infections—1 in 100 threshold) unrealistic version of this strategy in where we suppose it is possible to detectcases up to a ﬁneness of in every million people, it is already too lateto close airports, providing little more than a reduction in peak infectionsand almost no change in the total number of cases. Again, we motivate thisresult by emphasising that most of the observed cases are due to the escalationof the local epidemic, and are not travel-related.5.3 Limited Nth Day RuleIn the previous section we realise that it is far more eﬀective to close airportspreemptively than it is to wait on some threshold level on infections to beattained within the local population. However, one could argue that it is im-practical to close all airports globally, both from a economic and political pointof view. Thus, we proceed to examine what performance we can achieve by onlyclosing a subset of key airports, which we rank by several metrics: population, Model-Based Approach To Assess Epidemic Risk 19

Page-Rank and betweenness. It is quite interesting to see that even with onlythe top of airports closed we still obtained signiﬁcantly greater reductionsthan the threshold rule (Figure 12a). This further emphasises the point thatearly intervention is far more important in the network than attempting to de-tect when certain infection thresholds have been breached. We also note thatthe fall-oﬀ in the strategy’s performance is quite non linear both across therows or columns. In Figure 12b, we present the same experiment but using thePage-Rank ranking of the airports, and we notice that the reductions in peakinfections are drastically superior to that obtained when ranking the airportsby populations, suggesting that prioritising the closure of airports with highcentrality is very important in terms of epidemic mitigation. Finally, we applythe same methodology using a ranking based on betweenness. In Figure 12c,we see the results are somewhat mixed, with the strategy performing betterthan with previous metrics at the level, and marginally worse at otherlevels.5.4 Optimisation ApproachIn section 4 we explored several mitigation strategies, ultimately ﬁnding great-est ﬂexibility and mitigation in the limited Nth day rule. This strongly suggeststhat we could ﬁnd a very eﬀective airport closure strategy that will achieve ex-cellent mitigation results, without totally disrupting the network connectivity.In this section, we employ a genetic algorithm (GA) to search for the opti-mal combination of airport closures that maximises the utility of the Nth DayRule. GAs, in their simplest form, operate on binary strings called chromo-somes which are an encoding of the parameters of interest, commonly referredto as genes. Any particular instance of a chromosome has a genotype whichrefers to a speciﬁc string of bits each with values 1 or 0 representing a partic-ular gene’s allele. Once the problem can be formulated within this frameworkthe procedure followed by genetic algorithms is the following:1. Deﬁne a ﬁtness function F(X) which evaluates the optimality of a givengenotype.2. Initialise a population of chromosomes with randomly assigned genotypes.3. Evaluate the ﬁtness of all members of the population. Individuals will thenbe selected for breeding at a frequency proportional to their ﬁtness (thisemulates the “survival of the ﬁttest” mechanism).4. In a process known as crossover , pairs of selected genotypes are split uni-formly at some locus along the chromosome and then recombined , to formnew chromosomes.5. Mutations are then applied at random to alleles of the recombined chro-mosomes (simply by bit ﬂipping) in order to prevent irrevocable loss of anycharacteristic.6. The process (3 - 5) is then repeated so that the ﬁttest member of thepopulation are selected. population .(b) Relative reductions: airports sorted by

Page-Rank .(c) Relative reductions: airports sorted by betweeness . Fig. 12: Relative reductions on peak infections and total cases using diﬀerentpercentages of closures and ranking factors.

Model-Based Approach To Assess Epidemic Risk 21

For a more detailed explanation of the entire process see [18]. We opt to usea genetic algorithm for this problem since the solutions of our combinatorialproblem can be easily represented as a binary strings. By contrast, problemsof this nature are ill suited to classical optimisation methods such as GradientDescent, as it is not possible to analytically compute gradients and our searchspace is too large for approximation methods. Additionally a ﬁtness functioncan be easily deﬁned from the metrics we have described previously. In theTable 13 we identify the key information required to formulate our problemwithin the genetic framework.Fig. 13: Deﬁning key components of our problem in the GA framework.In order to evaluate the ﬁtness function, we must compute the values of T and P (computing A is trivial). This clearly involves inputting the parametersencoded in the chromosomes genotype into our simulation developed in previ-ous sections. We perform this by ﬁrst using a lookup table to convert betweenthe 195 bit country closures string speciﬁed in the genotype to a , bitstring airport closures vector required by our model. Next we set to zero the rows and columns of the closed airports within the adjacency matrix at theappropriate time steps (to disconnect an airport from the network). Finallywe proceed to run the algorithm for suﬃcient iterations as for the ﬁrst waveof the epidemic to be completed.5.5 Optimisation Results and ComparisonsThe results of the algorithm in Figure 14 are presented in the usual formatfor consistency, however each row now represents a diﬀerent GA optimisedfor metrics on of days to respectively, with the corresponding quantitiesoptimised bolded in the table.Fig. 14: Relative reductions on day closure scenario for percentages ofairport closures. (airports sorted by Genetic Algorithm ), bolded numberindicate is the day for which the row was optimised by GA.The results are quite remarkable, but better visualised in Figure 15 for theday closure scenario. Not only does the GA outperform our previous rankingFig. 15: Day 3 comparison of strategy performancemethods (with an equivalent % of airport closures), it also improves on the Model-Based Approach To Assess Epidemic Risk 23 close all airports strategy by − . Whilst this is counter-intuitive, the GAleverages the hidden structure within the network to ﬂatten the curve over days earlier, while also reducing peak and total infections when compared withPage-Rank, all-closed strategy and unmitigated strategy, as seen in Figure 16.Fig. 16: Day 3 comparison of strategy performanceThis suggests that the GA learns to leverage the network structure viaclosures in such a way as to accelerate the initial infection rate, but achievinga lower point of equilibrium. To gain further intuition into the GA behaviour, itis best to look at Figure 17 which exhibits the evolution of the GA strategy aswe alter the day at which closures occur. What we observe is that it is initiallyoptimal to close China and certain other countries such as France which arevery well connected to China via air routes. However as governments delayclosures up to day , we ﬁnd that the GA shifts focus away from China andstarts to establish “ﬁrebreaks” in other surrounding countries such as India,Kazakstan and Russia, whilst also selective targeting certain several Africanand South American countries.Returning our focus back to the day 3 strategy learned by the GA in Figure18, we examine the percentage change in peak infections and total cases underthe GA strategy compared with the all-closed strategy and unmitigated strat-egy. Despite only of airports being closed under the GA strategy of countries see a reduction in peak infections relative to an unmitigated case,whilst see a reduction in peak infections relative to the all closed case.Similarly, we see that of countries see a reduction in total cases relativeto the unmitigated case and see a reduction in total cases relative to theall-closed scenario. This provides overwhelming evidence of the eﬀectiveness ofthe genetic algorithm strategy over naive propositions of worldwide closures ofairports, and highlights how a model-based approach to solving such a problemcan provide signiﬁcant added value over simplistic intuitive strategies, such asclosing by population or network centrality. Fig. 17: GA: the optimal countries to close starting from nth day (dark blueindicates closed). (a) GA relative to all-closed. (b) GA relative to unmitigated.

Fig. 18: A comparison of GA strategy versus naive strategies and total inaction.

Model-Based Approach To Assess Epidemic Risk 25

In this paper, we have shown that human mobility infrastructure networksare highly complex and highly resilient to node removals. This has importantconsequences in epidemiology, since the high connectivity facilitates the spreadof pandemic diseases to a worldwide scale. These observations motivated us toseek and test non-trivial airport closure strategies that could maintain a goodnetwork connectivity, while slowing down the spread of a potential disease.Our framework is based on a metapopulation SEIRS model, where the sub-populations are located in the airports’ locations, and they are composed of theindividuals living in those nearby areas. Our simulations allowed us to exploreand study a variety of realistic scenarios to understand the dynamics of thespread of the disease. We considered several diﬀerent airport closure strategies,and we compared them using measures extracted from our simulations.One main message that arises from our results is that the disease is seededin many locations worlwide with impressive speed. If conditions are met forisolated local escalations of the epidemic, then most countries will be hit bythe seeded disease after a variable delay, regardless of any late interventionson travelling restrictions.Our ﬁndings suggest that the ﬁrst week of dispersion of the disease throughthe network is a critical time period for eﬀective intervention, however inter-ventions in the network, such as airport closures, still provide some reductionsto peak infections and total cases up to 3 months into the simulation. Further-more, we show that policies which reduce community spread can be combinedwith our proposed airport closure strategies to provide greater beneﬁts thanif either policy had been used separately.Finally, we explored the application of an optimisation approach to identifyoptimal airport closures within the critical ﬁrst week of disease spread, inorder to reduce the global impact of the epidemic while keeping as manyairports open as possible. This optimisation approach, based on a geneticalgorithm, improved upon all of the other methods, hence providing a newmodel-based perspective on the decision making process that leads to the travelrestrictions. One very interesting aspect of our results is that the algorithmleverages the complex structure of the network to place strategic “ﬁre breaks”,which drastically reduce peak infections and total cases. ψ j at nodes j = 1 . . . N .Thus ﬂuid ﬂows from node i to node j at a rate proportional to the diﬀerencein the amount of ﬂuid at each node c ( ψ i − ψ j ) where c is the constant ofproportionality or more commonly referred to as the diﬀusion constant . Wenote however that ﬂuid can only ﬂow between nodes if they are adjacent ingraph G with adjacency matrix A . Thus the total instantaneous change inﬂuid volume at node j is given by the following equation. dψ j dt = c N (cid:88) i =1 ( ψ i − ψ j ) A ij Following from thus we can easily proceed to vectorise this equation for allvertices as follows, where ψ = ( ψ ... ψ N ) T and D = diag ( deg ( v ) , . . . , deg ( v N )) . dψ j dt = c N (cid:88) i =1 ψ i A ij − cψ j deg ( v j ) dψdt = c ( A − D ) ψdψdt = − c ( D − A ) ψ In the basic implementation described above we consider there to be onlya single ﬂuid, however we will require 4 diﬀerent ﬂuids for the 4 states ofour SEIRS model. Thus we let θ j = ( S j E j I j R j ) T represent the number ofpeople in the airport / node population which are in the Susceptible, Exposed,Infectious and Recovered states and ψ j = α j θ j be the proportion of peopleavailable for travel. Hence ψ ∈ R x N is now a matrix and thus we mustmodify the ordering of our equation to correct for this. dψdt = − cψ ( D − A T ) ∈ R x N Breaking this down, the components of the original deﬁnition are reconﬁg-ured such that (cid:80) Ni =1 ψ i A ij becomes ψA T = ( (cid:80) Nj =1 A j ψ j ... (cid:80) Nj =1 A Nj ψ j ) ∈ R x N and ψ j deg ( v j ) becomes ψD ∈ R x N . As a ﬁnal simpliﬁcation we as-sume that A = A T which is appropriate in the case of ﬂight routes which arealmost always operated in both directions. dψdt = − cψ ( D − A ) ∈ R x N This equation provides a simple model for the diﬀusion / travel of peoplethrough the international ﬂight network. Thus we can now deﬁne a simple

Model-Based Approach To Assess Epidemic Risk 27 update algorithm which enables travel through the network and updating ofthe sizes of the local populations.1. Repeat for t = 1 . . . T (a) Compute ψ = θα where α = diag ( α , . . . , α N ) .(b) Update the local populations due international travel θ = θ + dψdt .However there are some obvious issues with this simplistic model, namelythat θ i typically converge to some equilibrium value θ (0) (cid:54) = θ ( eq ) , whichdoes not represent the reality that people travelling on holidays or businesswill typically return to their home country. Secondly there are some technicalissues which arise from implementation, such as for vertices of large degree simultaneous outﬂows may exceed the total population at the node (this is anartefact of vectorisation). Finally the ﬂows to other vertices are not inﬂuencedby how central the airport is to the network, which is obviously important astravellers are more likely to visit major tourist / business destinations and thisinformation is not necessarily reﬂected in the population (eg. Ethiopia has avery large population but it is not as important in the network as France orthe UK).The ﬁrst of these 3 issues is quite easy to address, we can simply split thepopulation θ into a base population θ B who live permanently in the localarea and θ T being the transient population in a given node on holidays /business. Refactoring our algorithm to incorporate this change is relativelysimple.1. Repeat for t = 1 . . . T (a) Compute ψ + = θ B α + and ψ − = θ T α − where ψ + represents out-bound travellers from the local base population and ψ − represents re-turning transient travellers to their home countries.(b) We then compute dψ + dt = − c + ψ + ( D − A ) and dψ + dt = − c − ψ + ( D − A ) respectively(c) Finally we now have two update equations for international traveli. θ B = θ B + min( dψ + dt ,

0) + max( dψ − dt , where we subtract the outbound base populations and add the arriving travellers.ii. θ T = θ T + max( dψ + dt ,

0) + min( dψ − dt , , where we add the returning base populations and subtract the departing travellers.Note that we have chosen the wording outbound , arriving , depart-ing , returning carefully to reﬂect the sequence of steps all travellerspass through in order. This is crucial to ensure logical behaviourof travellers and also to prevent leakage of ﬂuid / people from thenetwork.To address the second issue we can simply divide ψ i by the degree of thenode which ensures that the outﬂows computed to every other node will be atmost ψ i . This modiﬁcation we will denote by ψ i D but it is understood that thisrepresents ψ i D ii element wise and will be implemented by Numpy broadcasting operation in Python.To Address the ﬁnal issue we replace the adjacency matrix A with aweighted matrix C to encourage ﬂows to more central airports, where C isdeﬁned as follows, where P j is the page-rank centrality of node v j . A ij P j (cid:80) Nj =1 A ij P j We then translate this back into our original formulation, with A replacedby C and D replaced by I , the identity matrix (it is trivial to check by a similarderivation to previously that since C is row stochastic D may be replaced bythe identity matrix). Similarly this generalises to the more complex modelsaddressing the other issues with the original graph diﬀusion model. dψdt = − cψ ( I − C ) ∈ R x N Next we addition another the SEIRS model which models the changes in θ due to community spread of the disease (rather than due to internationaltravel). As in Section 3.1 we deﬁne the following diﬀerential equations whichdeﬁne the SEIRS model. dSdt = δR − SβIMdEdt = SβIM − (cid:15)EdIdt = (cid:15)E − γIdRdt = γI − δR These changes represent the spread of disease within a single community,but for eﬃciency we will deﬁne an operator B which acts on a modiﬁed ˜ θ which enables application of the linear transformation to obtain the updatedcommunity state. B =  − β δβ − (cid:15) (cid:15) − γ

00 0 γ − δ  ˜ θ j = ( SjIjMj ,E j ,I j ,R j ) Note that the community spread occurs prior to the splitting of the pop-ulation theta into θ B and θ T as it is assumed that travellers and the localcommunity are fully mixed. Thus the ﬁnal implementation of the algorithm EFERENCES 29 may be described in table 4, where we include one ﬁnal step of splitting θ into θ B and θ T according to the prior populations (cid:107) θ B (cid:107) and (cid:107) θ T (cid:107) respectively.Table 4: Simulation Pseudocode. References [1] “A-Model-Based-Approach-To-Assess-Epidemic-Risk”. In: (Aug. 2020). url : https://github.com/hugo1005/A-Model-Based-Approach-To-Assess-Epidemic-Risk .[2] A. Apolloni et al. “Metapopulation epidemic models with heterogeneousmixing and travel behaviour”. In: Theoretical Biology and Medical Mod-elling

Journal ofcomputational science

Proceedings of the National Academy ofSciences

Science

Statistics and computing url : https://sedac.ciesin.columbia.edu/data/set/gpw- v4- admin-unit-center-points-population-estimates-rev11 .[8] H. W. Hethcote. “The mathematics of infectious diseases”. In: SIAMreview

Journal of medical virology (2020). [10] L. L´opez and X. Rodo. “A modiﬁed SEIR model to predict the COVID-19 outbreak in Spain and Italy: simulating control scenarios and multi-scale epidemics”. In:

Available at SSRN 3576802 (2020).[11] Mark Newman.

Networks . OUP Oxford, Mar. 2010. doi : .[12] K. Nowicki and T. A. B. Snijders. “Estimation and prediction for stochas-tic blockstructures”. In: Journal of the American statistical association url : https://openflights.org/data.html .[14] Tiago P. Peixoto. “Eﬃcient Monte Carlo and greedy heuristic for theinference of stochastic block models”. In: Physical Review E issn : 1550-2376. doi : . url : http://dx.doi.org/10.1103/PhysRevE.89.012804 .[15] L. Peng et al. “Epidemic analysis of COVID-19 in China by dynamicalmodeling”. In: arXiv preprint arXiv:2002.06563 (2020).[16] A. Radulescu and K. Cavanagh. “Management strategies in a SEIRmodel of COVID 19 community spread”. In: arXiv preprint arXiv:2003.11150 (2020).[17] Muhammad Adnan Shereen et al. “COVID-19 infection: Origin, trans-mission, and characteristics of human coronaviruses”. In: Journal of Ad-vanced Research (2020).[18] J. C. Spall.

Introduction to stochastic search and optimization: estima-tion, simulation, and control . Vol. 65. John Wiley & Sons, 2005.[19] M. Tizzoni et al. “On the use of human mobility proxies for modelingepidemics”. In:

PLoS Comput Biol