[PDF] A new formulation of compartmental epidemic modelling for arbitrary distributions of incubation and removal times

Abstract

The paradigm for compartment models in epidemiology assumes exponentially distributed incubation and removal times, which is not realistic in actual populations. Commonly used variations with multiple exponentially distributed variables are more flexible, yet do not allow for arbitrary distributions. We present a new formulation, focussing on the SEIR concept that allows to include general distributions of incubation and removal times. We compare the solution to two types of agent-based model simulations, a spatially homogeneous one where infection occurs by proximity, and a model on a scale-free network with varying clustering properties, where the infection between any two agents occurs via their link if it exists. We find good agreement in both cases. Furthermore a family of asymptotic solutions of the equations is found in terms of a logistic curve, which after a non-universal time shift, fits extremely well all the microdynamical simulations. The formulation allows for a simple numerical approach; software in Julia and Python is provided.

Full PDF

AA simple formulation of non-Markovian SEIR

P. Hern´andez , C. Pena , A. Ramos and J.J. G´omez-Cadenas Departamento de F´ısica Te´orica e Instituto de F´ısica Corpuscular CSIC-UVEG,Universidad de Valencia E-46071 Valencia, Spain Departamento de F´ısica Te´orica and Instituto de F´ısica Te´orica UAM-CSIC, UniversidadAut´onoma de Madrid, E-28049 Madrid, Spain School of Mathematics & Hamilton Mathematics Institute, Trinity College Dublin,Dublin 2, Ireland Donostia International Physics Center (DIPC) and Ikerbasque, Manuel LardizabalIbilbidea 4, 20018 San Sebasti´an Donostia, Spain

May 21, 2020

Abstract

We present a formulation of the SEIR concept that represents the evolu-tion of an epidemic under the assumption of homogeneity and uniformity ofthe microscopic parameters: rate of infection, incubation and recovery/removaltimes. Non-homogeneities in the basic parameters are easily included by inte-gration for arbitrary distributions of incubation and removal times, recoveringthe classical SEIR model under the assumption of exponential distributions. Wecompare the solution to these equations, uSEIR, to two types of agent-basedmodel simulations, a spatially homogeneous one where infection occurs by prox-imity, and a model on a scale-free network with varying clustering properties,where the infection between any two agents occurs via their link if it exists.We ﬁnd good agreement in both cases with the uSEIR solution. Furthermorea family of asymptotic solutions of the uSEIR equations is found in terms ofa logistic curve, which after a non-universal time shift, ﬁts extremely well allthe microdynamical simulations. A link to software packages for the numericalsolution of the uSEIR equations is provided.

A burgeoning number of papers attempting to model the dynamics of the COVID-19 pandemic have been published over the last few months. Among these, a large See for example, the CMMID repository, https://cmmid.github.io/topics/covid19/ a r X i v : . [ q - b i o . P E ] M a y raction ([1, 2, 3] are just a few examples) are based in more or less complex variantsof the classical SEIR (susceptible - infected - recovered/removed) model [4], whichassume exponential distributions of incubation and removal times. Real epidemicdata do not support however an exponential distribution for these parameters whichare usually described by gamma, lognormal or Weibull distributions [5, 6].It is well-known that these more general distributions are diﬃcult to be cap-tured by classical compartmental models such as SEIR, which normally treat theincubation and removal times as exponentially distributed, or as the sum of severalexponentially-distributed independent times. On the other hand, there is a vastliterature on the study of non-Markovian stochastic processes to model epidemicswith arbitrary distributions for incubation and removal times. However, we havenot found a simple and concrete formulation of compartmental models for epidemicsthat implements general distributions. The goal of this paper is to provide such aformulation, which is also practical from a numerical point of view.The starting point is a formulation of the SEIR concept that correctly repre-sents the evolution of an epidemic under the assumption of full homogeneity anduniformity of the microscopic parameters, namely: i) number of contacts per unittime, ii) the probability of infection per contact, iii) the incubation time and iv) therecovery/removal time. We will refer to this formulation as uSEIR. We demonstratethat the results of uSEIR describe very well the result of a microscopic Agent BasedModel (ABM) simulation with no stochasticity in the model parameters.Arbitrary distribution of the incubation and removal times can be easily incor-porated into our equations, recovering the classical SEIR equations in the particularcase of exponentially-distributed incubation and removal times. We also show thatthe resulting equations can be eﬃciently solved numerically, and provide appropri-ate codes and examples (with implementations in the Julia and Python/Cythonlanguages).The non-homogenity in the infection rate per unit time is more subtle, because,in the extreme case, it should invalidate the treatment in terms of global S-E-I-Rpopulations. We study two sources of this non-uniformity: inhomegeneity in theprobability of infection per contact and inhomogeneity in the number of contactsper individual. The ﬁrst is modelled with an ABM model with a negative binomialdistribution of the probability of infection per contact, the second is modelled witha simulation on a scale-free network. The uSEIR equations represent instead theevolution in which the infection rate per unit time is the average one, independentlyof the underlying distribution. We observe that the simulations show a signiﬁcantvariance which however amounts mostly to a time-translation. When the diﬀerentcurves of infected individuals are shifted to tune their maxima, all the curves fallon a universal curve correctly reproduced by the uSEIR equations. We analyse theorigin of this universality, i.e., independence on initial conditions. It derives from See, e.g., [7] for a review.

2n asymptotic solution of the uSEIR equations, which is found to be a logistic curvewhose shape is ﬁxed by the average microscopic parameters. In the case of networkswe brieﬂy comment on the eﬀect of clustering on the dynamics of the epidemic.The structure of the paper is as follows. In section 2 we brieﬂy describe theagent-based simulations that we will use as benchmark. In sec. 3 we present theuSEIR formulation and study the solutions for uniform microscopic parameters. Insection 4 we incorporate non-uniformity in the incubation and removal times. Insection 5 we discuss the non-uniformity in the probability of infection, and discussthe origin of the observed universality principle. In section 6 we consider propagationon scale-free networks. We conclude in section 7.

The spread of an infectious agent in a large population is a complex stochasticprocess, which under certain assumptions can be described in terms of a relativelysmall number of global variables, which follow deterministic diﬀerential equations.In order to understand the underlying dynamics, it is useful to think of an epidemicoutbreak in terms of simple agent-based models (ABMs) [8], where the micrody-namics can be studied by computer simulations. In these models agents can maketheir own decisions based on the rules given to them, and the evolution can captureunexpected aggregate phenomena that result from combined individual behaviours.ABMs can incorporate easily stochastic parameters as well as heterogeneities in thepopulation, and they are therefore a useful tool to study the performance of thedescription in terms of global variables.In the context of an epidemic, agents have four possible states: Susceptible(S), Exposed (E), Infectious(I) and Removed (R). Only infectious agents can inducethe change of state of another susceptible agent to that of exposed with a giveninfection rate, r S → E . Each exposed agent necessarily becomes infectious after anincubation time, t i , while each infectious agent remains in this state only during therecovery/removal time interval, t r . In a real epidemic this time would be the intervalof time during which an individual remains infectious. The agent can move to theremoved compartment either because it dies, recovers or gets isolated. All theseoutcomes are equivalent as regards the evolution of the epidemic, which is monitoredby the total fraction of agents in states S ( t ) , E ( t ) , I ( t ) , R ( t ) at any given time. Thestudy of an epidemic in terms of these variables is the SEIR paradigm [9]. In thiscontext, the so-called basic reproduction number, R , is a fundamental quantity thatcontrols the rate of infection in an homogeneous susceptible population. It is deﬁnedas the average number of individuals that a given infectious agent turn to exposedin the interval t r , assuming a fully homogeneous and susceptible population.In this paper we want to derive a set of equations for these global variablesthat describe correctly the global dynamics under the necessary assumption of a3ell-mixed and homogeneous population, if seen at a suﬃciently large scale, butthat incorporates arbitrary distributions of incubation, removal and infection ratesin the microdynamics. These set of equations will be presented in the next section,where they will be benchmarked against two types of ABM simulations, that webrieﬂy describe next. In the ﬁrst type of ABMs, a number N of agents progressingthrough the S-E-I-R compartments move in an homogeneous space and get exposedby proximity to infected neighbours with some probability. The second type ofmodels assumes the evolution on a network where the agents have a varying numberof contacts. We use the MESA package to simulate the spread of an outbreak in a homogeneouspopulation. The agents in the model are called ”turtles”, following the nomenclatureof the NetLogo software. A two-dimensional turtle world is divided in a grid ofequal size cells, and inhabited by N turtles. At the initial time the turtles occupy arandomly chosen cell and at each clock tick they do a random move to a neighbouringcells or stay in the same one.To simulate the evolution of an epidemics, a small number of turtles are in-fectious , I , at the initial time, while S = N − I are susceptible. At each clocktick infectious (I) turtles can expose any susceptible (S) turtle they ﬁnd in theirneighbourhood. Two turtles are considered neighbours if they share the same cellor are in neighbouring ones.More concretely, at each time step each susceptible neighbour of an infectedturtle, k is exposed with probability p . If the neighbour k becomes exposed, theclock time of this event is recorded, t ( k ) s → e . At each tick all the exposed turtlesare examined and eventually turned into infectious, at time t − t ( k ) s → e > t i . Again,the time at which the transition to infectious happens is recorded, t ( k ) e → i , and theturtle remains infectious until it progresses to the recovered compartment of time t − t ( k ) e → i > t r The basic assumption of homogeneity at large scales or full-mixing of the S-E-I-R populations requires that the probability of infection per contact is smallenough. Only in the limit p → R = c × p × t r , (2.1)where c is the average number of neighbours or contacts per unit time. For the rulesdescribed before, and in the case of a turtle world homogeneous population, c is https://github.com/projectmesa/mesa http://ccl.northwestern.edu/netlogo/index.shtml c = (cid:18) NA − (cid:19) (cid:15) − , (2.2)where A is the total number of grid cells in the turtle world and 9 is the number ofneighbouring cells of any given cell (including itself). (cid:15) is the measure of one timestep. Therefore, the infection probability can be obtained from any desired R as: p = R c · t r . (2.3)The simulation with this p will only match the input value of R in the simulation if p is small enough, so that the assumptions that go in the derivation of this formulaare satisﬁed. The simulation is run for a time t (cid:29) t r and we use a step such that t r /(cid:15) ∼ allow to model some inhomogeneity in the population.In this case, the evolution of the disease is described by a non-Markovian SEIRmodel.At each step the software records the turtles in each compartment and thusprovides (in the limit of small clock ticks) the functions S ( t ) , E ( t ) , I ( t ) and R ( t ),which can be directly compared with the predictions of the solutions of the SEIRmodels. Realistic populations are not necessarily well-mixed, at least not at small scales.Most individuals have contact only with a very small fraction of the total population.Complex networks show very rich topological features that are similar to real-worldsocial networks. They can have a small number of links between nodes and stilldisplay the small-world phenomena. Scale-free networks can also capture the largediﬀerence of contacts that diﬀerent individuals in society have. The study of theevolution of diseases on complex networks allows to study the impact of this richtopological structure in the evolution of and epidemic. The spread of epidemics onnetworks is an area of intense research. Since the seminal works [10] many studieshave been performed on this topic (see the recent review [11] and references therein).A network is just a non-oriented graph { G, E } consisting on nodes G = { n i } Ni =1 and edges linking two nodes, E = { e ij } . We say that two nodes n a and n b are See Appendix A for details and links to public repositories for the code. e ab ∈ E . The number of edges attached to a node n a is called its degreeand labelled k a .In the context of the spread of an infectious disease, each node is an agent, andthe edges represent the contacts. Each contact links two agents that can expose eachother if one is infectious and the other susceptible. The number of edges is thereforethe number of contacts. At each tick of time, (cid:15) (cid:28) t r , infectious nodes pick a singleedge at random, and if susceptible they attempt to infect the node attached to itwith probability p . In this setup , each click of time represents therefore a contactand all nodes have the same number of contacts per unit time c = 1. As in theturtle world, the infectious agent remains so in a time interval t r , while the times t r , t i can be chosen diﬀerent for each node by drawing samples of some previouslychosen distribution.In contrast with the turtle world, a general network breaks the assumptionof full mixing, since two nodes that are not linked have exactly zero probability oftransmiting the disease between them. On the other hand, a fully connected network,where each node is linked to the remaining N − R is simply R = pt r . (2.4)For a general network with small clustering, the correction to this relation is expectedto scale as ∝ / (cid:104) k (cid:105) as shown in Fig. 1). More generally, the value of R can dependin a non-trivial way on the network topological properties.In our study we will concentrate on a particular one-parameter family of randomnetworks described by Klemm and Eguiluz (KE) [12]. These complex networks showa number of features that are expected in realistic networks Scale-free

Nodes with both large and small number of contacts are present. In factthe distribution of the number of nodes is given by the power law P ( k ) = (cid:104) k (cid:105) k , k > (cid:104) k (cid:105) . (2.5) Small-world

Most nodes are not linked between themselves (i.e. (cid:104) k (cid:105) (cid:28) N ), butevery link can be reached from any other by a small number of hops. Beingmore speciﬁc, the average distance between nodes gros logarithmically withthe size of the network (cid:104) d (cid:105) ∼ log N . High clustering

Even if two networks share the number of nodes, edges and thedegree distribution, they can look very diﬀerent if the average clustering co-eﬃcient, (cid:104) C (cid:105) , is diﬀerent. The clustering coeﬃcient c i of a node n i measures More general situations can be simulated by allowing infectious nodes to attempt infectingseveral nodes at each time step. . . . . . . . . .

91 0 0 . . . . . . . . . R / ( p × t r ) R /k p × t r = 2 p × t r = 8 Figure 1: R / ( p × t r ) as a function of R /k , for a network where each link is randomlylinked to k other nodes, for two values of pt r = 2 , (cid:104) k (cid:105) = 9 .

93 anddiﬀerent clustering: (cid:104) C i (cid:105) = 0 . µ = 0 .

1) (left) and (cid:104) C i (cid:105) = 0 .

07 ( µ = 0 .

9) (right)the probability that two neighbors of n i are also neighbours C i ≡ |{ e jk \ e ji , e ki , e jk ∈ E }| k i ( k i − . (2.6)In Figs. 2 we show two networks with equal distribution of k that diﬀer onlyin the diﬀerent clustering properties.7E networks depend on a free parameter µ that does not aﬀect the averagedegree of the network or its distribution, but aﬀects severely the value of theclustering, interpolating from almost no clustering (cid:104) C (cid:105) = 0 for µ →

1, to avery clustered network with (cid:104) C (cid:105) ≈ .

84 for µ = 0. A real epidemic is a complex stochastic process that eventually evolves to a regimewhere there are large numbers of individuals in the S-E-I-R compartments. Inthe assumption that the populations in these compartements are homogeneous andmaximally mixed, the dynamics of the system should be well described in terms ofthe global variables S ( t ) , E ( t ) , I ( t ) , R ( t ), whose time evolution is described by a setof deterministic diﬀerential equations [9, 13].We ﬁrst want to derive the set of equations that should describe the dynamicsof these variables under the assumption that the incubation and removal times areﬁxed. The relation between the changes in these variables is essentially ﬁxed byunitarity. On the one hand, each individual must be in one of the S, E, I or Rcompartments. Therefore the number of individuals in the population, N , is aconstant: S ( t ) + E ( t ) + I ( t ) + R ( t ) = N. (3.1)Secondly, there must also be a relation between the rates at which these diﬀerentindividuals move from one compartment to the next. An infectious process is thatin which an infected individual gets in contact with a susceptible one. Let us call r S → E the rate of infection per unit time per infected individual and per susceptibleindividual. The number of susceptible individuals gets reduced by those that becomeexposed between [ t, t + dt ], that is: dS ( t ) = − r S → E I ( t ) S ( t ) dt. (3.2)This is the basic equation that assumes an homogenous and maximally-mixed sus-ceptible and infectious populations, making the treatment of the microscopic processin terms of global variables possible. It constitutes the simplest possible form of theforce of infection, deﬁned as (minus) the logarithmic derivative of S . Keeping withinthe simplest approximation, if the incubation and removal times of all individualshave the same values, we must also have that the individuals that become exposedat time t are those that move from compartment S → E minus those that movefrom E → I . But the latter must be the ones that entered the exposed compartment8n time t − t i . Therefore we have: dE ( t ) = − dS ( t ) + dS ( t − t i ) θ ( t − t i ) ,dI ( t ) = − dS ( t − t i ) θ ( t − t i ) + dS ( t − t i − t r ) θ ( t − t i − t r ) , (3.3) dR ( t ) = − dS ( t − t i − t r ) θ ( t − t i − t r ) . The initial conditions to these equations start with a ﬁxed N and a number ofinfected individuals at time t = 0, I (0) = I , so that S (0) = S = N − I , while E (0) = 0 and R (0) = 0. In the equations above, the number of initially infectedindividuals does not recover, but we can easily force this with the substitution ineq. (3.2): I ( t ) → ˜ I ( t ) ≡ I ( t ) − I (0) θ ( t − t r ) . (3.4)These equations depend only on three variables, namely r S → E , t i and t r , whichin principle are the same parameters appearing in the classical SEIR models. Interms of the basic reproduction number, R , r S → E corresponds to the combination: r S → E = R N t r . (3.5)Note that R is proportional to t r , while r S → E is independent of t r . In amicroscopic description of the infected process as in the ABM simulations, the rateis related to the microscopic parameters via R from eqs. (2.1) or (2.4) for thediﬀerent ABMs.We can compare the uSEIR and classical SEIR solutions to the ABMs simula-tions, matching the basic microscopic parameters. In Fig. 3 we show the curve forthe fraction of infected individuals as a function of time measured from 10 indepen-dent turtle simulations in a population of 10 agents with a fraction of infectiousagents of 10 − at t = 0, and assuming ﬁxed parameters t i , t r and r S → E for all theagents. The uSEIR solution agrees very well with the simulations, while the classicalSEIR predicts a wider and less pronounced peak.This is of course not surprising, since classical SEIR is known to be valid when t i and t r are exponentially distributed, corresponding to an underlying Markovianstochastic process. Modiﬁcations of SEIR equations adding more compartmentscan be designed to represent Erlang distributions, that for suﬃciently large n arenarrower, but uSEIR gets the uniform limit directly and without these complications.In realistic cases, not all individuals have the same incubation or removal times,and certainly not all individuals have the same number of contacts and probabilityof infection per contact. In the following, we consider the eﬀect of these diﬀerentnon-uniformities. 9 �� ( � ) / � Figure 3: Curve of the infected individuals as a function of time (in days) for theuSEIR (solid-black), minimal SEIR (dashed-red) and 10 agent simulations (cyan) ina population of N = 10 and I (0) = 10 with R = 3 . t i = 5 . t r = 6 . R , t i and t r are in the typical range of those used to describethe current COVID-19 pandemic. t i and t r Non-uniformities can be incorporated in the uSEIR equations by considering dif-ferent compartments of individuals. For example, the population divides into thosewith diﬀerent incubation periods, t ( k ) i , so we have S k ( t ) as the susceptible individualsin the k -th compartment of incubation time. Each compartment follows its usualprogression S k → E k → I k → R k , but the important point to notice is that a givensusceptible individual in compartment k becomes an exposed individual in the samecompartment k , but can get infected from any infectious individual in any othercompartment. If we assume that the capability to infect per unit time is indepen-dent on the compartment, the number of susceptible individuals in compartment k changes as they become exposed according to: dS k ( t ) = − r S → E ˜ I ( t ) S k ( t ) dt. (4.1)while eqs. (3.3) will still be valid for the exposed, infected and recovered in eachcompartment k , taking the incubation period as that corresponding to this com-partment, t ( k ) .Summing over all the compartments, the ﬁrst equation leads to: dS ( t ) = − r S → E ˜ I ( t ) S ( t ) dt, (4.2)10hile in the others we get dE ( t ) = − dS ( t ) + (cid:88) k dS ( t − t ( k ) i ) θ ( t − t ( k ) i ) ,dI ( t ) = (cid:88) k (cid:110) − dS ( t − t ( k ) i ) θ ( t − t ( k ) i ) + dS ( t − t ( k ) i − t r ) θ ( t − t ( k ) i − t r ) (cid:111) ,dR ( t ) = (cid:88) k (cid:110) − dS ( t − t ( k ) i − t r ) θ ( t − t ( k ) i − t r ) (cid:111) . (4.3)Obviously in the limit of t ( k ) i varying continuously the sum becomes an integral withthe corresponding PDF, P E ( t i ): (cid:88) k ( ... ) → (cid:90) dt i P E ( t i )( ... ) , (cid:90) ∞ dt i P E ( t i ) = 1 . (4.4)We can similarly assume sub-compartments for varying t r , with a PDF P I ( t r ),and the modiﬁcation would be analogous, resulting in the following delay integro-diﬀerential equations: dS ( t ) dt = − r S → E (cid:90) dt r P I ( t r ) ˜ I ( t ) S ( t ) ,dE ( t ) dt = − S (cid:48) ( t ) + (cid:90) t dt i S (cid:48) ( t − t i ) P E ( t i ) ,dI ( t ) dt = − (cid:90) t dt i S (cid:48) ( t − t i ) P E ( t i ) + (cid:90) t dt i (cid:90) t − t i dt r S (cid:48) ( t − t i − t r ) P E ( t i ) P I ( t r ) ,dR ( t ) dt = − (cid:90) t dt i (cid:90) t − t i dt r S (cid:48) ( t − t i − t r ) P E ( t i ) P I ( t r ) . (4.5)We refer to eqs. (4.5) and (3.3) indistinctively as uSEIR.A simple and eﬃcient algorithm to solve these equations, complete with easy-to-use codes in Python and Julia, is described in Appendix A. In the case where the probabilities are exponential, the integro-diﬀerential equationscan be reduced to regular diﬀerential ones, of the classical SEIR type.Let us assume P E ( t i ) = 1 (cid:104) t i (cid:105) e − t i / (cid:104) t i (cid:105) , (4.6)and deﬁne f ( t ) ≡ (cid:90) t dt i P E ( t i ) S (cid:48) ( t − t i ) = (cid:90) t dzP E ( t − z ) S (cid:48) ( z ) . (4.7)11he derivative of this function is related to that of E ( t ), using eq. (4.3), f (cid:48) ( t ) = − (cid:104) t i (cid:105) dEdt ( t ) , (4.8)so up to a constant f ( t ) = − E ( t ) (cid:104) t i (cid:105) + C. (4.9)Since f (0) = E (0) = 0, the constant must vanish and the equations reduce to: dSdt = − r S → E (cid:90) dt r P I ( t r ) ˜ I ( t ) S ( t ) ,dEdt = − dSdt − (cid:104) t i (cid:105) E ( t ) ,dIdt = 1 (cid:104) t i (cid:105) E ( t ) − (cid:104) t i (cid:105) (cid:90) t P I ( t r ) E ( t − t r ) ,dRdt = 1 (cid:104) t i (cid:105) (cid:90) t P I ( t r ) E ( t − t r ) . (4.10)Analogously we deﬁne g ( t ) ≡ (cid:104) t i (cid:105) (cid:90) t dt r P I ( t r ) E ( t − t r ) , (4.11)which for an exponential with average (cid:104) t r (cid:105) satisﬁes g (cid:48) ( t ) = I (cid:48) ( t ) (cid:104) t r (cid:105) , (4.12)and therefore g ( t ) = I ( t ) (cid:104) t r (cid:105) + C (cid:48) , (4.13)where C (cid:48) = − I (0) / (cid:104) t r (cid:105) . Finally, the integral in the ﬁrst equation: (cid:90) dt r P I ( t r ) ˜ I ( t ) = I ( t ) − I (0) (cid:16) − e − t/ (cid:104) t r (cid:105) (cid:17) ≡ ¯ I ( t ) . (4.14)Finally, deﬁning ¯ R ( t ) ≡ R ( t ) + I (0) (cid:16) − e − t/ (cid:104) t r (cid:105) (cid:17) , (4.15)12e recover the classical SEIR equations: dSdt = − r S → E ¯ I ( t ) S ( t ) ,dEdt = − dSdt − (cid:104) t i (cid:105) E ( t ) ,d ¯ Idt = 1 (cid:104) t i (cid:105) E ( t ) − (cid:104) t r (cid:105) ¯ I ( t ) ,d ¯ Rdt = 1 (cid:104) t r (cid:105) ¯ I ( t ) . (4.16)We can incorporate easily the exponential distributions for t i and t r in theABMs simulations, while we maintain the rate of infection constant. The comparisonof the SEIR solution of eqs. (4.16) and the homogeneous ABM simulation withexponentially distributed t i and t r is shown in Fig. 4. The agreement as expected isgood, even if the variance is much larger than in the ﬁxed-parameters case. In fact,an interesting observation is that most of the observed variance of the outbreaksis a simple time translation. If we time-translate all the outbreaks to make theirmaxima coincide the variance is much smaller and the agreement with SEIR better,see Fig. 5. We will discuss the origin of this time-shift in the following section. � �� ( � ) / � Figure 4: Evolution of the fraction of infected individuals as a function of time(in days) in classical SEIR (solid black) of eqs. (4.16), and in 100 random turtlesimulations with exponentially-distributed t i and t r (cyan) in a population of N =10 and I (0) = 10 with R = 3 . (cid:104) t i (cid:105) = 5 . (cid:104) t r (cid:105) = 6 . k, θ ]. For the COVID-19 epidemic, the distribution of incubation times has13 �� ( � ) / � Figure 5: As in Fig. 4, after a time-shift of the simulation curves so that theirmaxima coincide with that of the SEIR solution.been shown to be well described with a gamma with parameters ( k, θ ) (cid:39) (5 . , . (cid:104) t i (cid:105) (cid:39) . . , t i and t r in stochastic approaches to the propagation of epidemics(see, e.g., a recent review in [7]), we have not found a simple formulation of the prob-lem in the context of compartmental models such as the one described by equationseqs. (4.5). While generalizations of exponential distributions, such as the Erlangcase, are dealt with in the standard SEIR literature using a superﬁcially similarsub-compartmentation approach, this is not quite as general as the treatment de-scribed here. A diﬀerent situation is when the rate of infection is non-uniform across the popula-tion. It is important to stress that the rate depends on two independent parameters:the number of contacts per infected individual, which critically depends on the clus- See, e.g., [14, 15, 16, 17, 18] �� ( � ) / � Figure 6: Fraction of infected individuals as a function of time (in days) fromthe average of 100 turtle simulations with t i and t r distributed in the populationaccording to the gamma distributions (cyan) compared to the solution of the uSEIRof eqs. (4.5) (solid black) and classical SEIR (red dashed). The simulations havebeen time-shifted so that their maxima coincide. The simulation has parameters inboth cases N = 10 and I (0) = 10 with R = 3 . (cid:104) t i (cid:105) = 5 . (cid:104) t r (cid:105) = 6 . We could separate the population in individuals that infect others with diﬀerentrates. The rate might depend on the type of infectious individual and the type ofsusceptible individual. Deﬁning r lk to be the rate at which an infected individual oftype l infects a susceptible individual of type k . The equations in this case are: dS k ( t ) = − (cid:88) l r lk I l ( t ) S k ( t ) dt,dE k ( t ) = − dS k ( t ) + dS k ( t − t i ) θ ( t − t i ) ,dI k ( t ) = − dS k ( t − t i ) θ ( t − t i ) + dS k ( t − t i − t r ) θ ( t − t i − t r ) ,dR k ( t ) = − dS k ( t − t i − t r ) θ ( t − t i − t r ) . where t i and t r might also depend on the compartment.15ssuming that the rates only depend on the type of infecting individual and noton the type of susceptible and, for simplicity t i and t r are uniform, only the totalnumber of individuals in each compartment needs to be evolved. This is the case,because the diﬀerent compartments are in some proportion in the population and weassume the proportion is preserved by the initial conditions of the I k (0) and S k (0).The equations reduce to the usual ones with a rate that is the weighted average: r eﬀ = (cid:88) k r k p k , (5.1)where p k is the proportion of individuals in compartment k . In the continuous case r eﬀ = (cid:90) dr rP R ( r ) , (5.2)where P R ( r ) is the corresponding PDF.However, this result seems in conﬂict with the fact non-uniformity in the rate isknown to be very important in the evolution of an epidemic . One example of thisis the relevance of the fraction of individuals for which the probability of infection iszero, which in practice makes them immune. Their presence in a given populationimplies that the eﬀective number of useful contacts gets reduced. The fraction of thepopulation with zero infecting power is equivalent to the immune fraction. Whenthis fraction is large enough the epidemic may be aborted. This is the concept ofherd immunity, used to measure the needed number of vaccinations to abort anepidemic. A very rough estimate for the fraction of herd immunity, f H , would be R (1 − f H ) = 1 , f H = 1 − /R . (5.3)For COVID-19, with R ∼ f H ∼ .

7, that is 70% of the population. One wouldthen naively expect that in an epidemic where this estimate holds about 70% ofthe population ends up getting infected; however, in the previous examples a largerfraction is found. This is because, due to the time delay in the process, the fractionof recovered individuals grows slowly and is not eﬀective in reducing the growth ofthe epidemic suﬃciently, as would be the case if the fraction of immune individualshad been present from the start, as would be the case, for instance, in a (partially)vaccinated population. Note that in the SEIR paradigm, the immune population ispart of the susceptible, that pass by the compartments E → I → R but have zeroinfecting power when they are I so they are inert. In practice the evolution of theepidemic would be identical if we just dropped them from the start and readjust therate not to include them.It has been argued that for COVID-19 the distribution of R across the popu-lation is well described by the negative binomial distribution, NB[0.16,0.0437] [21], See, e.g., [19, 20]. t i and t r while R is drawn from this negative binomial. The average of those outbreaks as wellas the result of uSEIR using the average (cid:104) R (cid:105) are also shown. Clearly the varianceis huge, and the average is not a good representation of the individual epidemichistories. The uSEIR curve misses completely the outliers. � �� ( � ) / � Figure 7: Curve of the fraction of infected individuals as a function time (in days)from the average of 100 agent simulations with R distributed in the populationaccording to negative binomial (cyan) with t i and t r ﬁxed. The average of thosehistories is the red curve. The simulation has parameters N = 2 × and I (0) = 10with (cid:104) R (cid:105) = 3 . t i = 5 . t r = 6 . �� ( � ) / � Figure 8: The same as in Fig. 7 with the individual ABM simulations time-shiftedto keep their maxima invariant and coinciding with maximum of the uSEIR curve.average of the basic parameters and not on the initial conditions, as we now showfrom the uSEIR equations.

We have observed that the main eﬀect of the diﬀerent initial conditions is a tempo-ral shift of the maximum, but the shape or the height of the infection curve doesnot change signiﬁcantly. This strongly suggest that the equations have a universalsolution. We have indeed found it. Let us consider the diﬀerential eqs. (3.3) nearthe maximum of the infection curve t max , which will remain as a free parameter.Let us also assume that t max (cid:29) t i , t r , and deﬁne the function F ( t ) ≡ S ( t ) I ( t ) . (5.4)The diﬀerential equations for the uSEIR with ﬁxed t i and t r and for t (cid:29) t i , t r : dSdt + dRdt = rF ( t − t i − t r ) − rF ( t ) (cid:39) − r ( t i + t r ) (cid:18) F (cid:48) ( t ) − t i + t r F (cid:48)(cid:48) ( t ) (cid:19) ,dEdt = r ( F ( t ) − F ( t − t i )) (cid:39) rt i (cid:18) F (cid:48) ( t ) − t i F (cid:48)(cid:48) ( t ) (cid:19) ,dIdt = r ( F ( t − t i ) − F ( t − t i − t r )) (cid:39) rt r (cid:18) F (cid:48) ( t ) − (cid:18) t i + t r (cid:19) F (cid:48)(cid:48) ( t ) (cid:19) . (5.5)18hich implies S ( t ) + R ( t ) = C − r ( t i + t r ) (cid:18) F ( t ) − t i + t r F (cid:48) ( t ) (cid:19) ,E ( t ) = C (cid:48) + rt i (cid:18) F ( t ) − t i F (cid:48) ( t ) (cid:19) ,I ( t ) = C (cid:48)(cid:48) + rt r (cid:18) F ( t ) − (cid:18) t i + t r (cid:19) F (cid:48) ( t ) (cid:19) . (5.6)Since I ( t ) → , E ( t ) → , F ( t ) = S ( t ) I ( t ) → t → ∞ , it follows that C (cid:48) =0 , C (cid:48)(cid:48) = 0 and C = N . Using the previous equations, it is easy to derive a diﬀerentialequation for F ( t ), expanding at linear order in t i and t r : F (cid:48)(cid:48) ( t ) − F (cid:48) ( t ) F ( t ) + r t r t i + t r F ( t ) = 0 . (5.7)We are interested in the solution near the maximum, so we use the initial conditions: F (cid:48) ( t max ) = 0 , F ( t max ) = F . (5.8)This non-linear equation has an analytical solution given by: F ( t ) = F (1 − tanh [ a ( t − t max )]) , (5.9)with a ≡ r (cid:114) t r F t i + t r . (5.10)This is the universal function that drives the evolution of the infected, exposedand susceptible+recovered individuals near the maximum. The maximum of theinfected is at t max − t i for the infected, while the maximum(minimum) for the exposed(susceptible+recovered) is at t max . The integral of this function from [ −∞ , ∞ ] givesa negligible contribution. is (cid:90) ∞−∞ dtF ( t ) = 2 F a . (5.11)We can also derive the value of the susceptible at t max since S ( t max ) = F ( t max ) I ( t max ) = 1 rt r , (5.12)and the curve of the susceptible can be easily obtained S ( t ) = S ( t max ) − r (cid:90) tt max F ( t ) . (5.13) Note that for large t max , the range t < S ( ∞ ) = 1 rt r − r F a . (5.14)With this we conclude that the epidemic curve is universal once the value of themaximum position is determined. The value of F should also depend on the basicparameters and not the initial conditions, although the precise value is not easyto get. A rough estimate can be obtained as follows. Near the maximum, and ifthe incubation and removal times are suﬃciently small, we can approximate that R ( t max ) (cid:39) I ( t max ) + E ( t max ), since the infected and exposed quickly recover; usingthis and the value of S ( t max ) we can estimate F to be F ∼ N − S ( t max )2 r ( t r + t i ) . (5.15)The only dependence on the initial condition remains in t max . In Fig. 9 we comparethe numerical solution to the uSEIR equations to the analytic expression of eq. (5.9),ﬁxing the parameters F and t max (the height and the position of the peak) from thenumerical solution. Varying the initial conditions, that is the fraction of the numberof infected individuals at t = 0, shifts t max , but otherwise leaves the curve invariant.As can be seen, the analytical solution around the peak describes very well the fulluSEIR solution. The agreement is better for smaller values of t i and t r .It is possible to extend this asymptotic solution to the case where t i and t r arenot ﬁxed but drown from distributions P E ( t i ) and P I ( t r ). Eq. (5.7) gets modiﬁedin that the coeﬃcient of the last term becomes: r (cid:104) t r (cid:105)(cid:104) t i (cid:105) + (cid:104) t r (cid:105) (cid:104) t r (cid:105) , (5.16)where (cid:104)(cid:105) refers to the average with the corresponding PDF. Therefore the logistic,eq. (5.9), is still the asymptotic solution with a modiﬁed parameter: a → r (cid:115) (cid:104) t r (cid:105) F (cid:104) t i (cid:105) + (cid:104) t r (cid:105) / (cid:104) t r (cid:105) . (5.17) We now consider the non-homogeneities in the social contacts. We have generateda number of KE networks with (cid:104) k (cid:105) = 40 and diﬀerent clustering properties, bychanging the µ parameter. The networks have 10 nodes. On these networks weevolve the epidemic using the time progression explained in section 6, starting with10 infected nodes. The probability of infection per contact is p = 2 × − . For20 SEIR Tanh � �� ( � ) / � Figure 9: Comparison of the results of the curve of infected as a function of time(in days) for ﬁxed parameters R = 2 . t i = 3 days and t r = 3 days, and theanalytical result of eq. (5.9) with the parameters F and t max tuned with the heightand position of the peak. The two pairs of solid curves correspond to a fraction ofinfected individuals of 10 − and 5 · − . The two dashed lines are the same functionshifted in time.the incubation and removal times we assume a LogNormal distributions, in unitsof the step time, (cid:15) , with parameters ( µ X , σ X ) = (10 , t r and ( µ X , σ X ) =(500 , t i . For these parameters, p (cid:104) t r (cid:105) = 2, which gives an approximation to R up to 1 / (cid:104) k (cid:105) corrections, as explained in sec. 2. From Fig. 1, we can get a more preciseestimate of R =. For each network we run a number of simulations and averagethe S-E-I-R fractions, after performing a time-shift to make their maxima coincide(which as in previous cases, reduces most of the variance). In Fig. 10 we show theevolution of infected individuals as a function of time for the various networks. Weobserve a clear dependence on the clustering parameter, but nevertheless the data inall cases is extremely well described by the universal behaviour derived from uSEIR,eq. (5.9). The lines are three-parameter ﬁts ( a, I , t max ) of the form: I ( t ) = I (cid:2) − tanh ( a ( t − t max )) (cid:3) . (6.1) Note that we use a parametrization where µ X is the mean and σ X is the variance. ◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼◼ ERKE μ = μ = μ = μ = μ = μ = μ = � �� < � � > � ( � ) � Figure 10: Average fraction of infected nodes as a function of time in units of (cid:104) t r (cid:105) for various networks with equal average degree, (cid:104) k (cid:105) = 40, but diﬀerent clusteringproperties, depending on µ . The result for the ER network with zero clustering.The lines going through the data are ﬁts to eq. (5.9), leaving a , I and t max as freeparameters. 22SEIR predicts, according to eqs. (5.9), (5.17) and Fig. 1, a (cid:104) t r (cid:105) (cid:112) I /N = (cid:115) R (cid:104) t r (cid:105) (2 (cid:104) t i (cid:105) + (cid:104) t r (cid:105) / (cid:104) t r (cid:105) ) (cid:39) . , (6.2)while I = p (cid:104) t r (cid:105) F N , (6.3)and using the rough estimate of eq. (5.15), we get I /N ∼ /

6. Both parameterswould therefore be given in terms of the average microscopic parameters.In Figs. 11 we show the dependence of a (cid:112) I /N − and I /N , on the averagelocal clustering (cid:104) C (cid:105) . For small clustering we observe that a (cid:112) I /N − is roughlyconstant and matches rather well the microdynamical average value of eq. (5.10).Instead I /N decreases with clustering, even for small clustering. This eﬀect can beinterpreted as eﬀective suppression of the fraction of susceptible population: clus-tering seems to screen the access to the susceptible. Note that if we substitute in theuSEIR equations S by f c S , where f c is the screening factor, the asymptotic solutionis as in eq. (6.1) with I → f c I , while a (cid:112) I /N − remains invariant. This couldexplain the behaviour found at small clustering.At large clustering, on the other hand, the parameters I and a show a non-trivial dependence with clustering. In spite of this, the logistic remains an extremelygood description of the time evolution of the infected fraction. It would be interestingto understand this behaviour in terms of a renormalization or screening of the basicparameters, or modiﬁcations of the force of infection with respect to the well-mixedapproximation yielding − S (cid:48) /S ∝ I .As a ﬁnal comment, we note that everything we have studied here assumesno time variation of the basic parameters. In a real epidemic, measures of socialdistancing, self-protection, etc. are taken, that induce a sudden change of the basicparameters, particularly the rate of infection. This eﬀect induces a quench of theepidemic curves that we have been discussing in this paper. It will be interestingto explore to what extent the evolution after the quenches can be understood interms of the fundamental parameters, in particular whether the universality nearthe herd-peak translates into some universality of the curve after a quench, if it hashappened in the asymptotic regime. In this paper we have presented a simple formulation, uSEIR, eqs. (4.5) of theSEIR modelization of a epidemic outbreak that properly accounts for an arbitrarydistribution of incubation and removal times, reducing to classical SEIR in the23 �� < � > � < � � > � � / � �� < � > � � � Figure 11: Dependence of the ﬁt parameters a (cid:104) t r (cid:105) (cid:112) I /N − and I /N on the averageclustering. Dashed lines are intended to guide the eye through the data points.24imit of distributions of the exponential family. We have compared this model witha series of ABM homogeneous simulations for various scenarios including uniformparameters, as well as various realistic distributions for the incubation and removaltimes, or for the probability of infection per contact. We have also considered ABMsimulations on scale-free networks with varying clustering properties. In all cases, themodel reproduced the simulations accurately after a non-universal time-shift. Onlyin the presence of large local clustering in the distribution of contacts we observed aclear deviation, when the averages of microdynamical parameters are included. TheuSEIR formulation allowed us to understand the universality property observed indiﬀerent outbreaks in the simulations. This derives from an explicit asymptoticsolution found for small incubation and removal times in terms of a logistic curve,with a shape that can be determined in terms of the microdynamical parameters.This curve is found to ﬁt very well the data even in cases of large clustering, providedthe parameters are left as free ﬁt parameters, suggesting that the dynamics in thehigh clustering regime may still be well described in terms of global variables butwith screened or renormalized parameters. On the contrary, the early stages of anoutbreak are highly non universal, an aspect that should be carefully taken intoaccount when ﬁtting data and predicting using any SEIR modelling. Only whenthe early stages of exponential growth are well underway, is uSEIR expected to bea good description. The averaging of independent outbreaks, without taking intoaccount the non-universal time-shift, can also be very misleading. Appendix A: A numerical algorithm for the uSEIR model

The uSEIR equations are delayed and the most eﬃcient way to solve them numer-ically is to enlarge the number of functions at each time. We deﬁne t max i = n max i (cid:15) and t max r = n max r (cid:15) as the incubation and removal times, such that for t i,r > t max i,r thedistribution probabilities are negligible. These integers ﬁx the number of additionalvariables we need to evolve at each time step: E , ..., E n max i and I , ..., I n max r . Thevariables E k ( t ) and I k ( t ) measure the number of exposed of infected individuals attime t that will change compartment ( E → I or I → R ) at time t + k .The recursive relations for all these variables read: • E compartments: E k ( t + (cid:15) ) =  E k +1 ( t ) + rN S ( t ) I ( t ) P E ( k(cid:15) ) k = 1 , . . . , n max i − rN S ( t ) I ( t ) P E ( t max i ) k = n max i (7.1)where P E ( t ) is the probability for t to be between t and t + (cid:15) . • I compartments: 25 k ( t + (cid:15) ) =  I k +1 ( t ) + E ( t ) P I ( k(cid:15) ) k = 1 , . . . , n max r − E ( t ) P I ( t max r ) k = n max r (7.2)While the SEIR variables: E ( t ) = n max i (cid:88) k =1 E k ( t ) , I ( t ) = n max r (cid:88) k =1 I k ( t ) . (7.3)and S ( t + (cid:15) ) = S ( t ) − rN S ( t ) I ( t ) ,R ( t + (cid:15) ) = R ( t ) + I ( t ) . (7.4)Implementations of this algorithm in the Julia programming language and inPython/Cython can be found, respectively, in: https://gitlab.ift.uam-csic.es/alberto/useirhttps://github.com/jjgomezcadenas/useirn Acknowledgements

We thank J. Hernando for useful discussions.

References [1] B. Tang, Q. Wang, X. amd Li, S. Bragazzi, N.L. amd Tang, Y. Xiao, and J. Wu.Estimation of the transmission risk of the 2019-nCoV and its implication forpublic health interventions.

J. Clin. Med., 9, 462. , 2020.[2] Q. Lin et al. A conceptual model for the coronavirus disease 2019 (COVID-19)outbreak in Wuhan, China with individual reaction and governmental action.

International Journal of Infectious Diseases 93 211216 , 2020.[3] L. L´opez and X. Rod´o. A modiﬁed seir model to predict the COVID-19 outbreakin Spain and Italy: Simulating control scenarios and multi-scale epidemics. https://doi.org/10.1101/2020.03.27.20045005. , 2020.[4] H.W. Hethcote. The mathematics of infectious diseases.

SIAM Review, Vol.42, No. 4., pp. 599-653 , 2000.[5] J. Zhang et al. Evolving epidemiology and transmission dynamics of coronavirusdisease 2019 outside Hubei province, China: a descriptive and modelling study.

Lancet Infect Dis 2020 , 2020. 266] J. Hellewell et al. Feasibility of controlling COVID-19 outbreaks by isolation ofcases and contacts.

Lancet Glob Health 2020 , 2020.[7] T. Britton and E. (Eds.) Pardoux.

Stochastic Epidemic Models with Inference ,volume 2255 of

Mathematical Biosciences Subseries . Springer, 2019.[8] E. Hunter, B. Mac Namee, and J. Kelleher. A taxonomy for agent-based modelsin human infectious disease epidemiology.

Journal of Artiﬁcial Societies andSocial Simulation 20(3) 2 , 2017.[9] W.O. Kermack and A.G. McKendrick. Contributions to the mathematical the-ory of epidemics, part 1.

Proc. Roy. Soc. London Ser. A , 115, pp. 700-721. ,1927.[10] Duncan J. Watts and Steven H. Strogatz. Collective dynamics of ’small-world’networks.

Nature , 393(6684):440–442, June 1998.[11] R. Pastor-Satorras, C. Castellano, P. Van Mieghem, and A. Vespignani.Epidemic processes in complex networks.

Reviews of Modern Physics ,87(3):925?979, Aug 2015.[12] K. Klemm and V. Egu´ıluz. Growing scale-free networks with small-world be-havior.

Physical Review E , 65(5), May 2002.[13] R.M. Anderson and R.M. May.

Infectious Diseases of Humans: Dynamics andControl . Dynamics and Control. OUP Oxford, 1992.[14] A.L. Lloyd. Realistic distributions of infectious periods in epidemic models:Changing patterns of persistence and dynamics.

Theoretical Population Biology ,60(1):59–71, August 2001.[15] A.L. Lloyd. Destabilization of epidemic models with the inclusion of realisticdistributions of infectious periods.

Proceedings of the Royal Society of London.Series B: Biological Sciences , 268(1470):985–993, May 2001.[16] H.J. Wearing, P. Rohani, and M.J. Keeling. Appropriate models for the man-agement of infectious diseases.

PLoS Medicine , 2(7):e174, July 2005.[17] A.J.K. Conlan, P. Rohani, A.L. Lloyd, M. Keeling, and B.T. Grenfell. Resolvingthe impact of waiting time distributions on the persistence of measles.

Journalof The Royal Society Interface , 7(45):623–640, September 2009.[18] O. Krylova and D.J.D. Earn. Eﬀects of the infectious period distribution onpredicted transitions in childhood disease dynamics.

Journal of The RoyalSociety Interface , 10(84):20130098, July 2013.2719] M.J. Keeling and K.T.D. Eames. Networks and epidemic models.

Journal ofThe Royal Society Interface , 2(4):295–307, June 2005.[20] S. Bansal, B.T. Grenfell, and L.A. Meyers. When individual behaviour mat-ters: homogeneous and network models in epidemiology.

Journal of The RoyalSociety Interface , 4(16):879–891, July 2007.[21] J.O. Lloyd-Smith, S.J. Schreiber, P.E. Kopp, and W.M. Getz. Superspreadingand the eﬀect of individual variation on disease emergence.