The effect of time distribution shape on simulated epidemic models
aa r X i v : . [ q - b i o . Q M ] D ec The e(cid:27)e t of time distribution shape onsimulated epidemi modelsMartin Camitz Åke Svensson30th November 2007Abstra tBy onvention, and even more often, as an unintentional onsequen eof design, time distributions of laten y and infe tious durations in sto has-ti epidemi simulations are often exponential. The skewed distribtiontypi ally leads to unrealisti ally short times. We examine the e(cid:27)e ts ofaltering the distribution laten y and infe tious times by omparing thekey results after simulation with exponential and gamma distributions ina homogeneous mixing model aswell as a model with regional divisions onne ted by a travel intensity matrix. We show a delay in spread withmore realisti laten y times and o(cid:27)er an explanation of the e(cid:27)e t.Key words: Prevention & Control; Sto hasti Pro ess; Epidemiol-ogy; Infe tious time; Laten y time; 1 Introdu tionExponential distributions for laten y times and infe tiousness times oftenappear with models of infe tious diseases, simulated or solved analyti- ally. The distribution does not resemble observed distribution of laten yor infe tious times. Depending on the problem at hand, this may be areasonable simpli(cid:28) ation. For ertain questions where the speed of spreadof the infe tion is of less importan e, this assumption may give perfe tlysatisfa tory results. Re ently resear h interest, however, has been dire tedin the initial highly random phase of the epidemi , whereas the (cid:28)nal sizeof the epidemi is perhaps of less interest [1, 2℄. In spite of this the ex-ponential time assumption has be ome o(cid:27)-the-wall and many authors, bytradition, disregard the onsequen e of their assumption.The reason for the wide spread use is that the exponential distributionis inherently "memoryless" [3℄ whi h means that future predi tions of thestate of the epidemi in terms of number of latent and infe tious individu-als et is based solely on the urrent state and not on the history of states.The probability that 10 people will have fallen ill on Friday depends onlyon how many are ill on Thursday. The state on Wednesday or Tuesday isirrelevant. This makes possible a simple sto hasti simulation by utilizinga Markov pro ess.Exponential distributions will appear as a onsequen e of the assump-tion that the rate at whi h individuals leave a ertain state at a ertaintime only depdends on how many individuals is in that state at this time.This orresponds to a onstant hazard for any individual to leave the stateis the same as the "memoryless" property. Many authors therefor in ludethe exponential distribution assumption more of less unintentionally whiledesign the model.In this paper we show that the time distribution is vital for a urateresults and also that, without abandoning a Markov approa h, we aregiven some freedom to adapt the distribution to (cid:28)t real data, using agamma distribution.Mu h work has been done to show the e(cid:27)e t of traveling and migration2n the evolution of epidemi s [4, 5, 6, 7, 8℄. For today's global outbreaks,notably the SARS outbreak of 2001, the need to in orporate what informa-tion we have on travel networks in our simulations has be ome in reasinglyapparent. Models that take the Markov approa h seem well suited for thispurpose whi h was demonstrated by Hufnagel et al. The population is di-vided into a number of lo al regions whi h an be ountries, muni ipalitiesor other geographi or even so ial groupings. They are inter onne ted byan infe tiousness intensity matrix des ribing how infe tion is tranferredbetween regions. This matrix an be estimated from, for example, traveldata.Hufnagel et al. used the at hment area around ea h internationalairport and within these used a SEIR-model, where every individual an bein one of the states sus eptible, latent, infe tious and re overed. These lo alpro esses were linked together by the network of international avaiationenabling the disease to be transmitted along (cid:29)ight paths.Camitz and Liljeros [9℄ onstru ted a similar model of a SARS-likeoutbreak in Sweden. In this model the muni ipal borders were used topartition the ountry. Using detailed travel data between muni ipalities,a travel intensity matrix was estimated and the geograpi spread ould bestudied aswell as the e(cid:27)e t of travel restri tions.In more detail, the SLIR-model works as follows. The population inea h muni ipality is assigned to one of four states, de ribing their dis-ease state: Sus eptible, Latent, Infe tious and Re overed. A sus eptiblemay be ome latent with a probability whi h depends on the number ofinfe tious, in his/her own aswell as onne ted muni ipalities, dependingon the intensity of travel between onne ted muni ipalities. After beinginfe ted, the latent individual moves through stages L and R in times or-responding to known laten y and infe tious times. The a tual time for anindividual will vary randomly about the mean time, whi h is (cid:28)xed. The ru ial point is how these times vary. In [5, 9℄ the times are pi ked froman exponential distribution. 3 → L → I → R Indeed we are ertainly not the (cid:28)rst to introdu e the Gamma distri-bution in these ontexts. Gamma distributions have for a long time been stadard in modeling progress of hroni diseases (e.g. an er) throughdi(cid:27)erent stages. They have also been used in models of epidemi s, see [10℄for a re ent example. But dis ussion about using this and other distribu-tions is la king in resear h today. Times with single point distributionsare sometimes onsidered a reasonable approximation [11, 12℄ but for fol-lowing the omplete dynami s we feel that a varian e is ne essary. Othertime distributions have also been used, su h as uniform, Log-normal orWeibull, the latter two notably di(cid:27)ering from Gamma primarily in theirtails. Su h distriutions may be appropriate but wull not be possible tomodel with the Markov approa h.1.1 The exponential and gamma distributionThe main drawba k with exponential times is a questionable tie to reality.The exponential distribution is highly skewed, with high densities for shorttimes and a long tail. Empiri al laten y times and infe tious times arenot exponentially distributed, but rather have a symmetri density abouttheir expe tation values. Furthurmore, the exponential distribution hasa quite high varian e, equal to its expe tation value squared, whereasempiri al times tend to deviate little from the mean. The dark blue urvein 1 shows a plot of the probability density fun tion of the exponentialdistribution as a spe ial ase of the gamma distribution, the ir umstan esfor this relationship to be explained later. The exponential distributionhas a single parameter equal to the inverse of the expe tation value.In pra ti al simulations this will shorten the time of interest. Sin e themedian is lower than the expe tation value implying that most times willbe shorter than the expe ted. For example, say the expe tation value ofthe laten y time is set to 5 days. With an exponential distribution, 63%of the random times will be shorter than 5 days. 18% will be shorter than4 day. Su h short times are learly unrealisti and furhtermore, laten ytimes are expe ted to fall symmetri ally about the mean.The additional disadvantage in sto hasti epidemi simulations is thatthe out ome is highly dependent on the initial stages. Individuals withshort laten y times will predominantly make up the initially infe ted andwill inevitably speed up the outbreak. That is to say that the skewness ofthe exponential distribution is dominant in the early stages of the simula-tion whereas the expe tation value is not apparent until the sto hasti ityhas averaged out.A few authors have proposed that the gamma distribution be usedinstead[ref℄. The gamma distribution, denoted Γ( κ, θ ) has two parame-ters, a shape parameter κ and a s ale parameter θ . For integer κ :s theprobability density fun tion takes on a parti ularly simple form: f ( t ; κ, θ ) = t κ − e − t/θ θ κ ( κ − (1)The mean is κθ and varian e κθ . For κ = 1 the gamma distribution isin fa t identi al to the exponential distribution. Keeping the expe tationvalue onstant, with larger κ s, the gamma distribution be omes in reas-ingly symmetri about its mean and start to resemble times distributionswe have learnt to expe t. The skewness of the density fun tion is infa t / √ κ .The gamma distribution an a tually be realized with an un ompli- ated extension of a Markov model su h as the one in [9℄. The sum ofseveral exponentially distributed times is in fa t gamma-distributed. Thisis expressed as follows. Let X . . . X n be independant sto hasti variablesfrom an exponential distribution Exp( ξ ) . Then, Y = P ni =1 X i belongs to Γ( n, ξ ) .In pra ti e, instead of having only a single laten y stage and a singleinfe tious stage, we add stages, for ing ea h individual to go through sev-eral stages of laten y before be oming infe tious, and in the same manner,several stages of infe tiousness before re overing. Thereby we a hieve anarbitrarily symmeti time distribution with a minimal alteration to our5LIR-model. In doing so we alter κ whi h is as shown is equivalent tothe number of stages. These stages have no epidemiologi al meaning butserve only to hange the appearan e of the time-distribution.At the same time we have to de rease ξ so as to keep onstant theexpe ted time whi h is nξ in the gamma distribution. We an sele t any κ we like to produ e a good enough (cid:28)t to an empiri al distribution, orat the very least, the mean and varian e. The ost of added stages isof ourse memory requirements but happily the simulation time is notin(cid:29)uen ed to a degree to be a deterent in any way. This is due to oneof the key advantages of the Markov approa h. We do not have to keeptra k of any individuals in the model. We simply re ord their number ineash state.Using a modi(cid:28)ed version of [9℄ we show that ignoring the shape of thetime distribution devalues the results, omparing the results for di(cid:27)erent κ for both laten y times and infe tious times. The di(cid:27)eren e in absoluteterms is signi(cid:28) ant.2 Data and MethodsWe arried out two sets of four simulations, ea h onsisting of 1000 realiza-tions of an outbreak inititated with one infe ted individual in Sto kholm.In the (cid:28)rst set we on(cid:28)ned the population of Sto kholm allowing us to testthe hange employing the gamma distribution in a single lo ality random-mixing situation not ompli ated by travel. There is no spe i(cid:28) reasonfor using Sto hkolm either as the origin of infe tion or as a om(cid:28)nement.The mixing model is the same in all muni ipalities.In the se ond set, we used the full travel network for a full s ale sim-ulation. In ea h set we ran a referen e simulation with both the laten yand infe tious times distributed a ording to an exponential distributionwith the mean 5 days. Ex ept for di(cid:27)erent parameters, this setup orre-sponds exa tly to the one used in [ amitz℄. The other three had gamma-distributed laten y times, infe tious times or both. In the ase of gammadistributions, κ = 3 was used. ξ was adjusted to attain an expe ted time6f, again, 5 days.The inter-muni ipal infe tioussness matrix is the same as in Camitz& Liljeros [9℄. It is based on an interview survey ondu ted in Swedenbetween 1999 and 2001 ontaining some 35000 journeys. This resulted inapproximately 12000 matrix elements γ ij ea h estimated with γ ij = γM ij / X j M ij (2)where M ij is the number of journeys per day from muni ipality i to j and γ is a global s alar [5℄.The disease is a (cid:28) tive moderatly infe tious disease with an R of . ,within every homogenous subpopulation.To des ribe the state of the epidemi we introdu e the ve tor S to keeptra k of the number of sus eptibles in ea h muni ipality. Additionally, twosets of ve tors L . . . L κ and I . . . I λ are de(cid:28)ned to keep tra k of latentsand infe tious. The indexes κ and λ are the hosen (cid:28)rst paramenters forthe gamma distribution for laten y and infe tious times respe tively. Wewill use a general formalism for the time being but later we set the param-eters to either 1 or 3. In the (cid:28)rst ase, orresponding to an exponentialdistribution, there will only be one ve tor in the set. If κ is greater thanunity, then this will be the number of stages of laten y or infe tioussnessthat ea h individual needs to pass through. The sizes of ea h ve tor is of ourse equal to the number of muni ipalities. Let P be this number. Thedimensionality of the entire state spa e is equal to D = P · (1 + κ + λ ) .The ve tors are indexed as I k,i (italisized when indexed with i ) where i is the muni ipality and k is the stage of disease. For any purposes they an be treated as tensors or matri es. Summing over all k s and i s yieldsin this ase the total number of infe ted. Note that re ording the numberof re overed individuals is redundant sin e it is simply the sum of thenumber in the three states of infe tiousness already overed, subtra tedfrom the population.At the start of the run the element S i of S is equal to the populationsizes N i of ea h muni ipality. This is the initial state in ea h run. Forea h muni ipality we now have κ + λ possible state transitions, ea h7nvolving in rementing an element orresponding to the muni ipality inone ve tor and de rementing the "pre eding". This is true for all tran-sitions ex ept from the last stage of infe tiousness whi h of ourse onlyinvolves a de rement.We are now ready to set up the equations that will de(cid:28)ne the transitionmatrix of our Markov pro ess. The quantities Q X ik below, is for ea hmuni ipality i the intensity of individuals passing on to the next stage ofillness and are onne ted to the probabilities of the orresponding statetransitions. X ∈ {L . . . L κ , I . . . I λ , R} is a label signifying transitionsto one of the laten y states, one of the infe tious states or the re overedstate. It is written in a alligraphi font to avoid onfusion with L k , I k and R whi h are ve tors. Q L i = υκL ,i ... Q L κ i = υκL κ − ,i Q I i = υκL κ,i Q I i = βλI ,i (3)... Q I λ i = βλI λ − ,i Q R = βλI λ,i Finally, people are infe ted (be ome latent) with the intensity that de-pends number of infe ted in all the muni ipalities and the travelintensitybetween ea h of them: Q L i = α κ X k =1 I ki + N X j =1 j = i γ ij λ X k =1 I kj S i N i . (4)In the equations above, α is the the expe ted number of se ondaryinfe ted per infe tious. υ is the inverse laten y period and β the re overyrate. The se ond row reads: The number of people per unit time leavingthe (cid:28)rst laten y stage is the number of people in that stage times the8umber of stages times the s alar rate υ . The last row is similar, as isthe (cid:28)rst term of the (cid:28)rst row but summed over all infe tious stages andalso in ludes a fa tor to a ount for a de reasing number of sus eptibles.The se ond term is the ontribution from other muni ipalities via theinfe tiousness network. It in ludes a sum of infe tious individuals over allstages and all muni ipalities but the urrent.Ea h of these intensitities is the parameter required to spe ify theexponential distribution that yields the timesteps for the orrespondingtransition. The model is now in all respe ts in pla e. To simulate wewould like to take ea h transition in order and so we are interested toknow the time ∆ t untill the next transition, given the urrent state. Thetime, one an easily show, is in identally also exponentially distributedwith parameter Q equal to the sum of the D intensities in Eqs. 3 and 4, ∆ t ∈ Exp( Q ) , (5) Q = M X i =1 ( Q L i + · · · + Q L κ i + Q I k i + · · · + Q I λ i + Q R i ) . (6)To determine whi h transition o urs at this time we ompare the in-tensities among themselves. The probability of a transition is proportionalto the relative value of the orresponding intensity, simply the intensitynormalized by Q . So in ea h pass through the main loop of the algorithmwe (cid:28)nd Q , pi k a random time step from the exponential distributionspe i(cid:28)ed by Q as a parameter, randomly pi k a transition a ording tothe relative value of the intensities, update the state ve tors and the inten-sities a ording to the new state and start again. The simulation pro eedsthis way until an arbitrarily hosen time limit is rea hed or until there areno more infe tious or latent, whi h ever omes (cid:28)rst. In our ase we hose60 days as by this time a substantial majority of simulated s enarios willhave developed into epidemi s. Re all that the obje t of interest in notthe (cid:28)nal size but any delay in time of the epidemi s.9 ResultsThe prevalen e, along with some additional results, from the (cid:28)rst set offour simulations on(cid:28)ned to Sto kholm is presented in table I. It learthat the shape of the time-distribution determines the out ome of thesimulated epidemi . What is more, the prevalen e after 60 days followsthe anti ipated pattern, de reasing with more realisti laten y times andin reasing with more realisti infe tious times.The results of the se ond simulation set is presented in (cid:28)gure 3 asa geographi plot over Sweden with ea h muni ipality represented by a olored dot. The prevalen e is represented by olor on a logarithmi s ale.Again, the in iden e and geographi spread is highly dependent on theshape of the time-distribution, see also II. As with the (cid:28)rst set, theorder of severeties after 60 days is the anti ipated but th e(cid:27)e ts are evenmore apparent. Retransmission from onne ted muni ipalities ampli(cid:28)esthe distribution e(cid:27)e ts.Remember that there is a time limit of 60 days and that di(cid:27)erentlaten y time distributions do not ne essarily a(cid:27)e t the height of the in- iden e peak, only when it o urs. We also added a (cid:28)gure for additionalsupport with k simultaneously varied from to .4 Dis ussionConsidering (cid:28)rst the simpler ase of a single muni ipality, the extremelyshort laten y times generated by the exponential distribution was ex-pe ted to a elerate the epidemi . More individuals be ome infe tiousearly in the simulation, in turn infe ting others earlier. It an be shown,however, that with shorter mean laten y time the (cid:28)nal size of the epi-demi is un hanged. With a skewed infe tious time distribution the e(cid:27)e tis reversed. The epidemi will be delayed, at least initially, due to theabundan e of very short infe tious times. Ea h infe ted will infe t a fewernumber of se ondary infe teds before re overing. It is harder for the epi-demi to at h on and the probability of the disease dying out ompletely10 Figure 1: Probability distribution of the Gamma distribution for varying k , all withexpe tation value . The spe ial ase of k = 1 produ es an exponential distribution. peop l e i n t unne l People enter tunnel at a constant rate, 1000 per dayt ∈ Exp(5)t ∈ Γ (3,5/3) Figure 2: A simulation of people entering a ave at a rate 1000 per day at speedssele ted from, in the ase of the blue urve, an exponential distribution and in the ase of the red urve, a gamma distribution with k = 3 . The number of peoplesimultaneously in the ave is plotted. The expe ted passage time for both urve is5 days whi h gives the same number of people in the ave after a transitional phase.The transitional phase di(cid:27)ers, however. 11 =1 λ =1 κ =3 λ =3
1 10 100 100010000
Figure 3: Image visualizing the epidemi state after 60 days simulation, averagedover 1000 runs. The form parameter for laten y times in rease from the left to right olumn and for infe tious times from the bottom to top row. The prevelen e in ea hmuni pality is olor oded on a logarithmi s ale. Clearly a more realisti laten y timedistribution delays the epidemi signi(cid:28) antly.12 λ λ to randomvarian e and as an e(cid:27)e t of the ut-o(cid:27) time, as they should theoreti ally be equal.13 λλ A cc u m u l a t ed P r e v a l en c e Form parameter ( κ = λ ) 204060 A ff li c t ed m un i c i pa li t i e s
1 10 1001000
Figure 4: Here the form parameter for both latant and time distributions are setequal. Cumulative in iden e i.e. the total number of infe ted, is plotted below forea h setting. day peop l e i n t unne l People enter tunnel at exponentially increasing rate, e .t ∈ Exp(5)t ∈ Γ (3,5/3)(3,5/3)