A Relationship Between SIR Model and Generalized Logistic Distribution with Applications to SARS and COVID-19
aa r X i v : . [ s t a t . A P ] O c t A Relationship Between SIR Modeland Generalized Logistic Distributionwith Applications to SARS and COVID-19
Hideo Hirose
Bioinformatics Center, Kurume UniversityKurume, Fukuoka, Japan
Abstract
This paper shows that the generalized logistic distribution model is derived fromthe well-known compartment model, consisting of susceptible, infected and re-covered compartments, abbreviated as the SIR model, under certain conditions.In the SIR model, there are uncertainties in predicting the final values for thenumber of infected population and the infectious parameter. However, by utiliz-ing the information obtained from the generalized logistic distribution model, wecan perform the SIR numerical computation more stably and more accurately.Applications to severe acute respiratory syndrome (SARS) and Coronavirus dis-ease 2019 (COVID-19) using this combined method are also introduced.
Keywords:
SIR model, generalized logistic distribution, parameter estimation,best-backward solution, L-plot, combination method, SARS, COVID-19
1. Introduction
The well-known compartment model, consisting of susceptible, infected andrecovered compartments, abbreviated as the SIR model, has been commonlyused in infectious disease spread simulations for more than half a century al-though the mathematical model is very simple (see Kermack and McKendrick,1933; Anderson and May, 1991; Diekmann and Heesterbeek, 2000, e.g., for gen-eral descriptions). Such a long life-length proves the effectiveness of this model,resulting in a variety of expansions and many actual applications (see Berge etal., 2017; Chowell et al., 2006; Ferguson et al., 2001; Vaidya et al., 2014, e.g.,for specific applications). Similar to other cases, this model has been recentlyapplied to Coronavirus disease 2019 (COVID-19) caused by the novel severeacute respiratory syndrome coronavirus 2 (SARS-CoV-2); see Wu et al., 2020;Tuite et al., 2020; Dehning et. al., 2020; Fang et al., 2020; Fokas et al., 2020;Giordano et al., 2020; Anand et al., 2020; Bertozzi et al., 2020.
Email address: [email protected] (Hideo Hirose) eanwhile, the generalized logistic distribution has also been used in statis-tical infectious disease spread predictions (see e.g., Zhou and Yan, 2003; Hirose,2007) as an empirical model. The method is also used in COVID-19 case (seeAviv-Sharon and Aharoni, 2020, e.g.).However, there seems to be no plain explanations connecting this differentialequation model and the probability distribution model. In this paper, firstly, weshow that the generalized logistic distribution model can be derived from theSIR model under certain conditions. Then, we can propose a more accurate andstable prediction method by combining these two models. Applications to se-vere acute respiratory syndrome (SARS) and COVID-19 using this combinationmethod are also introduced.
2. SIR Model and Its Uncertainties in Numerical Computations
The SIR model uses the ordinary differential equations dS ( t ) dt = − λS ( t ) I ( t ) , (1) dI ( t ) dt = λS ( t ) I ( t ) − γI ( t ) , (2) dR ( t ) dt = γI ( t ) , (3)where S , I , and R mean the susceptible, infectious, and removed populations,and the parameters λ , and γ are the infection rate, and removal rate (recoveryrate). In the SIR model, for example, a person could change his or her conditionfrom susceptible to infected with a ratio λ , then to removed with a ratio γ .Removed persons will never become susceptible. From Equations (1), (2), (3),we have dS ( t ) dt + dI ( t ) dt + dR ( t ) dt = 0 , (4)which means S ( t ) + I ( t ) + R ( t ) = const. This is a total population size, and wedenote this by N . Here, we introduce T ( t ) = I ( t ) + R ( t ) for further discussions.This is the number of cumulative infected persons. Assuming that I ( t ) is small enough comparing to S ( t ) at early stages, thatis, S ( t ) ≈ S (0) ≈ N , then Equation (2) can be written dI ( t ) dt = λS (0) I ( t ) − γI ( t ) = ( λS (0) − γ ) I ( t ) . (5)This fundamental ordinary differential equation can be easily solved as I ( t ) = I (0) e ( λS (0) − γ ) t , (6)2sing integration, where I (0) and S (0) are the initial number of infected per-sons and the initial number of susceptible persons, respectively. If we define R = λS (0) /γ ≈ λN/γ which is called the basic reproduction number, I ( t )becomes increasing when R >
1, and decreasing when R <
1. That is, R plays an important role in determining the pandemic phenomena or extinctionphenomena at early stages of the infectious disease spread. Therefore, to know R as early as possible is considered to be crucial as well as to find the solutionsfor the SIR model. By rewriting R = λS (0) /γ to R = ( λS (0) I ( t )) / ( γI ( t )), itis worth mentioning that the condition that the number of inflow of infected per-sons is equivalent to the number of outflow of removed persons results in R = 1.Later, this concept is useful to consider the current reproduction number R c ( t )at current time t such that R c ( t ) = λ ( t ) S ( t ) γ ( t ) , (7)where λ ( t ) and γ ( t ) are infection rate and removal rate at time t , respectively. In solving the SIR ordinary differential equations as an initial value problem,we require parameter values of λ and γ , and initial values of S (0), I (0) and R (0). By observation, we can obtain daily populations for I ( t ) and R ( t ) becausedaily numbers of newly infected, died and recovered (cured) persons are noticedin public. We note that I ( t ) is not identical to the daily number of newly infectedpersons. Then, parameters λ and γ for the SIR model at time t can be roughlyobtained by using the simultaneous difference equations below, regarding thedifferential equations as the difference equations. λ ( t ) = S ( t ) − S ( t + 1) S ( t ) I ( t ) , (8) γ ( t ) = R ( t + 1) − R ( t ) I ( t ) . (9)Since daily values of λ ( t ) and γ ( t ) are unstable to some extent, such aninstability shall be removed by adopting the mean values computed from thelatest λ ( t ) and γ ( t ) values; for example, seven days average values can be used.However, this is not an optimal solution. To find much more accurate estimatesfor parameters λ and γ , we may use the best-backward solution (BBS) methodexplained below. First, we obtain the initial guesses of λ (0) and γ (0) for λ and γ , e.g., by usingthe simultaneous difference equations above. Then, we estimate the optimumvalues of ˆ λ and ˆ γ for λ and γ by using the simplex method (Nelder and Mead,3965), which is shown to be an extension from the method previously proposed(Hirose, 2012). In optimization, we evaluate the following function E ( n, s ) ( k ) iteratively, E ( n, s ) ( k ) = n X j = n − s { ( T ( k ) ( t j ) − ˜ T ( t j )) + ( R ( k ) ( t j ) − ˜ R ( t j )) } , (10)= n X j = n − s { ( S ( k ) ( t j ) − ˜ S ( t j )) + ( R ( k ) ( t j ) − ˜ R ( t j )) } , (11)where ˜ T ( t j ), ˜ S ( t j ) and ˜ R ( t j ) are the numbers of observed values for cumulativeinfected persons, susceptible persons and removed persons; T ( k ) ( t j ) and R ( k ) ( t j )are k -th iterative solutions for the numbers of cumulative infected persons, sus-ceptible persons and removed persons of the ordinary differential equations ofthe SIR. We continue this iteration until | E ( n, s ) ( k +1) − E ( n, s ) ( k ) | < ε holds,where ε is a small positive number. In solving the SIR equations, the solutionsare obtained backward from time t = t n to time t = t n − s somehow, e.g., Runge-Kutta method. Finally, we could find the converged values ˆ λ and ˆ γ . We notethat T ( k ) ( t n ) = ˜ T ( t n ), S ( k ) ( t n ) = ˜ S ( t n ) and R ( k ) ( t n ) = ˜ R ( t n ) for all k . Takinginto account the importance of the most recent observed values, we often use s = 7.However, S ( t ) becomes unreliable because it strongly depends on N , andwe cannot determine an appropriate size of N . It may be a small district size,or a large country size, depending on the region of disease spread. That is, N is unknown in general because plausible uninfected persons who can contactinfected persons are not identified. It is an uncertainty factor in using the SIRmodel that we cannot determine N . There may be a resolution to such aninconvenience by removing the term N from Equations (1), (2) and (3). Thatis, we assume S ( t ) + I ( t ) + R ( t ) = 1. However, we often want to know the actualinfected population size, but not the ratio of infected population to the totalpopulation N . We will deal with such the case. To look at this phenomenon, we have performed a simulation study using theSARS case in Hong Kong in 2003 (SARS case data, (2003)). We assume casesthat N is 2,000 (strongly restricted area population), 10,000, 100,000, 1,000,000,and 6,810,000 (actual Honk Kong population in 2003).Figure 1 shows the predicted curves for the number of cumulative infectedpersons, T ( t ), after 30th day using observed data from 22nd to 30th day via theSIR model; here, day 1 was March 17, 2003. In the figure, the observed valuescorresponding to T are the superimposed using dotted points. We see that theprediction curves strongly depend on N . The larger the value of N , the steeperthe increasing tangent. In the cases of N ≥ , T ( t ) is bounded above by the upper limit N . Therefore, we can mention that we cannot predict the robust value for T if N is unknown, typically at early stages.4 igure 1: Various cases of the predicted curves for the number of cumulative infected persons, T ( t ), after 30th day using from 22nd to 30th day observed data via the SIR model (SARSin Honk Kong in 2003). We assumed cases that N is 2,000, 10,000, 100,000, 1,000,000, and6,810,000. There is also another uncertainty factor in solving the SIR ordinary differ-ential equations. We cannot estimate the consistent value for λ ( t ) unless N isclearly predetermined, typically at early stages.
3. The Generalized Logistic Distribution and the SIR
The generalized logistic distribution (GLD) model developed by Richards(see Richards, 1959) can be applied to flexible growth function for empiricaluse. This model is based on a simpler model (see Von Bertalanffy, 1938) todescribe the increase of weight as a function of the metabolism process of an-imals. The GLD is applied also to other fields such as hydrology (see Zakariaet al., 2012, e.g.), medical fields in infectious disease spread modelling such asSARS, foot and mouth disease (FMD), Zika virus disease, Ebola virus disease,and SARS Cov-2 and in growth modelling such as physiochemical phenomenon,psychological issues, survival time of diagnosed leukemia patients, and weightgain data.
The three-prameter generalized logistic distribution function is defined as F ( t ; σ, µ, β ) = 1 { − t − µσ ) } β , (12)where σ , µ and β denote the scale, location, and shape parameters, respec-tively. By introducing z = ( t − µ ) /σ , we have the standard generalized logistic5istribution expressed by F ( z ; β ) = 1 { − z ) } β , (13)where exp( − z ) = exp( − t − µσ ) = exp( µσ ) exp( − tσ ) . (14)In estimating the parameters θ = ( σ, µ, β ) T , we often use the maximumlikelihood estimation method. Since observed data are usually daily data, thelikelihood function L ( θ ) can be constructed by using the grouped truncatedmodel expressed as L ( θ ) = (cid:26) F ( t ; θ ) F ( t n ; θ ) (cid:27) k · n Y i =1 (cid:26) F ( t i ; θ ) − F ( t i − ; θ ) } F ( t n ; θ ) (cid:27) k i , (15)where k i (0 ≤ i ≤ n ) represents the number of infected persons from time −∞ to t or t i − to t i . When the total number of cases is known in advance, we canalso use the trunsored model (Hirose, 2007). In the SIR model (1), (2), (3), we have assumed that I ( t ) is small enoughcomparing to S ( t ) at early stage, i.e., S (0) ≈ N , then we have derived the simpleordinary differential equation (4) with the solution of (5). This solution showsthe explosive increasing population for the infected persons when R >
1. Inthe real world, the number of infected persons is bounded above. Thus, a muchmore realistic model is required at later stages in disease spreading.Since T ( t ) = N − S ( t ), from Equation (1), we have dT ( t ) dt = − dS ( t ) dt = λS ( t ) I ( t ) = λS ( t )( T ( t ) − R ( t )) . (16)Assuming that R ( t ) = 0, i.e., none of infected persons will be transferred to theremoved population, then, this equation becomes dT ( t ) dt = λT ( t )( N − T ( t )) . (17)This equation shows a symmetry between T ( t ) and ( N − T ( t )), and consequently,the solution represents increasing flat S-shaped curve. By integration, we caneasily derive the solution of (17) as T ( t ) = N NT (0) − e − λNt , (18)which is called the logistic function. The inflection point becomes (log( N/T (0) − /λN, N/
2) by solving d T ( t ) /dt = 0.6n the real world, this model is still unrealistic because asymmetric curvesare often observed. Then, we assume that ( T ( t ) /N ) shall be ( T ( t ) /N ) m becauseit would be natural to think that the susceptible persons would be more/less af-fected by infected persons depending on the magnitude of m rather than linearlyaffected. For example, if half of the population is already infected, the ratio ofinfectious persons will not be half but will be inflated to 3 / m = 2, andit will be shrunk to (2 − √ / m = 1 / dT ( t ) dt = bT ( t )(1 − ( T ( t ) N ) m ) , (19)where b is a constant holding that b = λN when m = 1. To solve this equation,firstly, we set y ( t ) = T ( t ) /N , and further, we use change of variables such that z ( t ) = ( y ( t )) − m (refer to Skiadas, 2009, e.g., for such transformations). Then,Equation (21) can be written as dzdt = − bm ( z − , (20)which can also be solved by integration, and the solution is T ( t ) = N { T (0) N ) − m −
1) exp( − bmt ) } /m . (21)This reveals that the curve of T ( t ) shows the same shape to one expressedby Equation (13) except for the scale. Therefore, the generalized logistic dis-tribution is derived from the SIR model with certain assumptions. These as-sumptions could be applied to the cases at early stages of the disease spread,we may use this probability distribution as the statistical model representingthe infectious disease spread phenomena. The time t for the inflection pointbecomes (log((( N/T (0)) m − /m )) /bm by solving d T ( t ) /dt = 0.We have the relationships of parameters between Equation (12) and Equa-tion (21) such that b = β/σ , k = (1 + exp( µ/σ )) − β , m = 1 /β , σ = 1 / ( bm ), µ = (log( k − m − / ( bm ), β = 1 /m . N Using exactly the same example in the previous section, we show, in Figure2, the predicted curves for the number of cumulative infected persons, T ( t ),after 30th, 33rd, 45th and 73th day using observed data from the first day tothe last observed day, by maximizing the likelihood function (22). Dotted pointsshow the observed values for T . In this GLD model, we can estimate the final(i.e., t → ∞ ) value for T ( t ) such thatˆ T ( ∞ ) = T ( t ) F ( t ; ˆ θ ) , (22)where F ( t ; ˆ θ ) expresses the cumulative distribution function value using esti-mated parameter ˆ θ at time t . Looking at the figure, we see that the prediction7 igure 2: Predicted curves for the number of cumulative infected persons, T ( t ), after 30th,33rd, 45th and 73th day using observed data from the first day to the last observed day viaGLD model (SARS in Honk Kong in 2003). curves for T ( t ) are close to the observed values, although the prediction curvesshow underestimated results to some extent.In addition, the GLD model can predict the final value ˆ T ( ∞ ) = N evenat early stages. It will not make sense that we compare the final value ˆ T ( ∞ )using the GLD model with that using the SIR model because ˆ T ( ∞ ) using theSIR model does not provide consistent values. Therefore, we compare the valueˆ T ( t conv ) using the GLD model with that using the SIR model, where t conv ex-presses the time when ˆ T ( t ) seems to converge. In the SARS case example, we set t conv = 117. We introduce the two terms of ˆ T ( t conv ) GLD ( t ) and ˆ T ( t conv ) SIR ( t );the former represents predicted ˆ T ( t conv ) estimated at truncation time t usingthe GLD model, and the latter ˆ T ( t conv ) at time t using the SIR model.We define L-plot such that time t locates in horizontal axis, and that ˆ T ( t conv )locates in the vertical axis. Figure 3 shows the comparison between ˆ T ( t conv ) GLD ( t )and ˆ T ( t conv ) SIR ( t ) at various time t ; in the figure, observed T ( t ) are superim-posed. Although we have selected a rather small value of 117 for t conv , the esti-mated value of ˆ T (117) SIR ( t ) shows unstable behavior than ˆ T (117) GLD ( t ) does.Therefore, we consider using the information for N using the GLD model, andto combine the GLD model use and the SIR model use next.
4. Combination Method
As described above, we can predict the number of cumulative infected per-sons after the last day of observation using the SIR model under appropriateconditions. However, to keep the estimates reliable, we should pay attentionto provide adequate parameter values, in particular for infection parameter λ igure 3: Comparison of the L-plots between ˆ T (117) GLD ( t ) and ˆ T (117) SIR ( t ) (SARS in HonkKong in 2003). and total population N . Otherwise, even if we can roughly obtain the currentreproduction number R c , we cannot know consistent estimates for λ and N .On the contrary, we do not require the information of N to estimate theparameter θ in the GLD model. In addition, we can estimate N even at earlystages. Thus, in order to obtain the more accurate estimates of the parametersin the SIR model, we may utilize that information from the GLD model. Thisis called the combination method. Using these two models simultaneously, wecan expect to make more accurate predictions. Figure 4 shows the observed COVID-19 case data in Hubei in 2020 (Hubeicase data, (2020)). Looking at the figure, we see that the infection spreads veryquickly, but recovered slowly. From the infection to recover, it took three weeksreferring the median time of infection curve and recovered curve.First, we have fitted the generalized logistic distributions to observed valuesof cumulative number of infected persons, cumulative number of died personsand cumulative number of recovered persons. Figure 5 shows the cumulativedistribution functions and corresponding observed data, i.e., the cumulativenumber of persons are divided by the total number of persons. We can seethat the three observed cases are well fitted to the generalized logistic distribu-tions. According to Hirose, 2007, among the generalized lognormal, generalizedextreme-value, generalized gamma and generalized logistic distributions, thegeneralized logistic distribution showed the best-fit model.To the observed number of infected persons, we have applied the L-plot inFigure 6, assuming that the observed values follow generalized logistic distri-butions. From the figure, we see that the estimates for N seem to stable after9 igure 4: Observed COVID-19 case data in Hubei in 2020. Daily number of infected persons,daily number of died persons, daily number of removed persons, and cumulative numbers ofthem are illustratedFigure 5: Estimated cumulative distribution functions, Cdfs, for infected, died and removedcases, fitted to COVID-19 case data in Hubei in 2020. Solid curves show estimated Cdfs, anddots show the observed values. , N to 70 ,
000 in the SIR model.
Figure 6: L-plot for COVID-19 case data in Hubei in 2020. Observed values are superimposedusing dots.
Figure 7 shows the predicted curves after the last day of observation. Inthe figure, solid curves express the case of N = 70 , N = 1 , ,
000 for the sake of comparison. Clearly, the case of N = 1 , , N = 70 ,
5. Concluding Remarks
Although the SIR model has been actively used for a long time and has beenuseful for prediction, there are uncertainties in predicting the final values for thenumber of infected population and the infectious parameter. In this paper, wehave introduced that the generalized logistic distribution model can be derivedfrom the SIR model under certain conditions. In using the generalized logisticdistribution model, we can resolve one of the uncertainty factors, resulting inthe use of such the information for the numerical computations in the SIRmodel. We have proposed a more accurate and stable prediction methodologyby cooperating these two models with each other. Applications to SARS andCOVID-19 using this combined method are also introduced.
ReferencesReferences [1] Anderson, R. M., May, R., 1991. Infectious diseases of humans: Dynamicsand control, Oxford University Press.11 igure 7: Predicted curves of various cases for the number of cumulative infected personsafter the last day of observation for COVID-19 case data in Hubei in 2020. Observed valuesare superimposed using dots. [2] Anand, N., Sabarinath, A., Geetha S., Somanath, S., 2020. Predicting thespread of Covid-19 using SIR model augmented to incorporate quarantineand testing, Transactions of the Indian National Academy of Engineering5, 141–148.[3] Berge, T., Lubuma, J.M.-S., Moremedi, G.M., Morris, N., Kondera-Shava,R., 2017. A simple mathematical model for ebola in africa, Journal of Bio-logical Dynamics, 11(1), 42–74.[4] Von Bertalanffy, L., 1938. A quantitative theory of organic growth (in-quiries on growth laws. ii), Human Biology, 10(2), 181–213.[5] Bertozzi, A. L., Franco, E., Mohler G., Short, M. B., Sledge D., 2020.The challenges of modeling and forecasting the spread of covid-19, PNAS.117(29), 16732–16738.[6] Chowell, G., Rivas, A., Hengartner, N., Hyman,, Castillo-Chavez, C., 2006.Critical response to post-outbreak vaccination against foot-and-mouth dis-ease, AMS Contemporary Mathematics, 47, 73–87.[7] Dehning, J., Zierenberg1, J., Spitzner, F. P., Wibral, M.,Neto, J. P.,Wilczek, M., Priesemann V., 2020. Inferring change points in the spread ofcovid-19 reveals the effectiveness of interventions, Science, 369 (eabb9789),1–9.[8] Diekmann, O., Heesterbeek, J. A. P., 2000. Mathematical epidemiology ofinfectious diseases: model building, analysis and interpretation, New York:Wiley.[9] Fang, Y., Nie, Y., Penny, M., 2020. Transmission dynamics of the covid-19 outbreak and effectiveness of government interventions: A data-drivenanalysis, J Med Virol., 645–659. 1210] Ferguson, N., Donnelly, C. A., Anderson, R. M. (2001. The foot-and-mouthepidemic in Great Britain: Pattern of spread and impact of interventions,Stochastic Analysis and Applications, 292 1155–1160.[11] Fokas, A. S., Dikaios, N., Kastis, G. A., 2020. Mathematical models anddeep learning for predicting the number of individuals reported to be in-fected with Sars-Cov-2, J. R. Soc. Interface, 17 1–12.[12] Giordano, G., Blanchini, F., Bruno, R., Colaneri, P., Di Filippo, A., DiMatteo, A., Colaneri, M., 2020. Modelling the Covid-19 epidemic and im-plementation of population-wide interventions in Italy, Nature Medicine,26(2) 855–860.[13] Hirose, H., 2007. The mixed trunsored model with applications to Sars,Mathematics and Computers in Simulation, 74(6) 443–453.[14] Hirose, H., 2012. Estimation of the number of failures in the Weibull modelusing the ordinary differential equation, European Journal of OperationalResearch, 223(3), 722–731.[15] Hubei case data, 2020. https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_daily_reports .[16] Kermack, W. O., McKendrick, A. G., 1933. Contributions to the mathe-matical theory of epidemics-iii. further studies of the problem of endemicity,in: Proceedings of the Royal Society, 94–122.[17] Nelder, J. A., Mead, R., 1965. A simplex method for function minimization,The Computer Journal, 7, 308–313.[18] Richards, F., 1959. A flexible growth function for empirical use, Journal ofExperimental Botany, 10(29), 290–300.[19] SARS case data, 2003.