How many can you infect? Simple (and naive) methods of estimating the reproduction number
H. Susanto, V.R. Tjahjono, A. Hasan, M.F. Kasim, N. Nuraini, E.R.M. Putri, R. Kusdiantara, H. Kurniawan
aa r X i v : . [ q - b i o . P E ] J un How many can you infect?Simple (and naive) methods of estimating thereproduction number
H. Susanto * a,b , V.R. Tjahjono c , A. Hasan d , M.F. Kasim e , N. Nuraini f ,E.R.M. Putri c , R. Kusdiantara f , H. Kurniawan c a Department of Mathematics, College of Arts and Sciences, Khalifa University,PO Box 127788, Abu Dhabi, United Arab Emirates b Department of Mathematical Sciences, University of Essex,Wivenhoe Park, Colchester, CO4 3SQ, United Kingdom c Department of Mathematics, Faculty of Mathematics and Natural Sciences, InstitutTeknologi Sepuluh Nopember, Sukolilo, Surabaya 60111, Indonesia d Center for Unmanned Aircraft Systems Mærsk McKinney Møller Institute,University of Southern Denmark, 5230 Odense, Denmark e Clarendon Laboratory, Department of Physics, University of Oxford,Parks Road, Oxford, United Kingdom f Industrial and Financial Mathematics Research Group, Department of Mathematics,Institut Teknologi Bandung, Ganesha 10, Bandung, 40132, Indonesia
Abstract
This is a pedagogical paper on estimating the number of people that can beinfected by one infectious person during an epidemic outbreak, known as thereproduction number. Knowing the number is crucial for developing policyresponses. There are generally two types of such a number, i.e., basic andeffective (or instantaneous). While basic reproduction number is the averageexpected number of cases directly generated by one case in a population whereall individuals are susceptible, effective reproduction number is the number ofcases generated in the current state of a population. In this paper, we exploitthe deterministic susceptible-infected-removed (SIR) model to estimate themthrough three different numerical approximations. We apply the methods to thepandemic COVID-19 in Italy to provide insights into the spread of the disease inthe country. We see that the effect of the national lockdown in slowing down thedisease exponential growth appeared about two weeks after the implementationdate. We also discuss available improvements to the simple (and naive) methodsthat have been made by researchers in the field.Authors of this paper are members of the SimcovID (Simulasi dan PemodelanCOVID-19 Indonesia) collaboration.
Keywords:
Reproduction number, infectious disease, compartment model,and COVID-19 ✩ Corresponding author. Email Address: [email protected] (H. Susanto). . Introduction ”When will the peak of the pandemic hit? When will it be over?”
Those are unarguably among the most asked questions during the ongoingcoronavirus disease 2019 (COVID-19) crisis, i.e., a disease outbreak of atypicalpneumonia that originated from Wuhan, China [1]. The disease spread to over100 countries in a matter of weeks [2], with most internationally imported casesreported to date having history of travel to Wuhan [3, 4]. The pandemic hasmade governments all over the world take serious responses [5]. The govern-mental measures result in a significant disruption in the lives of their peoplethat raised such questions above.Diseases grow rather exponentially at the initial transmissibility of outbreak[6, 7]. When there is no intervention and the proportion of infections startsto become comparable to the entire population, the growth will slow down assusceptible is fuel to diseases. This type of logistic growth will yield the peak ofa pandemic that everybody is interested in and its arrival can be forecast using,e.g., the susceptible-infected-removed (SIR) compartment model [8].However, in the presence of pandemic, human beings adapt. Governmentsintervene. As such, using the SIR model to predict the peak, while new casesare mainly outcomes of national policies and/or community behaviour, would besimilar to forecasting what policymakers would do or the effectiveness of theirresponse [9, 10, 11], which is dynamic and can be unprecedented. On top ofthat, there is a lack of knowledge of epidemiology characteristics and a high rateof undocumented cases [12]. A brute force analysis by fitting reported data tothe SIR model is therefore prone to a false prediction if not done carefully (see,e.g., Fig. 2 of [13] that incorrectly predicted the peak time as well as the totalinfection of COVID-19 in Italy when compared to the latest data). We will showbelow how data-driven forecasts are sensitive to the time-series information thatwe input in the model.Considering the limitations and obstacles, it is therefore important to deter-mine instead the so-called disease reproduction number or reproductive factor[14], which is the number of people that are infected by one infectious personduring an epidemic outbreak [15, 16, 17, 18, 19]. It depends on the duration ofthe infectious period, the probability of infecting a susceptible individual duringone contact, and the number of susceptible people contacted per unit of time.There are generally two types of such a number, i.e., basic [15] and effective(or instantaneous) [16]. While basic reproduction number is the average ex-pected number of cases directly generated by one case in a population where allindividuals are susceptible, effective reproduction number is the number of casesgenerated in the current state of a population. This paper is intended to give abrief review of these numbers to undergraduate students and a broad science-educated audience in general. We also hope that the paper can be an expositoryarticle of epidemiology to policyholders in making public health measures.2o quantify directly the actual reproduction number is difficult, if not im-possible, and as such, we can only estimate it indirectly. One common approachis to fit a model to epidemiological data that will provide values of some pa-rameters [6]. Here, we use the SIR compartment model as our model reference,where the reproduction number is associated to the threshold point for stabilityof the disease free equilibrium.There are three estimation methods that we will discuss. As a case study,we apply the methods to discuss and forecast viral transmission of COVID-19 in Italy. The first one is by parameter fit to the SIR model [20], whichis probably the most popular analysis to the study of COVID-19 [13]. Thecomputed parameters will then be used to obtain the reproduction number.The second method is to use the reported data of infected and removed people[21]. Comparing the number with that obtained using the parameter fit showsa similar trend in the decrease of the infection rate in Italy. The third methodis using the ratio of increment of infections from two subsequent days [22, 23].However, such a quantity is usually highly fluctuating as we demonstrate for thecase of Italy. The trend is obtained using, e.g., a parameter fit of the Richardscurve [6, 24] to the cumulative cases.As the methods presented here are all based on the SIR model, they arelimited by assumptions commonly made within the SIR model. An importantassumption is that the presented data are an accurate representation of whathappens in the population, although this can be relaxed for some methods inthis paper. Another assumption or limitation is that it does not include peoplethat are infected but not infectious, which can be overcome by incorporatinganother compartment, such as the commonly used Exposed group.We conclude the paper with a brief review of improvements to the methodsby including randomness (stochastic processes).
2. SIR model and the reproduction number
The SIR model equations are given by dSdt = − β SIN , (1) dIdt = β SIN − γI, (2) dRdt = γI. (3)Here, S and I denote the total number of susceptible and infected individuals.Variable R represents the removed compartment that can consist of recovered(and become-resistant) and deceased individuals. The total population is N = S + I + R . Note that dN/dt = d ( S + I + R ) /dt = 0, which implies that N is constant. The parameters β and γ are the transmission and removalrate constants, respectively. The average length of time an infected individualremains infective, i.e., the infectious time, is 1 /γ . Note that this still applies3ven when the parameters β and γ are functions of time. Additionally, wedenote the cumulative (total) case as T = I + R , which satisfies the equation dTdt = β SIN . (4)Equation (2) can also be written as dIdt = γ ( R t − I, (5)where R t = SN R , R = β/γ. (6) R t is the effective reproduction number and R is the basic one. Note from (5)that depending on the value of R t , i.e., whether R t > R t <
1, the infections I will increase or decrease in time, respectively. It is therefore important to trackthis number to forecast the spread of an infection in an area.As data are collected and reported regularly in a certain time interval, it isinstructive to consider instead the discrete model∆ S n = − τ β S n I n N , (7)∆ I n = τ β S n I n N − τ γI n , (8)∆ R n = τ γI n , (9)where ∆ K n = K n +1 − K n , K = S, I, R . τ is the time interval, which in thelimit τ →
0, make the model (7)-(9) become (1)-(3). We take τ = 1 day, whichis the standard time interval to report updates on cases for COVID-19. Thereproduction numbers (6) can be checked to be still applicable here. In thefollowing, we will mainly use the discrete SIR model (7)-(9).As the main part of this report, we will consider three different methods toapproximate the effective reproduction number R t . To calculate R t (6), one needs to determine first the parameters β and γ , aswell as the number of susceptible S n and hence the population size N . Becauseinfection data are given in terms of the number of infected and removed (i.e.,recovered or deceased) people, we can find the parameter set for which the modelhas the best agreement with the data. In that case, we fit the deterministicepidemiological model (7)-(9) by employing a generalized least squares scheme,i.e., we search for the minimum of an unconstrained problem specified bymin { S ,β,γ } X n ( I n − Idata n ) + ( R n − Rdata n ) , (10)where Idata n and Rdata n are reported infected and removed cases at day n .Here we only limit ourselves to minimization using three parameters ( S , β ,4 eb Mar Apr May Jun 2020 024681012 P eop l e ( C a s e s ) L O CKD O W N T w o w ee ks a ft e r l o ck do w n ActualFit day 1-53Predicition based Fit day 1-53Fit day 54-87Predicition based Fit day 54-87Fit day 1-87Prediction based Fit day 1-87 (a)
Feb Mar Apr May Jun 2020 024681012 P eop l e ( C a s e s ) L O CKD O W N T w o w ee ks a ft e r l o ck do w n ActualFit day 1-53Predicition based Fit day 1-53Fit day 54-87Predicition based Fit day 54-87Fit day 1-87Prediction based Fit day 1-87 (b)
Feb Mar Apr May Jun 2020 0123456 P eop l e ( C a s e s ) L O CKD O W N T w o w ee ks a ft e r l o ck do w n ActualFit day 1-53Predicition based Fit day 1-53Fit day 54-87Predicition based Fit day 54-87Fit day 1-87Prediction based Fit day 1-87 (c)Figure 1: Predicted evolution of the COVID-19 outbreak in Italy based on the fits of thediscrete SIR model (7)-(9) for (a) active cases; (b) recovered; (c) deceased. Shaded regionsrepresent the official data retrieved from the JHU CSSE repository [25]. There are threedifferent predicted trends, based on the length of the fitted data, see the legends. Day 1 = 31Jan 2020, Day 39 (national lockdown) = 9 Mar 2020, Day 87 = 26 Apr 2020. S β γ (1) γ (2) Table 1: Parameters obtained from the minimization procedure. and γ ) only. Note that N is implicitly part of the estimated parameters because N = S + I + R . We take I = Idata and R = Rdata at the initial step.The search is done using fminsearch function of Matlab that implementsthe Simplex search method. To illustrate our computation, we consider COVID-19 cases in Italy, which was one of the world’s worst-hit countries. Data wereretrieved from [25] on 26 April 2020, which in the analysis will be denoted asDay 87 (i.e., Day 1 is 31 Jan 2020). We present in Fig. 1 the reported dataand the fitting SIR dynamics. We slightly modify the model by splitting theremoved compartment ( R ) into recovered and deceased ones with rates γ (1) and γ (2) respectively and as such, γ = γ (1) + γ (2) .From our computations, using all the available data, i.e., Day 1-87, to befitted into the SIR equations yields rather bad agreement. It turned out tobe necessary to split the data into two periods, separated around the nationallockdown that was implemented on 9 March 2020 (Day 39). To be precise, it isthe threshold date of the lockdown effect that manifested two weeks later, i.e.,from Day 53.The split is to incorporate the intervention and behavioural change of thepopulation in the model that requires the parameter values to vary over time.Note that the SIR model assumes the parameter values to be constant during thefitted period, which is not necessarily correct. Assuming constant value of thoseparameters implicitly assumes that the decline in active cases is because herdimmunity (i.e., substantial decline in susceptible population) has been formed,which has not been detected anywhere, even at places with high death counts.Splitting the graph and fitting the parameters separately are therefore to solvethe assumption violation, where an extra care must be taken in the procedure.Using the splitting, we obtain good agreement as can be seen in Fig. 1. Itis important to note that using data from Day 1-53, we obtained a predictedpeak at the end of March 2020, which clearly is not correct, i.e., parameter fitsdepend sensitively on the fitted data. This explains the incorrect prediction of[13].In Table 1, we list the fitting parameters. Using the values, we plot in Fig.2 the resulting estimated effective reproduction number R t (6). It is clear thatthe national lockdown effectively decreased the number. The curve crosses theaxis R t = 1 at the peak on 19 Apr 2020.6 an 31 Feb 14 Feb 28 Mar 13 Mar 27 Apr 10 Apr 242020 0510 Figure 2: Estimated effective reproduction number using Method 1 (dashed lines) and Method2 (stars). The lines are necessarily split into two parts following the national lockdown, seethe text.
In the second method, instead of getting the parameters β and γ from fitting,we will derive them from the governing equations (8)-(9) directly. Writing β ≡ β n and γ ≡ γ n on the right hand side of the equations, it is straightforward toobtain β n = ∆( I n + R n ) τ S n I n N, γ n = ∆ R n τ I n . (11)From the definition (6), we have R = β n /γ n and as such, R t = S n N R = 1 + ∆ I n ∆ R n . (12)We therefore obtain that the effective reproduction number is related to the ratiobetween the change of the infected and the removed compartments. Because∆ R n >
0, then R t < I n < The third method is to exploit the daily reported new cases, which in termsof the SIR model will be given by the daily difference of the cumulative cases∆ T n . Integrating (2) in time between t and ( t + τ ) gives us I n +1 = I n e γ R t + τt ( R t − dt ≃ I n b ( R t ) , (13)7 eb 01 Feb 15 Feb 29 Mar 14 Mar 28 Apr 11 Apr 252020 01234 (a) Feb Mar Apr May2020 00.511.52 10 data Richards fit (b)Figure 3: (a) Plot of b ( R t ) in time from the reported data (stars) and the approximationobtained using Richards’ curve (dashed line). (b) Cumulative cases from data (stars) andRichards’ approximation (dashed line). b ( R t ) = e γτ ( R t − . (14)Here, we denote I ( t + τ ) = I n +1 and I ( t ) = I n . In the last equation, we haveassumed that R t is constant within the time interval.On the other hand, we have from (4) a discrete approximation∆ T n = τ βS n +1 I n +1 /N = τ βS n +1 I n b ( R t ) /N ≃ τ βS n I n b ( R t ) /N. (15)The last step is expected to be valid for emerging diseases, i.e., I varies slowly.At the same time, we also have from (4)∆ T n − = τ βS n I n /N. (16)Combining (15) and (16) gives us the effective reproduction number b ( R t ) = ∆ T n / ∆ T n − , R t = 1 + 1 τ γ ln ( b ( R t )) . (17)Because b ( R t ) is a monotonically increasing function in R t , it can be enough toplot b itself to determine whether the disease decreases or not.In Fig. 3(a) we plot b ( R t ) from the data of COVID-19 cumulative casesin Italy shown in stars. However, the curve is highly fluctuating that mayhide the trend. As mentioned in Method 2 above, this could also be solvedby smoothing the data using a moving average filter. Another possible way isby approximating the reported cumulative cases with a continuous function. Anatural candidate is certainly the generalised logistic function, also known asRichards’ curve, T n = A (cid:0) B + e − C ( n − n ) (cid:1) /ν . (18)Again using a least square method to fit the reported cumulative cases to thefunction, we obtain that the best parameter values are A = 119520, B = 0 . C = 0 . n = 0 . ν = 0 . b ( R t ) calculatedfrom the reported data.
3. Conclusion
We have presented three simple (or actually simplistic) methods to estimatethe reproduction number of the COVID-19 pandemic based on the SIR equationsas the underlying model. We applied the methods to the data of COVID-19 casesin Italy, where we saw that the implemented national lockdown had positiveimpacts that appeared about two weeks later.To extend the deterministic methods reviewed herein, one may consider com-plex models that include more compartments [6, 26]. However, to be more9ealistic, one should include statistical randomness and probability in the cal-culations.In the spirit of Method 1, Cintr´on-Arias et al. [20] combined parameterfits with statistical asymptotic theory and sensitivity analysis to give approxi-mate sampling distributions for the estimated parameters. Method 3 has beenimproved in [22, 23] to include a probabilistic description such that the prob-abilistic formulation for future cases is equivalent, via Bayes theorem, to theestimation of the probability distribution for the reproduction number.In addition to estimating the reproduction number based on a model, it isalso possible to approximate the reproduction number from the serial interval(the time between the onset of symptoms in a primary case and the onset ofthose in secondary cases) without assuming a model [27, 28, 29].
Acknowledgement
H.S. is extremely grateful to his wife, dr. Nurismawati Machfira, who hashappily taken a new additional job as ’head teacher’ of their children at homeduring school closure, while maintaining her job as their primary carer, so thathe could still
References [1] The 2019-nCoV Outbreak Joint Field Epidemiology Investigation Team,Qun Li. An Outbreak of NCIP (2019-nCoV) Infection in China - Wuhan,Hubei Province, 2019-2020. China CDC Weekly 2(5), 79-80 (2020).[2] E. Callaway. Time to use the p-word? Coronavirus enter dangerous newphase. Nature 579, 12 (2020).[3] J.T. Wu, K. Leung, and G.M. Leung. Nowcasting and forecasting thepotential domestic and international spread of the 2019-nCoV outbreakoriginating in Wuhan, China: a modelling study. The Lancet 395(10225),689-697 (2020).[4] A.J. Kucharski, T.W. Russell, C. Diamond, Y. Liu, J. Edmunds, S. Funk,and R.M. Eggo. Early dynamics of transmission and control of COVID-19: a mathematical modelling study. The Lancet Infectious Diseases 20(5),553-558 (2020).[5] ‘National responses to the COVID-19 pandemic’ (2020) Wikipedia.Available at: https://en.wikipedia.org/wiki/National_responses_to_the_COVID-19_pandemic (Accessed: 8 May 2020).[6] J. Ma. Estimating epidemic exponential growth rate and basic reproduc-tion number. Infectious Disease Modelling 5, 129-141 (2020).107] G. Chowell and C. Viboud. Is it growing exponentially fast? Impact ofassuming exponential growth for characterizing and forecasting epidemicswith initial near-exponential growth dynamics. Infectious Disease Mod-elling 1(1), 71-8 (2016).[8] SK. Taslim Ali. A study on stochastic epidemic models with the opti-mal control policies. PhD Thesis, Karnatak University, 2014. Available at http://hdl.handle.net/10603/98827 [9] A.B. Gumel, S. Ruan, T. Day, J. Watmough, F. Brauer, P. Van denDriessche, D. Gabrielson, C. Bowman, M.E. Alexander, S. Ardal, andJ. Wu. Modelling strategies for controlling SARS outbreaks. Proceedingsof the Royal Society of London. Series B: Biological Sciences 271(1554),2223-2232 (2004).[10] Z. Yang, Z. Zeng, K. Wang, S.S. Wong, W. Liang, M. Zanin, P. Liu, X.Cao, Z. Gao, Z. Mai, and J. Liang. Modified SEIR and AI prediction of theepidemics trend of COVID-19 in China under public health interventions.Journal of Thoracic Disease 12(3), 165 (2020).[11] Y. Fang, Y. Nie, M. Penny. Transmission dynamics of the COVID19 out-break and effectiveness of government interventions: A datadriven analy-sis. Journal of Medical Virology 92(6), 645-59 (2020).[12] R. Li, S. Pei, B. Chen, Y. Song, T. Zhang, W. Yang, and J. Shaman.Substantial undocumented infection facilitates the rapid dissemination ofnovel coronavirus (SARS-CoV-2). Science 368(6490), 489-493 (2020).[13] D. Fanelli and F. Piazza. Analysis and forecast of COVID-19 spreading inChina, Italy and France. Chaos, Solitons & Fractals 134:109761 (2020).[14] K. Dietz. The estimation of the basic reproduction number for infectiousdiseases. Statistical Methods in Medical Research 2(1), 23-41 (1993).[15] G. Chowell and F. Brauer. ”The Basic Reproduction Number of InfectiousDiseases: Computation and Estimation Using Compartmental EpidemicModels.” In Mathematical and Statistical Estimation Approaches in Epi-demiology, pp. 1-30. Springer, Dordrecht, 2009.[16] H. Nishiura and G. Chowell. ”The effective reproduction number as aprelude to statistical estimation of time-dependent epidemic trends.” InMathematical and Statistical Estimation Approaches in Epidemiology, pp.103-121. Springer, Dordrecht, 2009.[17] P. van den Driessche. Reproduction numbers of infectious disease models.Infectious Disease Modelling 2(3), 288303 (2017).[18] B. Ridenhour, J.M. Kowalik, and D.K. Shay. Unraveling R0: Considera-tions for Public Health Applications. American Journal of Public Health108(S6), S445-S454 (2018). 1119] P.L. Delamater, E.J. Street, T.F. Leslie, Y. Yang, and K.H. Jacobsen.Complexity of the Basic Reproduction Number (R0). Emerging InfectiousDiseases 25(1), 1-4 (2019).[20] A. Cintr´on-Arias, C. Castillo-Ch´avez, L.M.A. Bettencourt, A.L. Lloyd,and H.T. Banks, The estimation of the effective reproductive numberfrom disease outbreak data, Mathematical Biosciences and Engineering6, 261283 (2009).[21] Y-C. Chen, P-E. Lu, C-S. Chang, and T-H. Liu. A time-dependent SIR model for COVID-19 with undetectable infected persons.arXiv:2003.00122 [q-bio.PE][22] L.M.A. Bettencourt and R.M. Ribeiro (2008) Real Time Bayesian Esti-mation of the Epidemic Potential of Emerging Infectious Diseases. PLoSONE 3(5): e2185.[23] G. Chowell, H. Nishiura, and L.M.A. Bettencourt. Comparative estima-tion of the reproduction number for pandemic influenza from daily casenotification data. Journal of The Royal Society Interface 4, 155166 (2007).[24] N. Nuraini, K. Khairudin, and M. Apri. Modeling Simulation of COVID-19 in Indonesia based on Early Endemic Data. Communication inBiomathematical Sciences 3(1), 1-8 (2020).[25] Novel coronavirus (COVID-19) cases, provided by Johns Hopkins Uni-versity Center for Systems Science and Engineering (JHU CCSE). https://github.com/CSSEGISandData/COVID-19 [26] J. Li, D. Blakeley, and R.J. Smith?. The Failure of R0