An empirical model on the dynamics of Covid-19 spread in human population
Nilmani Mathur ∗ Department of Theoretical Physics, Tata Institute of Fundamental Research, 1 Homi Bhabha Road, Mumbai 400005, India
Gargi Shaw † Department of Astronomy and Astrophysics, Tata Institute of Fundamental Research, 1 Homi Bhabha Road, Mumbai 400005, India
Abstract
We propose a mathematical model to analyze the time evolution of the total number of people infected with Covid-19 in a region during the ongoing pandemic. Using the available data on the Covid-19-infected population of various countries, we formulate a model which can successfully track the time evolution from the early days to the saturation period in a given wave of this infectious disease. It involves a set of effective parameters which can be extracted from the available data. Using those parameters, the future trajectories of the disease spread can also be projected. A set of differential equations is also proposed whose solutions are these time evolution trajectories. Using such a formalism we project the future time evolution trajectories of infection spread for a number of countries where the Covid-19 infection is still rapidly rising.

∗ Electronic address: [email protected]
† Electronic address: [email protected]

arXiv [q-bio.PE], August

I. INTRODUCTION

Currently a pandemic is ongoing throughout the world, caused by a contagious respiratory disease called Covid-19. The pathogen of this respiratory disease is a novel coronavirus named SARS-CoV-2 [1]. It started in China and subsequently spread to most countries, and as of August 12, 2020, it has infected more than 20.2 million people worldwide, causing more than 740 thousand deaths [1–3]. Though the spread of infection has reduced substantially in several countries, particularly in China and Europe, in many countries with a large population, such as the USA, Brazil and India, the pandemic is surging prominently at this time. It is also not clear whether this respiratory disease will be seasonal and whether a second wave will come later.
There is no clear consensus in the scientific community on the possible future evolution of this disease, and naturally there is no consensus on the ideal intervention strategy for a government to minimize the number of fatalities while allowing economic and social activities. Many mathematical models have been put forward [4–32] to track the time evolution of the disease spread, to understand its dynamics, and to provide feasible guidance to governments around the world for controlling this pandemic.

[Footnote: The references are not at all exhaustive. There are many other similar articles in medRxiv and bioRxiv: https://connect.biorxiv.org/relate/content/181]

As in ecology and population growth, in epidemiology too the time evolution of virus growth in a population is of fundamental importance. In general, one builds a model of disease transmission using a system of differential equations with an assumption of initial exponential growth [35–45]. A standard way to analyze the dynamics of infection spread mathematically is to adopt one of a variety of compartmental models that originate primarily from the so-called SIR model [46–48], which incorporates susceptible (S), infectious (I) and recovered (R) populations. For the Covid-19 disease growth, one of these models, the so-called SEIR model, and its extensions have been utilized extensively [5–23, 31]. In this scheme, one divides the total population, $N$, of an infected region (e.g., city, country) into susceptible ($S$), exposed ($E$), infected ($I$) and recovered ($R$) populations with the constraint $S + E + I + R = N$. A set of differential equations incorporating these correlated compartments then provides the time evolution of the disease spread. The successes and problems of this model have been discussed in detail over the last few months, and there is no clear consensus on whether this type of model can predict the spread of Covid-19 with reasonable accuracy [33, 34]. There is an alternate view that models based on integral equations could be more effective than the models based on differential equations mentioned above in describing the dynamics of epidemics [30, 49, 50].

Another approach to studying the dynamics of an epidemic is to employ data-driven phenomenological statistical models [24–29], where one constructs a mathematical description utilizing the existing data on the epidemic. For example, the numbers of total infections, day-to-day infections, fatalities etc. can be utilized to construct a model with a number of parameters, which are then constrained with the data. Of course, these models do not include any microscopic parameters, but they can describe the data in an effective way. Once the model parameters are fixed, it is in principle possible to project the future dynamics of the infection spread.

In this work, adopting the phenomenological approach of the statistical models, we propose a mathematical model for the time evolution of the number of infected population. Rather than proposing a microscopic model, by analyzing the available data on various countries and cities, we develop an effective time evolution trajectory for the number of total infections which can be employed at Covid-19-affected regions. The reason behind adopting such an approach is that, since the Covid-19 disease is contagious, assuming a region to be a closed system, the number of people infected today is directly correlated with the number infected in the past, and moreover, today's number will also determine how many people will be infected in the near future. The parameters of the model can be thought of as effective mean-field type parameters, which can be constrained with the available data and later employed for the future time evolution of the disease spread.
We find that there is a clear common pattern in the initial growth, mitigation and saturation periods of the time evolution of the number of infected population at various Covid-19-affected regions. The only difference in describing the Covid-19 spread between the various affected populations is the difference between the parameter sets of these regions. However, they are all confined within a smaller subspace of the parameter space. The differences between the parameters of different regions are possibly due to microscopic factors such as differences in total population, density, mobility, age and gender distributions, testing facilities, lockdown effects, social distancing etc. Of course, it will be interest- [...] $\chi^2$ minimization procedure to incorporate these errors, and this also helps to get a band of trajectories around the mean values (reported numbers). In this way, the onset of the saturation period of infection can also span a few days (or weeks) rather than a single day, which is also what we observe at various affected regions.

[Footnote: Here projection means prediction within a set of assumptions [51].]

In this work, our main objective is to build a model, utilizing the available data on the Covid-19 spread in various regions, which can track the time evolution of the number of infected population and also project the possible future trajectory. As it is data-driven, the dynamics of our model and its predictive power depend on the correctness of the data source, and hence the model parameters and projections can change if the data is erroneous. We use data mainly from the coronavirus resources of Wikipedia [3] and W.H.O. [1], as well as those of local governments. We assume each region to be a closed system and that all conditions during the disease spread remain more or less the same, so that a time correlation of infection can be built.
If the prevailing measures against the disease spread under which the data were obtained, such as the effectiveness of lockdown, social distancing, contact tracing and quarantine, preventive mask-wearing, forbidding large public gatherings etc., change substantially, then the parameters and hence the trajectories will also change. However, this model can be progressively improved with more data, and projections for a few weeks to a month or more can also be made.

We organize the article as follows. In section II, we detail our model and also elaborate on the set of differential equations corresponding to this model. In section III, we provide results with numerical details. First, we validate the model by analyzing data on various European countries and New York City. Then, to demonstrate the predictive ability of this model, we show how a subset of data can help to predict the future time evolution trajectory of the infection in a region. Next, we proceed to analyze data on Russia, Brazil, India and the USA, where Covid-19 infections are still increasing rapidly. For India we separately analyze the data for its two biggest cities: Mumbai and Delhi. For each of the regions, within this model, we show the projections of the future time evolution of the infection, with the most probable time-scale for the onset of saturation along with the cumulative number of infections. At the end, we discuss our results and conclude.

II. A DYNAMICAL MODEL FOR COVID-19 SPREAD
The basis of building a phenomenological data-driven model is to analyze the available data on the targeted problem and formulate a mathematical framework to represent the data, with a minimal set of parameters, in a consistent and plausible way. This mathematical model can also predict (project) the dynamics in a domain where data is not available. With this in mind we analyze the cumulative number of people infected with Covid-19 for a number of countries and cities. In Fig. 1 we represent the available data in a few possible ways. In the top plots we show the cumulative number of infected population ($N(t)$) as a function of time (Days (t)), in linear and log scales respectively. The bottom left plot shows the same data when we normalize the infected population of a region by its total population, whereas the bottom right plot shows the number of infections per day as a function of day (a log scale is used to show all data in one plot so that they can be compared together).
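The two derived quantities plotted in Fig. 1, the per-day increment and the population-normalized count, can be obtained from a cumulative series as in the following minimal Python sketch. The function names and the sample numbers are illustrative assumptions and are not taken from the data sets used in this work.

```python
def per_day_increments(cumulative):
    """Daily new cases from a list of cumulative counts.

    The first entry is taken as the increment of the first day.
    """
    return [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]


def normalize(cumulative, population):
    """Cumulative counts divided by the total population of the region."""
    return [n / population for n in cumulative]


# Made-up cumulative counts for five consecutive days (illustration only).
cases = [2, 5, 9, 20, 41]
daily = per_day_increments(cases)
fraction = normalize(cases, 1000)
```

Summing the daily increments recovers the last cumulative value, which is a quick consistency check on any reported series.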
[Figure 1 panels: confirmed cases N(t) vs. days (t), linear and log scale; normalized cases N(t)/N_P vs. days (t); per-day increment dN(t)/dt vs. days (t). Legend: England, France, Germany, Italy, Spain, NYC, Russia, Brazil, India, Mumbai, Delhi, USA.]
FIG. 1: The time-evolution trajectories of the Covid-19-infected population for various countries and cities. The top left plot shows the cumulative number of infected population ($N(t)$) in linear scale, while the top right one represents the same in log scale. The bottom left plot represents data normalized by the total population ($N_P$) of the affected regions. The bottom right plot shows the number of infected population per day. Data (till July 31, 2020) are obtained from Refs. [1, 3, 56–69].

By observing the progress of the Covid-19 spread in various countries, as shown in Fig. 1, we find that the time evolution of the cumulative infected population shows more or less a common pattern, irrespective of the inherently different conditions prevailing at the affected regions. We find that the total infection time can be divided broadly into the following three periods:
• an early increment period ($t_0 \le t \le t_m$)
• a mitigation period ($t_m < t < t_{s_i}$)
• a decline or saturation period ($t_{s_i} \le t \le t_{s_f}$).
Here $t_0$, $t_m$, $t_{s_i}$ and $t_{s_f}$ are the starting times of infection, mitigation and saturation, and the ending time of the infection, respectively. Of course, the transitions from one period to the other are not sharp; rather, they are cross-overs between two regions and can span a couple of days or weeks. For example, because of the latent factor, $t_0$ may not be a particular day. Similarly, $t_m$, $t_{s_i}$ and $t_{s_f}$ may also span a couple of days or weeks. Hence, if there are different mathematical forms for describing the time evolution in different periods, there should be continuity of the functional forms from one period to the next, as will be explained later. The transition times from one period to another can be determined from the data, for example, by minimizing the $\chi^2$ of a particular model against the data.
Naturally, the transition times will be different for different countries since the conditions responsible for the early infection rate, the decrement of that rate in the mitigation period, and the onset of the saturation period are different for different countries.

To show the above findings more conclusively, as a representative case, in Fig. 2 we show the time evolution trajectory of the infected population of England, with the top plot showing the cumulative infected population and the bottom plot representing the per-day increment of infection as a function of days. The above-mentioned three periods are differentiated with different colors, and the transition times are marked. These transition times are calculated by fitting our model to the available data with a minimum acceptable $\chi^2$, which we will discuss later in detail. For England, fixing the initial day of infection ($t_0$) as February 24, 2020, we find that the transition from the early rise to the mitigation period started at around $t = 47$, and the saturation period started at around $t = 93$ days. We will discuss this in more detail in section III, where all other results are presented.

[Figure 2 panels for England: confirmed cases N(t) vs. days (t), and infection per day dN(t)/dt vs. days (t), with the times t_e, t_m, t_{s_i} and t_{s_f} marked. Legend: exponential rise, power-law rise, mitigation, saturation.]

FIG. 2: The time-evolution trajectory of the Covid-19-infected population of England. The top plot represents the cumulative number of infections as a function of days, while the bottom plot shows the infections per day. Data are taken from Ref. [56].

By analyzing the available data for various countries, we find empirically that a possible way to track the trajectory of the number of infected population, within the above-mentioned three periods, starting from the early days ($t_0$) to the saturation period ($t_{s_f}$), is through the following equations:

\[
N(t) =
\begin{cases}
f_i(t), & t_0 \le t \le t_m, \qquad \text{(1a)} \\
f_m(t), & t_m < t < t_{s_i}, \qquad \text{(1b)} \\
f_s(t), & t_{s_i} \le t \le t_{s_f}, \qquad \text{(1c)}
\end{cases}
\]

where the functional form of the fast increment period could be an exponential rise followed by a power-law rise of the form

\[
f_i(t) = \left[ A\, e^{\alpha_{ei} (t - t_0)} \right]_{t_0 \le t \le t_e} + \left[ B + C\, t^{\alpha_i} \right]_{t_e < t \le t_m}, \tag{2}
\]

and in some cases simply the following power-law form:

\[
f_i(t) = \left[ B + C\, t^{\alpha_i} \right]_{t_0 \le t \le t_m}. \tag{3}
\]

Here $A$, $B$, $C$ are constants and mainly depend on the latent population and the density of population, while $\alpha_{ei}$ and $\alpha_i$ govern the rate of infection in a given population. In reality, $\alpha_{ei}$ and $\alpha_i$ are time-dependent variables and depend on many factors, such as the density of population, the total population, and the effectiveness of the preventive measures against the disease spread. In this study we assume that these variables can be taken as constants in the sense of an effective mean-field approximation: $\alpha_{ei}(t) = \tilde{\alpha}_{ei}$ and $\alpha_i(t) = \tilde{\alpha}_i$. However, if there are rapid mutations of the pathogen, migration between different populations in a short interval, or rapid changes in environmental factors, these assumptions may not be valid. For a given population, depending on its density, immunity in the population, social distancing or any other preventive factor against the disease spread, one of the above forms enhances the infection to a large number in a short period of time. By tracking the infected population in the early rise period one can fix one of these forms to explain the rise of the infected population. Such a power-law rise was also observed in Ref.
[24].

In the second period (the so-called mitigation period, with $t_m < t < t_{s_i}$), we find that the time evolution trajectory of the total number of infected population can be tracked with a combination of rising and damping factors through the following equation:

\[
f_m(t) = \left[ D\, t^{\alpha_m} e^{-\lambda (t - t_m)^{\gamma}} \right]_{t_m < t < t_{s_i}}, \tag{4}
\]

where $\alpha_m$ is related to the rate of infection in this period, while the parameters $\lambda$ and $\gamma$ determine how fast the saturation period can be reached. The decrease of the early infection rate may happen for various reasons, for example government-imposed preventive measures, development of immunity in the community, availability of drugs etc. Eventually the infection rate saturates and then decreases, depending on the effectiveness of these external factors.

For a given wave of infection, we observe that the increment of infection in the saturation/decline period ($t_{s_i} \le t \le t_{s_f}$) can be tracked through the following equation:

\[
f_s(t) = N(t_{s_i}) + E \left[ 1 - e^{-\alpha_s (t - t_{s_i})} \right], \tag{5}
\]

where $E$ is a constant and $\alpha_s$ is the infection rate in this period; these constants depend on the population density, the herd immunity factor, the availability of remedies through drugs etc. However, in this period, a second wave may start at any time if there is a large unaffected population and the restrictions to contain the virus become much softer. If the virus is seasonal, on which there is no consensus yet, it can also come back to the unaffected population. In the case of the USA, such a second rise of infection by this virus is quite prominent (and probably also for Spain, albeit slowly), which we will discuss later.

It should be noted that rather than using a single mathematical formula for tracking the infection throughout the contamination period, from the early rise to the saturation period, as used in Refs. [24, 28, 29], we use three different functional forms.
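A minimal Python sketch of how the three functional forms of Eqs. (3), (4) and (5) could be stitched into one trajectory is given below. The parameter values are made up for illustration, the function and dictionary key names are assumptions of this sketch, and the constant D is fixed here by demanding continuity at t_m, which is one simple way (not necessarily the one used in the fits) to avoid a jump between periods.

```python
import math


def f_i(t, B, C, alpha_i):
    """Early power-law rise, Eq. (3): B + C * t**alpha_i."""
    return B + C * t ** alpha_i


def f_m(t, D, alpha_m, lam, gamma, t_m):
    """Mitigation period, Eq. (4); only meaningful for t >= t_m."""
    return D * t ** alpha_m * math.exp(-lam * (t - t_m) ** gamma)


def f_s(t, N_si, E, alpha_s, t_si):
    """Saturation period, Eq. (5), starting from N_si = N(t_si)."""
    return N_si + E * (1.0 - math.exp(-alpha_s * (t - t_si)))


# Illustrative (made-up) parameters, not fitted values.
p = dict(B=0.0, C=0.5, alpha_i=3.2, t_m=46.0,
         alpha_m=3.1, lam=0.008, gamma=1.26, t_si=93.0,
         E=30000.0, alpha_s=0.02)

# Assumption of this sketch: choose D so that f_m(t_m) equals f_i(t_m).
p["D"] = f_i(p["t_m"], p["B"], p["C"], p["alpha_i"]) / p["t_m"] ** p["alpha_m"]


def cumulative_cases(t, p):
    """Piecewise N(t) in the spirit of Eq. (1)."""
    if t <= p["t_m"]:
        return f_i(t, p["B"], p["C"], p["alpha_i"])
    if t < p["t_si"]:
        return f_m(t, p["D"], p["alpha_m"], p["lam"], p["gamma"], p["t_m"])
    # The saturation branch starts from the mitigation value at t_si.
    N_si = f_m(p["t_si"], p["D"], p["alpha_m"], p["lam"], p["gamma"], p["t_m"])
    return f_s(t, N_si, p["E"], p["alpha_s"], p["t_si"])


trajectory = [cumulative_cases(float(day), p) for day in range(1, 161)]
```

With this construction the trajectory is continuous at both t_m and t_si, and in the saturation branch it approaches N(t_si) + E from below.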
The reason behind such a formulation is that the dynamics of disease spread in the three periods are different due to changes in various external conditions as the infection progresses (this is quite apparent in Fig. 2). A single functional form, as used in Refs. [24, 28, 29], is thus not suitable for following the time evolution dynamics throughout the contamination period. As mentioned earlier, in our case the model-evaluated trajectories obtained from different functional forms are matched at the boundaries between two periods so that no discontinuity arises.

Though these data-driven mathematical trajectories can be effective in projecting the future evolution of the disease, it is important to find a set of differential equations whose solutions are these trajectories. That may help to understand the dynamics of infection in a better way, as well as the origin of such equations and their parameters. To achieve such a formulation it is worthwhile to look at the growth equations used in ecology and epidemic studies. In fact, in epidemic studies one can use the growth equations to estimate the infection rate, the cumulative number of cases, the peak number of infected population, the future dynamics of the epidemic etc. One such approach is the standard logistic model [52], where population growth is determined through an exponential growth constrained by the size of the population, as below:

\[
\frac{dN}{dt} = \alpha N(t) \left( 1 - \frac{N(t)}{K} \right), \tag{6}
\]

where the first term $\alpha N$ ($\alpha > 0$) implies an initial exponential growth with an exponent $\alpha$, while the second term, known as the bottleneck factor, with the parameter $K$, the carrying capacity, adjusts the growth according to the critical resources in the population. The population matures as the competition between the two terms reduces the combined growth rate, until the infected population saturates. Since the environmental conditions influence the carrying capacity ($K$), it can be time-varying ($K(t) > 0$), which gives

\[
\frac{dN}{dt} = \alpha N(t) \left( 1 - \frac{N(t)}{K(t)} \right). \tag{7}
\]

This can be associated with a more general growth model, the so-called generalized Richards model [53], which was originally introduced in the context of ecological population growth, as below:

\[
\frac{dN}{dt} = \alpha N(t) \left[ 1 - \left( \frac{N(t)}{K(t)} \right)^{\nu} \right], \tag{8}
\]

where $\nu > 0$ is an additional parameter. When $K$ is independent of time, the solution is

\[
N(t) = \frac{K}{\left( 1 + \nu\, e^{-\alpha \nu (t - t_{tp})} \right)^{1/\nu}}, \tag{9}
\]

with $t_{tp}$ as the turning point where the growth rate becomes maximum. This model has already been employed for real-time prediction in epidemiology, for example in Refs. [54, 55].

At this point, we would like to correlate our observed empirical growth model (Eq. (1)) with the Richards model (Eq. (8)). However, once a government introduces a measure, such as a lockdown and/or preventive drugs, to reduce the disease spread, the mitigation period also reduces. If one considers a time-dependent log-kill factor, $c(t) N(t)$, as in Ref. [24], the above equation is modified to

\[
\frac{dN(t)}{dt} = \alpha N(t) \left[ 1 - \left( \frac{N(t)}{K(t)} \right)^{\nu} \right] - c(t) N(t). \tag{10}
\]

If Eq. (1b) is a theoretical model for infectious growth with preventive measures against infection spread, and Eq. (10) can also explain the same growth, then Eq. (1b) could well be a solution of Eq. (10). Following a strategy similar to that in Ref.
[24], we find that for $t > t_m$, Eq. (1b) is a solution of Eq. (10) with the following set of parameters:

\[
\nu = 1; \quad \alpha = \gamma (\gamma - 1) \lambda t_m; \quad K(t) = 2 D\, t_m\, t^{\alpha_m + 1}; \quad \text{and}
\]
\[
c(t) = \lambda \gamma\, t^{\gamma - 1} - \frac{\alpha_m}{t} - \frac{\lambda (\gamma - 1)\, t_m\, e^{-\lambda (t - t_m)^{\gamma}}}{t} + \lambda \gamma (\gamma - 1)\, t_m \left( 1 - t^{\gamma - 2} \right). \tag{11}
\]

With this parameterization, we arrive at the following dynamical evolution equation for the infection growth of Covid-19 with preventive measures (such as lockdown, social distancing, introduction of preventive drugs etc.):

\[
\frac{dN}{dt} = \alpha N(t) \left( 1 - \frac{N(t)}{\beta\, t^{\alpha_m + 1}} \right) - \left[ \lambda \gamma\, t^{\gamma - 1} - \frac{\alpha_m}{t} - \frac{\lambda (\gamma - 1)\, t_m\, e^{-\lambda (t - t_m)^{\gamma}}}{t} + \lambda \gamma (\gamma - 1)\, t_m \left( 1 - t^{\gamma - 2} \right) \right] N(t). \tag{12}
\]

The above equation has five parameters: $\alpha$, $\alpha_m$, $\beta$, $\lambda$ and $\gamma$, which depend on the particular virus that is spreading, the infection rate in a particular community, and externally imposed conditions such as the lockdown measures to which the population is subjected ($t_m$ is determined by varying it dynamically and requiring minimum $\chi^2$). With the available data these parameters can be constrained, and then Eq. (12) can be utilized for the future time evolution. Note that with $\alpha_m = 1$ and $\gamma = 2$, Eq. (1b) reduces to the Covid-19 model of Ref.
[24], which can be thought of as a special case of Eq. (1b).

Combining all the terms, the final set of equations that we propose for the time evolution of the cumulative infected population in the different periods is the following:

\[
\frac{dN}{dt} =
\begin{cases}
\alpha_{ei} N(t) + \dfrac{\alpha_i N(t)}{t} \left[ 1 - \dfrac{A}{N(t)} \right] \ \text{or} \ \dfrac{\alpha_i N(t)}{t} \left[ 1 - \dfrac{A}{N(t)} \right], & t_0 \le t \le t_m, \qquad \text{(13a)} \\[2ex]
\alpha N(t) \left[ 1 - \dfrac{N(t)}{\beta\, t^{\alpha_m + 1}} \right] - \left[ \lambda \gamma\, t^{\gamma - 1} - \dfrac{\alpha_m}{t} - \dfrac{\lambda (\gamma - 1)\, t_m\, e^{-\lambda (t - t_m)^{\gamma}}}{t} + \lambda \gamma (\gamma - 1)\, t_m \left( 1 - t^{\gamma - 2} \right) \right] N(t), & t_m < t < t_{s_i}, \qquad \text{(13b)} \\[2ex]
- m_s N(t) \left[ 1 - \dfrac{F}{N(t)} \right], & t_{s_i} \le t \le t_{s_f}. \qquad \text{(13c)}
\end{cases}
\]

These equations can be solved numerically for a given set of parameters, and by matching the boundary conditions at $t_m$ and $t_{s_i}$ one can get a time evolution trajectory for the whole infection period. The parameters of Eq. (13) can be fixed through a $\chi^2$ fit of the numerical trajectory to the available data. The transition times can be evaluated dynamically by requiring the minimum $\chi^2$. Here one can introduce an error $\sigma_i$ on each day's data, which will then generate a set of trajectories allowed within those errors.

The onset of saturation (plateauing), that is $t_{s_i}$, can be determined by finding the maximum of $N(t)$ at the end of the mitigation period. Setting the first derivative of Eq. (1b) to zero, one arrives at

\[
\lambda \gamma\, t\, (t - t_m)^{\gamma - 1} - \alpha_m = 0, \tag{14}
\]

a positive-time solution of which provides $t_{s_i}$. One can also find the inflection point, that is the onset of the approach towards $t_{s_i}$, by setting the second derivative to zero, which yields

\[
\lambda \gamma\, t^2 \left[ 1 + \gamma \left\{ \lambda (t - t_m)^{\gamma} - 1 \right\} \right] (t - t_m)^{\gamma - 2} - \alpha_m \left[ 1 + 2 \lambda \gamma\, t\, (t - t_m)^{\gamma - 1} \right] + \alpha_m^2 = 0, \tag{15}
\]

and then finding a solution of this equation.

We use both Eqs. (1) and (13) to generate the time evolution trajectories, and the parameters are fixed using the available data. The full trajectories, covering the whole contamination period, are then generated, which also project the disease spread at future times.
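As a hedged illustration of solving one of these equations numerically, the Python sketch below integrates the saturation-period equation, read here as dN/dt = -m_s N (1 - F/N), with a classical fourth-order Runge-Kutta step and compares the result with the closed form of Eq. (5). Differentiating Eq. (5) shows that the two agree when m_s = alpha_s and F = N(t_si) + E; the numerical values, the step count and the helper names are assumptions made for this example only.

```python
import math


def rk4(rhs, t0, y0, t1, steps):
    """Integrate dy/dt = rhs(t, y) from t0 to t1 with fixed-step RK4."""
    h = (t1 - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        k1 = rhs(t, y)
        k2 = rhs(t + h / 2, y + h * k1 / 2)
        k3 = rhs(t + h / 2, y + h * k2 / 2)
        k4 = rhs(t + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y


def saturation_rhs(m_s, F):
    """Right-hand side of the saturation equation: -m_s * N * (1 - F / N)."""
    def rhs(t, N):
        return -m_s * N * (1.0 - F / N)
    return rhs


def f_s(t, N_si, E, alpha_s, t_si):
    """Closed form of Eq. (5)."""
    return N_si + E * (1.0 - math.exp(-alpha_s * (t - t_si)))


# Illustrative (made-up) values, not fitted parameters.
N_si, E, alpha_s, t_si = 250000.0, 40000.0, 0.03, 93.0
F = N_si + E  # plateau value implied by Eq. (5)

numeric = rk4(saturation_rhs(alpha_s, F), t_si, N_si, 160.0, 670)
closed = f_s(160.0, N_si, E, alpha_s, t_si)
```

The same integrator could be applied to the other branches of Eq. (13), with the end value of one branch used as the starting value of the next to match the boundary conditions.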
Results obtained for various affected regions are presented in the next section.

III. RESULTS
In this section we present the results obtained through our proposed model. First, we check whether the proposed model is able to successfully track the available data on the cumulative number of infections at various Covid-19-affected regions. Then we probe the predictive ability of our model by using a subset of the data and reproducing the later trajectory. Finally, we project the future time evolution of the number of infected population for a few countries where the infection is still rising rapidly.

To validate our model we use the available data on the Covid-19-infected population of the following countries: England, Germany, Italy, France and Spain, and also of New York City. These data are taken from Refs. [56–61]. We choose these countries and city as they are already inside the saturation period (after $t_{s_i}$) in the ongoing wave of Covid-19. This enables us to test our model from the early rise to the saturation period. After demonstrating the usefulness of the proposed model, we show its predictive ability by analyzing subsets of the data for Italy and New York City. As will be evident later, this model can predict the future time evolution trajectory for a few weeks to months. Then we show our results on the total number of infected population of the USA, Brazil, Russia and India (and separately Mumbai and Delhi), where the number of cases is still rapidly increasing. This enables us to project the time evolution trajectories of these countries in the mitigation period, till they reach the beginning of their respective plateaus (starting at $t_{s_i}$).

A. Validation of the proposed model
As mentioned earlier, to validate our proposed model, as in Eqs. (1) and (13), we first analyze the time evolution of the number of Covid-19 positive population for a number of countries for which the mitigation periods have ended and the saturation periods have started. For a given country, using the available data on the cumulative number of infected population, obtained from Refs. [1, 3, 56–69], we perform a non-linear $\chi^2$ fit of Eq. (1) (details of the data analysis are given in the Appendix). The data are fitted over the entire time range simultaneously with the three sub-equations of Eq. (1). We vary the extent of the different time periods dynamically and choose the $t_m$ and $t_{s_i}$ that minimize the total $\chi^2$, which is a sum over individual terms, i.e., $\chi^2 = \chi^2_i + \chi^2_m + \chi^2_s$. Here the definition of $\chi^2$ is the usual one, $\chi^2 = \sum_i \left( f(t_i) - N(t_i) \right)^2 / \sigma^2_{N(t_i)}$, where $f(t_i)$ is Eq. (1), $N(t_i)$ is the cumulative number of infected population on the $i$-th day, and $\sigma_{N(t_i)}$ is the error on that. We assign this error ($\sigma_{N(t_i)}$) to each day's number, with a maximum of 20% error for the first 14 days (the incubation period; in most cases it is 10%) and less than a percent-level error as the fit approaches the saturation period. An error of about 5–1% is imposed in between, decreasing gradually with the number of days. The argument for this choice, mentioned earlier, is as follows: at the onset of infection, the number of initial cases is not well determined, as the number of tests performed could well be too low and there is a good probability of underestimation. We keep the maximum error for the first 14 days, which is typically the incubation period for this respiratory disease. The testing procedure, capacity and reliability improve over time, and hence the reported number of Covid-positive cases becomes less erroneous over time.
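The chi-square with day-dependent errors described above can be sketched in Python as follows. The error schedule used here (20% for the first 14 days, then a linear decrease from 5% to 1% up to the last day) is only one hypothetical reading of the description in the text, and the function names and the sample data are assumptions of this example.

```python
def relative_error(day, last_day):
    """Hypothetical relative error assigned to the count on a given day."""
    if day <= 14:
        return 0.20
    if last_day <= 15:
        return 0.05
    frac = (day - 15) / (last_day - 15)
    # Linear decrease from 5% on day 15 to 1% on the last day (assumption).
    return 0.05 + (0.01 - 0.05) * frac


def chi_square(model, days, counts):
    """Sum over days of (model - data)^2 / sigma^2 with sigma = rel. error * data."""
    last_day = max(days)
    total = 0.0
    for t, n in zip(days, counts):
        sigma = relative_error(t, last_day) * n
        total += (model(t) - n) ** 2 / sigma ** 2
    return total


def chi_square_per_dof(model, days, counts, n_params):
    """Chi-square divided by (number of points - number of fitted parameters)."""
    return chi_square(model, days, counts) / (len(days) - n_params)


def exact_model(t):
    return 10.0 * t ** 2


def biased_model(t):
    return 11.0 * t ** 2  # 10% above the data everywhere


# Made-up data lying exactly on exact_model (illustration only).
days = list(range(1, 11))
counts = [exact_model(d) for d in days]
```

In an actual fit, the transition times would be scanned and the combination giving the smallest total chi-square kept, as described in the text.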
The timing of a transition from one period to another, as mentioned at the beginning of section II, is chosen dynamically so that the total $\chi^2$ is minimum. To avoid a sharp transition between two periods (for example, early rise to mitigation, that is, the points just before and after $t_m$), we interpolate the results between the adjacent points obtained from the two different fitting forms. This is justifiable since, as mentioned earlier, the change from one period to another does not happen in a single day and can happen as a smooth cross-over over several days. With this procedure we fit the data for the above-mentioned countries uniformly. We first show our results for the first two time periods (initial and mitigation, till the onset of saturation) with a fit to Eqs. (1a) and (1b). This shows the validity of our model in these two time periods. Later we also show our results for the entire time period (initial, mitigation, and saturation). We show it separately as Eq. (1c) is the ideal representation of the saturation period with the same environmental factors as in the mitigation period. However, in this period, due to a possible easing of various environmental factors and with a large unaffected population, a second wave of infection may resurface, which needs to be dealt with separately. We will discuss that for the case of the USA later.

In Fig. 3 we show the cumulative Covid-positive cases for the above-mentioned countries for the first two time periods (initial to the onset of saturation, before $t_{s_i}$), as determined dynamically by our fit to Eqs. (1a) and (1b) with a minimum $\chi^2 = \chi^2_i + \chi^2_m$. The red points are the actual data points as obtained from Refs. [56–61], while the black lines are the fitted trajectories with the time evolution form as in Eqs. (1a) and (1b). In Table I we show the mean values of the fitted parameters. As one can observe, the fitted trajectories obtained through Eqs. (1a) and (1b) describe the actual trajectories of the time evolution quite well, from the onset of the disease to the onset of the saturation period as it approaches the plateau region. It is interesting to see that the onset of the saturation period for all countries happened when the power $\gamma$ reached a value close to 1. In Fig. 4, we show the per-day infection as a function of days for the above-mentioned countries. The red points are the actual data points as obtained from Refs. [56–61], while the black lines are results from Eq. (1).

Countries   Initial time    Parameters
and City    (t = 1)         α_ei     α_i     α_m     λ         γ        t_m   t_si   χ²/dgf
England     Feb 24, '20     0.2493   3.194   3.107   0.008     1.2633   46    93     1.1
France      Feb 25, '20     0        3.674   2.340   0.0242    1.0141   41    80     3.2
Germany     Feb 25, '20     0.2585   2.383   2.056   0.0121    1.1371   43    80     1.4
Italy       Feb 25, '20     0        3.220   2.282   0.01457   1.1056   39    85     1.6
Spain       Feb 24, '20     0        4.515   2.345   0.0238    1.0352   33    80     3.7
NYC         Mar 03, '20     0.5810   1.619   1.770   0.0138    1.0853   39    76     0.8

TABLE I: Mean values of the parameters of Eq. (1) as determined by the data [56–61] for several countries and New York City.
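As a rough cross-check of the Table I values, one can ask where the growth rate of the mitigation form, Eq. (1b), would vanish. Its logarithmic derivative is alpha_m / t - lambda * gamma * (t - t_m)^(gamma - 1), which decreases monotonically for gamma > 1, so a simple bisection finds the zero. The Python sketch below does this with the mean parameter values of Table I; the helper names, the search bracket and the tolerance are assumptions of this example, and the fitted uncertainties are ignored.

```python
# Mean values copied from Table I (alpha_m, lambda, gamma, t_m, t_si).
TABLE_I = {
    "England": dict(alpha_m=3.107, lam=0.008, gamma=1.2633, t_m=46, t_si=93),
    "France": dict(alpha_m=2.340, lam=0.0242, gamma=1.0141, t_m=41, t_si=80),
    "Germany": dict(alpha_m=2.056, lam=0.0121, gamma=1.1371, t_m=43, t_si=80),
    "Italy": dict(alpha_m=2.282, lam=0.01457, gamma=1.1056, t_m=39, t_si=85),
    "Spain": dict(alpha_m=2.345, lam=0.0238, gamma=1.0352, t_m=33, t_si=80),
    "NYC": dict(alpha_m=1.770, lam=0.0138, gamma=1.0853, t_m=39, t_si=76),
}


def growth_rate(t, alpha_m, lam, gamma, t_m):
    """Logarithmic derivative of Eq. (1b): (1/N) dN/dt for t > t_m."""
    return alpha_m / t - lam * gamma * (t - t_m) ** (gamma - 1.0)


def stationary_time(alpha_m, lam, gamma, t_m, t_max=1000.0, tol=1e-9):
    """Bisection for the time at which growth_rate changes sign."""
    lo, hi = t_m + 1e-9, t_max  # rate is positive at lo and negative at hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if growth_rate(mid, alpha_m, lam, gamma, t_m) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


zero_growth_day = {
    region: stationary_time(q["alpha_m"], q["lam"], q["gamma"], q["t_m"])
    for region, q in TABLE_I.items()
}
```

For these mean values the zero of the growth rate appears to fall later than the fitted onset of saturation in every region, which would be in line with the saturation form taking over before the mitigation form reaches its maximum.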
It is evident from Figs. 3 and 4 that Eqs. (1a) and (1b) are able to track the time evolution of the cumulative infected population very well, from the beginning to the onset of the saturation period. One would expect Eq. (1b) to terminate at the point where the rate of growth vanishes. However, that point is never reached, for several reasons: partial or full un-lockdown once the rate of growth falls, and a huge carrying capacity (the total number needed to reach herd immunity is very high). From the onset of saturation, Eq. (1b) is therefore unsuitable for tracking the progression of the disease. At that point the third phase of the time evolution, Eq. (1c), becomes effective. We choose the onset t_si dynamically, with the condition that χ² = χ²_i + χ²_m + χ²_s together provide the best acceptable χ²/dgf with t_si as high as possible. With this method we fit the full available data set till July 31, 2020.

[Figure 3: panels England, France, Germany, Italy, Spain, NYC; Confirmed cases (N(t)) versus Days (t)]
FIG. 3: The time-evolution trajectories of infected population for various countries and for the New York City are shown. The red filled circles are actual data from Refs. [56–61]. The black solid lines are the trajectories obtained by fitting the data with Eqs. (1a) and (1b). The data are plotted up to a time just before the onset of the saturation period as determined by the fit. The parameters obtained from these fittings are tabulated in Table I.

[Figure 4: panels England, France, Germany, Italy, Spain, NYC; dN/dt versus Days (t)]
FIG. 4: The number of infected population per day are shown for various countries and New York City. The red solid lines are the actual data [56–61] while the black solid lines are obtained from Eq. (1).
[Figure 5: panels England, France, Germany, Italy, Spain, NYC; Confirmed cases (N(t)) versus Days (t)]
FIG. 5: The time-evolution trajectories of infected population for various countries and for the New York City are shown with full set of data (till July 31) including the saturation period. The red filled circles are actual data from Refs. [56–61]. The black solid lines are the trajectories obtained through Eq. (1). The thick black lines also include 95% confidence intervals of the fitted parameters of the proposed model. The extended time trajectories with no data are not the projection since a second wave of infection can start in this period (see text for details).

[Figure 6: panels England, France, Germany, Italy, Spain, NYC; dN/dt versus Days (t)]
FIG. 6: The number of infected population per day are shown for various countries and for New York City with full set of data (till July 31). The red solid lines are the actual data [56–61], while the black solid lines are obtained using Eq. (1), as in Fig. 5.
Results for the total number of Covid-19 positive cases, N(t), and the per day increments, dN(t)/dt, for the full data set along with the fitted trajectories are shown in Figs. 5 and 6, respectively, for various countries and New York City. It is quite satisfying to see that the full data set for the various countries, and for cities as well, can be tracked very well through the model proposed in Eq. (1). We mention here that in the saturation period our main purpose is to show the validity of the proposed model. We do not project the time evolution trajectory into the future in this period, even though extended trajectories appear in the figure. Though the extended trajectories (beyond the data points) are obtained with 95% confidence intervals of the fitted parameters, a projection in this period may not hold, as a second wave of infection can start at any point of time in this period. That is probably what is currently happening, albeit to a lesser extent, in all the regions shown in this plot except Italy and New York City. The most notable case is Spain, where from the middle of July the number of infections per day has started to increase again beyond what we observe throughout a large part of the saturation period (t = 81 − [...] χ² with a given errorbar. Here, again we take the usual definition χ² = Σ_i (f(t_i) − N(t_i))² / σ²_N(t_i), where the f(t_i) are obtained from the solution of Eq. (13) after matching the boundary conditions, whereas the N(t_i) are the actual data with imposed errors σ_i, which account for the uncertainty in the reported numbers, as mentioned earlier. Note that solutions of these equations with a given errorbar result in a band of trajectories rather than just one. However, that is more realistic, since different sets of parameters with minimum χ² generate a band of trajectories which can spread over a few days, which is also what we observe in reality. At the boundary, we match the solutions without any discontinuity, that is, the solutions are continuously progressed. For example, the initial value of N(t) at t = t_m + 1 with (13b) is generated from the solution of (13a) at t = t_m. In this way, we match the boundary conditions of the solutions between different regions. A trajectory generated by these equations is accepted or rejected based on the minimum χ² within the errorbars. The trajectories obtained through these differential equations for Italy and New York City are shown in Fig. 8. The top two plots show the time evolution trajectory for the cumulative number of infections, while the bottom two plots show its rate of change (infected number per day). Here again, it is satisfying to see that the differential equations (13a) and (13b) can reproduce the time evolution of the disease spread from the initial days to the onset of saturation. Similarly, the solutions of Eq. (13c) can also track the disease spread in the saturation period, assuming the same environmental condition prevails throughout this period. Though we show results for only two regions, data for other regions can also be tracked with equal success through the proposed differential equations (Eq. (13)).

[Figure 7: panels England, Confirmed cases (N(t)) and dN/dt versus Days (t); legend: all days, 5 days avg, Eq. (1) all days, Eq. (1) 5 days avg]
FIG. 7: The time evolution data for the number of infected population is shown for England corresponding to data for each day and also for average over 5 days. The results obtained using Eq. (1) are shown by magenta and blue lines respectively.

[Figure 8: panels NYC and Italy, Confirmed cases (N(t)) and dN/dt versus Days (t); legend: Eq (13), data, Eq (1)]
FIG. 8: The time evolution trajectories as obtained from the proposed differential equation (Eq. (13)) for Covid-19 growth are plotted along with the data for New York City and Italy. The red solid lines are actual data from Refs. [59, 61] while the blue lines are obtained from the solutions of the differential equations (Eq. (13)). The upper two figures are for the total number of infection while the bottom two figures represent corresponding per day infection.
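The χ² criterion used to accept or reject a trajectory can be illustrated with a few lines of code. The sketch below is a minimal illustration under our own assumptions: the function names are hypothetical, the toy numbers are not real data, and the 5% relative error is only a placeholder for the imposed errors σ_i.

```python
def chi_square(model, data, sigma):
    """Sum over days of ((f(t_i) - N(t_i)) / sigma_i)**2."""
    if not (len(model) == len(data) == len(sigma)):
        raise ValueError("model, data and sigma must have the same length")
    return sum(((f - n) / s) ** 2 for f, n, s in zip(model, data, sigma))


def chi_square_per_dof(model, data, sigma, n_params):
    """Chi-square divided by (number of points - number of fitted parameters)."""
    dof = len(data) - n_params
    if dof <= 0:
        raise ValueError("need more data points than fitted parameters")
    return chi_square(model, data, sigma) / dof


# Toy numbers for illustration only (not taken from any data set).
data = [100.0, 150.0, 225.0, 340.0]
model = [102.0, 147.0, 230.0, 335.0]
sigma = [0.05 * n for n in data]  # hypothetical 5% imposed error

chi2 = chi_square(model, data, sigma)
chi2_dof = chi_square_per_dof(model, data, sigma, n_params=2)
print(chi2, chi2_dof)
```

A candidate trajectory would then be kept only if its χ² per degree of freedom stays within whatever acceptance limit is chosen for the fit.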
B. Predictive ability of the proposed model
The success of a mathematical model depends not only on its validation but, crucially, also on its predictive power. Having validated the model by reproducing the time evolution of the number of infections, we proceed to assess the predictive ability of our proposed model. Since the model is non-linear, a small change in the parameter space may result in a different time evolution trajectory. However, if there is a parameter space that is unique to the disease spread, and if it can be constrained with a subset of the data, then the model can be used to project the future evolution. With this in mind, we find that it is not possible to constrain the parameter space until the data reach the beginning of the mitigation period (t > t_m). That is apparent, as the parameters of one region cannot be obtained from other regions, since the dynamics of disease spread in two regions are different. We use a subset of the data up to the 55th day (for NYC) and the 60th day (for Italy) to constrain the respective parameter spaces. We then employ those to obtain the respective future trajectories of the total number of infections till the onset of the saturation period. Note that the onset of the saturation period is determined dynamically by minimizing the total χ². Results are shown in Fig. 9, where the top two plots are for the cumulative number of infections and the bottom plots are for the per day infection. The solid red lines represent the actual data [59, 61], while the blue solid lines are obtained using the differential equations (Eq. (13)). To compare the effectiveness of the projection with the model results, in the bottom two figures we also plot the actual data for the whole range along with its fitted results through Eq. (1). As can be observed from this figure, the proposed differential equations (Eq. (13)) can project the future time evolution trajectories of the number of infections with the correct trend for the next few weeks quite reliably.
For New York City the projection spans 21 days (t_si − 55 = 76 − 55), and the trend of the projected trajectory holds even beyond that. For Italy the projection spans 25 days (t_si − 60 = 85 − 60).

C. Projection for a few other countries
Having validated the model and shown its predictive ability, we proceed to study the data on other countries, Russia, Brazil, India and the USA, where the number of Covid-positive cases is still rising substantially. For India, we also choose two of its biggest cities, Mumbai and Delhi. We first constrain the parameters by fitting the available data as detailed above, both for Eq. (1) and for the differential equations (Eq. (13)). We then proceed to predict the future time-evolution trajectories up to the onset of saturation.
[Figure 9: panels NYC and Italy, Confirmed cases (N(t)) and dN/dt versus Days (t); legend: Eq (13), data, Eq (1)]
FIG. 9: The time evolution trajectories as obtained from the proposed differential equation (Eq. (13)). A subset of the data is utilized to constrain the parameter set of the model which are then used to predict future time evolution. The data (cumulative and differential) for NY City and Italy are shown by the red solid lines while the blue solid lines are obtained using data up to time 55th days and 60th days for NY City and Italy, respectively.
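The hold-out test summarized in Fig. 9 (constrain the parameters on an early subset of days, then compare the projection with the later data) can be sketched generically. The example below is only an illustration under stated assumptions: it uses a plain power law N(t) = a t^α as a stand-in, not Eq. (1) or Eq. (13); the data are synthetic; and the function names are hypothetical. Only the split at day 55 and the end point at day 76 mirror the New York City numbers quoted in the text.

```python
import math


def fit_power_law(days, cases):
    """Least-squares fit of log N = log a + alpha * log t; returns (a, alpha)."""
    xs = [math.log(t) for t in days]
    ys = [math.log(n) for n in cases]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    a = math.exp(mean_y - alpha * mean_x)
    return a, alpha


def holdout_projection(days, cases, t_train):
    """Fit on days <= t_train and return (a, alpha, worst relative error afterwards)."""
    train = [(t, n) for t, n in zip(days, cases) if t <= t_train]
    held_out = [(t, n) for t, n in zip(days, cases) if t > t_train]
    a, alpha = fit_power_law([t for t, _ in train], [n for _, n in train])
    errors = [abs(a * t ** alpha - n) / n for t, n in held_out]
    return a, alpha, max(errors)


# Synthetic cumulative counts from a known power law (not real data).
days = list(range(1, 77))
cases = [2.0 * t ** 2.5 for t in days]

a, alpha, worst = holdout_projection(days, cases, t_train=55)
print(a, alpha, worst)
```

With noiseless synthetic input the fit recovers the generating parameters, so the held-out error is essentially zero; with real data the same comparison gives a direct measure of how far the projection drifts after the training window.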
1. Russia
In Fig. 10 we plot the cumulative number of Covid-19 positive cases for Russia [62] with the effective starting date (t) as March 03, 2020. The time trajectory shows the expected exponential rise followed by a power-law rise, as in Eqs. (1a) and (1b). We fit the data with the combined equations and find α_ei = 0.[...] α_i = 4.[...] α_i shown in Table I for other countries in Europe. From the fitted results we find that the transition from the rising to the mitigation period happened around time t_m = 67, which corresponds to around May 10, 2020. It is interesting to note that the mitigation period of Russia is considerably longer than that of other European countries, perhaps due to its larger population. Using the available data up to August 10, we extract the parameters and use them to predict the time evolution trajectory till the time t = 200 − [...] t_si) with less than 300 infections per day. We also use the proposed differential equation for the time evolution (Eq. (13)) and constrain the parameter set using data till August 10. The corresponding time evolution trajectories (both cumulative and differential) are shown in Fig. 10. The blue bands are obtained using the constrained parameter set, while the red solid lines are actual data. The projected results, using both Eq. (1) and Eq. (13), show that, with the same prevailing environmental conditions as now, the onset of saturation (t_si) will most probably start between t = 190 − [...]

[Figure 10: panels Russia, Confirmed cases (N(t)) and dN/dt versus Days (t); legend: Data, Eq (13), Eq (1)]
FIG. 10: The time evolution trajectory of Covid-19 infected population for Russia is shown. The red filled circles are actual data from Ref. [62]. Data up to August 10 are used to obtain parameters for Eqs. (1) and (13), and projections for future evolution are shown by black and blue lines with allowed values in the parameter space.
2. Brazil
The second country whose data we study, and where Covid-19 infection is increasing rapidly, is Brazil. It has a large population and a high population density in a number of big cities. It is natural to expect that, without strict regulations on human-to-human contact and proper mechanisms to tackle a pandemic, a contagious disease like Covid-19 will spread rapidly to a high number, and both the rising and the mitigation periods will be prolonged. In fact, in a short span of time the total number of infections has reached a large number, and Brazil is now the second most affected country, with more than 3 million total infections and more than one hundred thousand deaths (Ref. [63], till August 10). The infection is still increasing strongly, and one would expect that a large population will be infected further in the foreseeable future. It is thus quite important to project the probable time evolution of the total number of infections.
[Figure 11: panels Brazil, Confirmed cases (N(t)) and dN/dt versus Days (t); legend: Data, Eq (13), Eq (1)]
FIG. 11: The total number of Covid-19 virus (SARS-CoV-2)-infected population (left) and the per day infection (right) are shown as a function of time (days) for Brazil. The red filled circles are actual data from Ref. [63]. Data up to August 10 are used to obtain parameters for Eqs. (1) and (13), and projections for future evolution are shown by black and blue lines with allowed values in the parameter space.
We analyze Brazil’s data on the total number of infections till August 10. As in other countries, we find an exponential rise (α_ei = 0.3) followed by a power-law rise (α_i = 3.[...] t of infection as March 05, 2020. Using the available data till August 10 [63], we first constrain the parameters of Eq. (1) and Eq. (13). With the constrained set of parameters, the projections for the future trajectories of the total number of infections, till the possible time for the onset of saturation, are shown in Fig. 11. The red solid line is the available data, the black solid lines are obtained through Eq. (1), and the blue band represents the projection through the differential equations in Eq. (13). Our results suggest that the current wave of infection, with the prevailing measures against disease spread, will progress for another 3–4 months and will then reach the beginning of the saturation period by the end of November, with less than 2500 infections per day. The cumulative number of infections by then will reach 5.5–6.5 million. However, with more stringent measures to prevent disease spread, or if an effective vaccine becomes available soon, the mitigation period could be shortened. On the other hand, if the prevailing preventive measures are relaxed and no vaccine is available, then a second wave with a larger number of infections can start from this mitigation period, as is happening now in the USA. There is still a large uncertainty in this projection. As mentioned earlier, this projection can be progressively improved with the inclusion of more data as the disease progresses further. Updated results will be posted at the URL mentioned later.
3. India
The analysis of Covid-19 data on India provides an ideal platform, as well as a challenge, for formulating a model of the time evolution trajectory of the infected population, considering the sheer numbers for both the population density (464 per square km [70, 71]) and the total population, which is the second highest in the world. Moreover, constant migration of population between cities and the countryside, the percentage of young population, inadequacy in testing facilities and several other effective factors (economic, social, religious as well as political) can make the time evolution of the infected population in India very different from that of many other countries.

In India, the first Covid-19 positive case was reported on January 30, 2020. Till March 03, the number of infected patients was limited to 6, but after that it started to rise as many infected travelers returned to India from various parts of the world. To reduce the widespread infection of this virus, the Government of India imposed a lockdown starting on March 24, which was subsequently extended till May 30. After that, un-lockdown started in phases in different sectors to counter the economic slowdown. Many states and cities have demarcated containment zones and are periodically imposing further lockdown measures at those places (sometimes from a few days in a week to more than a week at a stretch) to contain the infection spread. As of August 10, the total number of infected population is more than 2.2 million, with a fatality rate of about 1.98% [64–66].

In Fig. 12, we plot the cumulative number of confirmed cases [64, 65] with the effective starting date (t) as March 02, 2020. Again we find an exponential increment followed by a power-law-rise growth, as in Eq. (1a). We fit the data with the combined equations. It is interesting to note that α_ei and α_i, till the middle of June, were on the lower side compared to those of other countries with a smaller population, which signifies the effect of preventive measures in the initial days. However, α_i started to increase after that (the higher percentage of infection is also correlated with the number of tests, which has increased substantially during this period).
[Figure 12: panels India, Confirmed cases (N(t)) and dN/dt versus Days (t); legend: Data, Eq. (1)]
FIG. 12: The time evolution trajectory of the confirmed Covid-19 infected population for India is shown. The red filled circles are actual data from Ref. [64]. Data up to July 31 are employed to obtain parameters for Eq. (1) and projections for future evolution are shown by the black line. The two bands show 68% and 95% confidence intervals.
As is evident from the recent data, the disease spread is surging ahead and there is no sign of the infection slowing down even after more than 5 months, by which time many countries are in either the mitigation or the saturation period of the first wave of infection. This is due precisely to the large size and density of the population, for which strict lockdown measures cannot be maintained continuously for indefinite periods, considering economic, social and political factors. Considering these external factors, it is expected that the rising as well as the mitigation periods of a contagious disease will be considerably longer (and most probably the longest) compared to those of all other countries. Given that no effective vaccine will be available in the immediate future and that it is not feasible to impose a prolonged lockdown, other possible measures against this contagious disease, which could well be airborne [1] (strict social distancing, contact tracing, mask-wearing, informing all levels of the population about the disease transmission, etc.), must be strictly followed to avoid an avalanche of infected population in the next few months.

On Indian data, the main observation is that the mitigation period has not started yet. Hence, at this moment, it is not possible through our model to project the time evolution trajectory for a longer period into the future. However, considering the current rate of infection, we project the possible scenario for the total number of infections in the next three weeks. Fits to the model suggest that, with the current trend, the cumulative number of infections will cross 3.5 million at the end of August (Fig. 12). On a positive note, only very recently the increment coefficient α_m (of Eq. (1b)) is found to be reducing, albeit slowly, with its current value at 4.58. It will be interesting to monitor whether α_m continues to reduce further and the infection enters the mitigation period. We will continuously monitor the disease spread and will project the time evolution more accurately later, with the availability of more data.

Along with the whole Indian population, we also analyze data for its two biggest cities, Mumbai and Delhi. Below we elaborate our analysis for these two cities and subsequently project the most probable trajectory of infection for the next few months.

3.1 Mumbai
Mumbai is the largest city in India and is also one of the most densely populated cities in the world (> [...] t) as March 14, 2020 [67]. This data also shows an exponential increase followed by a power-law-rise growth in the initial days, as in Eq. (1a). We fit the data with the combined form and find α_ei = 0.176 and α_i = 3.436, which are again smaller than in many other places. Perhaps this is due to strict lockdown measures in the initial days. The transition from the rising to the mitigation period is found to be at time t_m = 81, which corresponds to around June 03, 2020. Due to its huge population density as well as total population, it is expected that the mitigation period will be much longer. With the existing data we track the time evolution with Eq. (1), and the fitted result is shown by the black solid line in Fig. 13. We also use Eq. (13) to find the time evolution trajectory, and the solutions with the restricted parameter space are shown by the blue band. Results from this work, at this time, project that with the existing restrictions against the infection spread, we expect to see the onset of the saturation period, with about 250 infections per day, around time t_si = 195–210, which corresponds to the end of September to early October, with a cumulative number of infections between 145 and 160 thousand. However, this will very much depend on the continuation of the prevailing preventive measures against the disease spread, given the large unaffected population with large density in Mumbai.
[Figure 13: panels Mumbai, Confirmed cases (N(t)) and dN/dt versus Days (t); legend: Data, Eq (13), Eq (1)]
FIG. 13: The time evolution trajectories of the confirmed Covid-19 infected population (cumulative and differential) for Mumbai are shown. The red filled circles are actual data from Ref. [67]. Data up to July 31 are used to extract parameters for Eqs. (1) and (13), and projections for future trajectories are shown by black and blue lines with allowed values in the parameter space.

[Figure 14: panels Mumbai, Confirmed cases (N(t)) and dN/dt versus Days (t); legend: all days, 5 days avg, Eq. (1) all days, Eq. (1) 5 days avg]
FIG. 14: The time evolution trajectories of the confirmed Covid-19 infected population (cumulative and differential) for Mumbai are shown both for all days data (red point/line) and 5 days average data (black point/line). The results obtained using Eq. (1) are shown by magenta and blue lines respectively.
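The 5 days average data used in Fig. 14 (and for England in Fig. 7) can be produced in more than one way, and the text does not specify which. The sketch below assumes the simplest reading, non-overlapping 5-day block means of the per day counts; a moving average would be an equally plausible choice. The function name and the toy numbers are hypothetical.

```python
def block_average(values, width=5):
    """Mean of each consecutive non-overlapping block; a trailing partial block is dropped."""
    if width < 1:
        raise ValueError("width must be a positive integer")
    n_blocks = len(values) // width
    return [
        sum(values[i * width:(i + 1) * width]) / width
        for i in range(n_blocks)
    ]


# Toy per day counts for illustration only (not real data).
daily = [10, 12, 9, 14, 15, 20, 22, 19, 25, 24, 30]
averaged = block_average(daily)
print(averaged)
```

Averaging of this kind suppresses day-to-day reporting fluctuations, which is why agreement between the fits to the raw and the averaged data is a useful robustness check.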
Similar to our study on England’s data, we also perform a study on the 5 days average data of Mumbai. The results are shown in Fig. 14. As one can see, there is effectively no difference between the two results (magenta and blue lines), except a very small difference in the saturation period. This shows that our method for obtaining the parameters of Eq. (1) is robust.

3.2 Delhi
Delhi is another highly populated city. Like Mumbai, Delhi also has a number of densely populated areas, and migration to and from Delhi happens every day. It is thus also interesting to analyze the Covid-19 data on Delhi and compare it with that of Mumbai. Below we elaborate our analysis on Delhi.

The first reported case of Covid-19 infection in the city of Delhi was on March 2. Subsequently the infected number increased, and as of August 10 the total infected number as reported is more than 146 thousand, with a mortality rate of 2.82% [68], which is considerably lower than the same rate for Mumbai. As in other parts of India, Delhi was also under lockdown from March 24, and gradual un-lockdown was initiated from the beginning of June. However, unlike in Mumbai, a sudden jump in the number of infections was observed around the fourth week of June, perhaps due to the easing of lockdown and other preventive measures not being maintained. Since then further measures against the disease spread have been taken, and the growth in the number of per day infections has slowed down. In Fig. 15, we plot the cumulative number of infected population [68] with the effective starting date (t) as March 2, 2020. This data also shows an exponential followed by a power-law-rise form, as in Eq. (1). We fit the data with the combined form and find α_ei = 0.186 and α_i = 4.82, which are larger than the corresponding exponents for Mumbai. The transition from the rising to the mitigation period is found to be at time t_m = 103, which again is larger than for Mumbai. This transition time corresponds to June 25, 2020. Due to its huge population density as well as total population, we again expect that the mitigation period will be longer. With the existing data, we track the time evolution with Eq. (1), and the result is shown by the black solid line in Fig. 15. We also use Eq. (13), and solutions with the allowed parameters are shown by the blue band. Our projection for the further Covid-19 infection spread in Delhi is that, with the current measures against disease spread sustained, we expect to see the onset of saturation (about 250 infections per day) around t_si = 180–190, which corresponds to around the end of September. However, as mentioned earlier, this will very much depend on whether the existing preventive measures can be maintained over the period. Our model suggests that the infection can be contained substantially with the current measures against disease spread, and by the end of September to the beginning of October it can be reduced to 250 per day, with a cumulative infected population within 160–180 thousand. However, adequate precaution must be taken even after that, considering the large unaffected population and regular migration, particularly as the city is trying to come back to its regular life.
4. USA
At the end of this work, we analyze the data on the Covid-19 infected population of the USA. It is also a populous country, with a large number of big cities with large populations. We have already analyzed the data on New York City in subsections A and B above. Note that the infected population of New York City has already reached the saturation period (stage 3 of Eq. (1), see Figs. 5 and 6). However, unlike in New York City, in many places in the USA the number of infected population is currently increasing quite rapidly. In fact, the data on the USA is quite interesting and has a unique feature: it shows a big jump in the number of infected population from a situation where there was a clear pattern of reduction in the number of infected cases. Unlike in many European countries and New York City, the trajectory of the number of infected population in the USA does not move towards the saturation period. Rather, it looks like a new wave of infection has started, with another initial power-law-rise increment and subsequent mitigation. For a contagious disease this must be related to a change in the measures against the disease spread by a large population. In Fig. 16, we plot the cumulative number of infected population [1, 69] with the effective starting date (t) as February 27, 2020. Similar to most of the previous countries, this data also shows an exponential increment followed by a power-law rise, as in Eq. (1a). A simultaneous fit with both forms yields α_ei = 0.[...] α_i = 2.[...] α_ei = 0.[...] α_i = 1.[...] t at Feb 27, 2020) shows a unique feature: instead of slowing down along the expected trajectory, as was observed in most other countries, the number of infections surges once more with a power-law-rise form.

[Figure 15: panels Delhi, Confirmed cases (N(t)) and dN/dt versus Days (t); legend: Data, Eq (1)]
FIG. 15: Time evolution trajectories of the Covid-19 infected population (cumulative and differential) for Delhi are shown. The red filled circles are actual data from Ref. [68]. Data up to July 26 are used to obtain parameters for Eqs. (1) and (13), and projections for future evolution are shown by black and blue lines with allowed values in the parameter space. The blue band shows 95% confidence intervals.
[Figure 16: panels USA, Confirmed cases (N(t)) and dN/dt versus Days (t)]
FIG. 16: The time evolution trajectories of Covid-19 infected population (cumulative and per day) for USA are shown. The red filled circles are actual data from Ref. [69]. Data up to July 31, including both W_a and W_b (see text for details), are utilized to obtain parameters for Eq. (1). Results from our model are shown by the black line. The trajectory without red points shows the projection for the total number of infected population in next few months. The blue line represents the result for only W_a showing the possible trajectory that could have happened had the second increment in W_b not taken place. The two bands represent 68% and 95% confidence intervals (see Appendix for details).
Indeed, the data after that can again be fitted easily with a power-law-rise form (Eq. (1a)). Because of this unique feature of the trajectory, we use a different analysis strategy for this data. First, we try to find the maximum time up to which the total number of infections can be analyzed like that of any other affected region, incorporating an early exponential-cum-power-law rise followed by a mitigation period. We call that part of the data W_a. The maximum of W_a is chosen dynamically with the usual condition of minimum χ²(W_a) (= χ²_i(W_a) + χ²_m(W_a)). Then we introduce another exponential-cum-power-law rise followed by a mitigation period, as if there were a second wave. We call this part of the data W_b. We then dynamically choose the time of transition from W_a to W_b so that the total χ² = χ²(W_a) + χ²(W_b) becomes minimum within the acceptable limit. We find that the minimum-χ² fit leads to t_m(W_a) = 45, the transition time from W_a to W_b at around t_tr = 103, and t_m(W_b) = 129 − [...] α_i in W_b (5.594) is much bigger than its corresponding value in W_a (2.744). In fact, it is the biggest among all the power-law-rise exponents for the affected regions that we study. This is a clear signal that the infection spread very rapidly in the early period of W_b, and unless adequate measures are imposed against the disease spread, a substantial increase in the number of infections is expected in the foreseeable future. However, on a positive note, it is worthwhile to mention that our model also allows the possibility that another mitigation period has started, albeit slowly, at around t = 129 − [...] i.e., around the second week of July. With this slow rate of mitigation, and with no change in existing conditions, the disease can keep rising for another 2.5–3 months, infecting about 6.5–7 million people (shown by the black line in Fig. 16), before reaching the saturation period with less than about 3000 infections per day by the end of October to the beginning of November. Thereafter the onset of the saturation period can possibly be achieved by maintaining the same restrictions against the infection spread. We project this as the ideal-case scenario, with continuation of the existing restrictions and the current trend of reduction. Although this is an ideal situation, and in reality there will be deviations in the restrictions and hence in this projection, it can still be taken as a possible scenario. However, if the current percentage of per day increment is sustained and does not reduce further in the next 2 weeks, this situation can change. We discuss such a possible situation next, with a set of speculated data.

Before that, it is worthwhile to mention that had the infection trajectory followed a pattern similar to that of most other countries (i.e., without W_b, as shown by the blue line in Fig. 16), the onset of the saturation period would have started at around 150 days (i.e., by now) with a much reduced number of infections (about 3000 per day). This strongly indicates that by maintaining the measures against the infection spread for a longer time, as in the early days of W_a, the disease spread could perhaps have been substantially contained by now, and perhaps the onset of the saturation period could have happened as well!

4.1 A speculative but possible scenario:
Finally, we discuss a speculative scenario for the disease spread in the next few months, based on a speculative set of data for the next 2 weeks. We make the following two assumptions: i) in the next two weeks the number of per day infections does not increase by more than 1.25% of the previous day’s number (the current rate is about 1.1–1.2%), and ii) it also does not decrease below 0.9%. With these assumptions we generate a set of speculative data for the next 2 weeks. We then use this data in our model, with t_m(W_b) = 129, to get the future trajectory. In Fig. 17 we show the corresponding results. The speculative data is shown by the blue line and the projection is shown by the green line. The original data till August 10 is shown by the red line, and the projection with the same t_m is represented by the black line. This speculative scenario indicates that if the percentage of infection, as mentioned above, does not decrease further in the next 2 weeks and stays between 0.9% and 1.25%, then within this model there is a high possibility (95% confidence intervals) that the number of infections by the end of October will be about 7–7.5 million, with 10000–15000 infections per day.
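The two assumptions above translate into a simple recipe for generating the speculative two-week data. The sketch below is our own reading of that recipe, not the procedure actually used: it interprets the percentages as the per day increase relative to the previous day's cumulative count and produces only the two bounding envelopes (0.9% and 1.25% per day). The starting value `n_last` is a hypothetical placeholder, not a number taken from the data.

```python
def speculative_cumulative(n_last, daily_rate, n_days=14):
    """Extend a cumulative count by a constant fractional increase per day."""
    if daily_rate < 0:
        raise ValueError("a cumulative count cannot decrease")
    trajectory = []
    n = float(n_last)
    for _ in range(n_days):
        n *= 1.0 + daily_rate
        trajectory.append(n)
    return trajectory


n_last = 5_000_000  # hypothetical cumulative count on the last day with data

low = speculative_cumulative(n_last, 0.009)    # lower bound: 0.9% per day
high = speculative_cumulative(n_last, 0.0125)  # upper bound: 1.25% per day

print(round(low[-1]), round(high[-1]))
```

Any speculative data set satisfying both assumptions lies between the two envelopes, so fitting the model to each envelope brackets the range of projected trajectories.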
[Figure 17, two panels: USA confirmed cases (N(t)) versus days (t); USA dN/dt versus days (t).]
FIG. 17: A speculative scenario considering next 2 weeks data as shown by blue lines. The red filled circles are actual data from Ref. [69]. Results from our model are shown by the black line. The green line shows a possible scenario as projected by our model considering the speculated data (blue line) along with the actual data (red points). The two bands represent 68% and 95% confidence intervals.
A summary of all projections described above is given in Table II.

[TABLE II: Projections for countries and cities. Columns: Countries and Cities; projected $t_s^i$; N; dN/dt. Recoverable entries: Russia, $t_s^i$ = End Sept − early Oct; further rows give $t_s^i$ of mid Dec and of Early Oct (twice, with N entries beginning 145 and 160), and one row is marked "with current rate". The remaining numerical entries are missing.]

IV. SUMMARY AND CONCLUSIONS

In this work, using a data-driven statistical model, we present an analysis of the time evolution of the Covid-19 infected human population for a number of countries and cities. By analyzing the time-dependent data of the infected population, we find that there is a common pattern of infection over a long time period, irrespective of the different prevailing conditions at the affected regions. The total infection time can broadly be divided into three periods: i) an initial increment with an exponential rate followed by a power-law rise, or simply a power-law rise from the beginning, ii) a mitigation period and iii) a saturation period. Of course, the transition from one period to the next is a smooth cross-over without any discontinuity. We propose a mathematical model for the time evolution of the number of infected population and show that it can track very well the number of infected population at various affected regions, from the initial days to the saturation period. Based on the proposed mathematical trajectory we then formulate a set of differential equations which can describe the same time evolution equally well. The parameters of the model (for both the mathematical trajectory (Eq. (1)) and the differential equations (Eq. (13))) are extracted by dynamically performing a simultaneous $\chi^2$-minimization against the available data. Our proposed model is then employed to project the probable number of infected population in the future until the saturation period is reached. To demonstrate the predictive ability of the model we use a subset of the data and show that using the proposed model we can project the future time evolution of the total number of infected population from a week to a month and even more.
Projections from this model can be progressively improved by including more and more data. However, they rest on the assumption that the existing measures against the disease spread remain the same throughout this period and that the affected region is isolated (i.e., new infections are not coming from outside). After validating the model and showing its predictive ability, we analyze similar data for a number of other countries where Covid-19 infection is currently increasing rapidly. We observe that for Russia the per-day infection already peaked around May 10, and it is heading towards the onset of the saturation period. With the continuation of the existing measures against the disease spread, we expect to see the onset of the saturation period by the end of September to the beginning of October, with less than 500 infections per day and the cumulative number of infected population around [...]. We also analyze data on two of its biggest cities: Mumbai and Delhi. For Mumbai, our finding is that it has entered into the mitigation period. If the current conditions against the disease spread can be continued throughout, by the beginning of October we expect to [...]. We summarize these projections in Table II.

Footnote: The updated figures on projection will be available at the following url: https://theory.tifr.res.in/~nilmani/covid19

In the proposed model the parameters are assumed to be mean-field-type effective parameters with no time or other dependences. However, in reality, there are a number of external factors, which could be environmental, medical, economic, social as well as political. It is not clear how the model parameters are correlated with those external factors, and it would be worthwhile to study their inter-relations. Such a study is necessary to understand the significance of the model parameters. It may be possible to constrain the parameters more stringently using some other data, for example, the number of active cases and the number of fatalities.
We will pursue such a study in future. The predictive ability of the model can be improved further if it is possible to find a robust method to quantify the errors in the reported data.

V. ACKNOWLEDGMENTS

We thank TIFR for support. GS also acknowledges WOS-A grant from the Department of Science and Technology (SR/WOS-A/PM-9/2017), India. NM would like to thank Giorgio Sonnino for discussions, Abhishek Dhar for comments, and Girish Kulkarni for discussions on error analysis.

[1] World Health Organization: https://covid19.who.int/
[2] Johns Hopkins University coronavirus resource center: https://coronavirus.jhu.edu/map.html
[3] Wikipedia resource for Covid-19: https://en.wikipedia.org/wiki/COVID-19_pandemic
[4] D. Adam, Special report: the simulations driving the world's response to COVID-19, Nature, 2020; 580(7803):316-318, doi:10.1038/d41586-020-01003-6
[5] J. T. Wu, K. Leung, G. M. Leung, Nowcasting and forecasting the potential domestic and international spread of the 2019-ncov outbreak originating in wuhan, china: a modelling study, The Lancet, 689 (2020). https://doi.org/10.1016/S0140-6736(20)30260-9
[6] Klepac, P. et al., Contacts in context: large-scale setting-specific social mixing matrices from the BBC Pandemic project, medRxiv (2020), https://doi.org/10.1101/2020.02.16.20023754
[7] B. Tang, et al., Estimation of the transmission risk of the 2019-ncov and its implication for public health interventions, Journal of Clinical Medicine, 462 (2020). https://doi.org/10.3390/jcm9020462
[8] B. Tang, et al., An updated estimation of the risk of transmission of the novel coronavirus (2019-ncov), Infectious disease modelling, 248 (2020). https://doi.org/10.1016/j.idm.2020.02.001
[9] Q. Li, et al., Early transmission dynamics in wuhan, china, of novel coronavirus–infected pneumonia, New England Journal of Medicine (2020). https://doi.org/10.1056/NEJMoa2001316
[10] H. Wang, et al., Phase-adjusted estimation of the number of coronavirus disease 2019 cases in wuhan, china, Cell Discovery, 1 (2020). https://doi.org/10.1038/s41421-020-0148-0
[11] A. J. Kucharski, et al., Early dynamics of transmission and control of covid-19: a mathematical modelling study, The lancet infectious diseases (2020). https://doi.org/10.1016/S1473-3099(20)30144-4
[12] S. A. Lauer, et al., The incubation period of coronavirus disease 2019 (covid-19) from publicly reported confirmed cases: estimation and application, Annals of internal medicine (2020). https://doi.org/10.7326/M20-0504
[13] J. A. Backer, D. Klinkenberg, J. Wallinga, Incubation period of 2019 novel coronavirus (2019-ncov) infections among travellers from wuhan, china, 20–28 january 2020, Eurosurveillance (2020). https://doi.org/10.2807/1560-7917.ES.2020.25.5.2000062
[14] N. M. Linton, et al., Incubation period and other epidemiological characteristics of 2019 novel coronavirus infections with right truncation: a statistical analysis of publicly available case data, Journal of clinical medicine, 538 (2020). https://doi.org/10.3390/jcm9020538
[15] M. Chinazzi, et al., The effect of travel restrictions on the spread of the 2019 novel coronavirus (covid-19) outbreak, Science (2020). https://doi.org/10.1126/science.aba9757
[16] M. U. Kraemer, et al., The effect of human mobility and control measures on the covid-19 epidemic in china, Science (2020). https://doi.org/DOI:10.1126/science.abb4218
[17] K. Prem, et al., The effect of control strategies to reduce social mixing on outcomes of the covid-19 epidemic in wuhan, china: a modelling study, The Lancet Public Health (2020). https://doi.org/10.1016/S2468-2667(20)30073-6
[18] R. Li, et al., Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (sars-cov2), Science (2020). https://doi.org/10.1126/science.abb3221
[19] Z. Yang, et al., Modified seir and ai prediction of the epidemics trend of covid-19 in china under public health interventions, Journal of Thoracic Disease, 165 (2020). https://doi.org/10.21037/jtd.2020.02.64
[20] P. Boldog, et al., Risk assessment of novel coronavirus covid-19 outbreaks outside china, Journal of clinical medicine, 571 (2020). https://doi.org/10.3390/jcm9020571
[21] R. M. Anderson, H. Heesterbeek, D. Klinkenberg, T. D. Hollingsworth, How will country-based mitigation measures influence the course of the covid-19 epidemic?, The Lancet, 931 (2020). https://doi.org/10.1016/S0140-6736(20)30567-5
[22] S. M. Kissler, C. Tedijanto, E. Goldstein, Y. H. Grad, M. Lipsitch, Projecting the transmission dynamics of sars-cov-2 through the postpandemic period, Science (2020). https://doi.org/
[23] R. Singh and R. Adhikari, Age-structured impact of social distancing on the COVID-19 epidemic in India, arXiv:2003.12055, https://arxiv.org/abs/2003.12055
[24] G. Sonnino and P. Nardone, Dynamics of the COVID-19 – Comparison between the Theoretical Predictions and Real Data, arXiv:2003.13540, https://arxiv.org/abs/2003.13540
[25] A. Agosto and P. Giudici, A Poisson Autoregressive Model to Understand COVID-19 Contagion Dynamics, preprint (2020), http://dx.doi.org/10.2139/ssrn.3551626
[26] L. Li et al., Propagation analysis and prediction of the COVID-19, Infectious Disease Modelling, Volume 5, 282 (2020), https://doi.org/10.1016/j.idm.2020.03.002
[27] doi:10.20944/preprints202005.0319.v1
[28] I. C.-. health service utilization forecasting team and C. J. Murray, medRxiv (2020).
[29] R. Marsland III and P. Mehta, Data-driven modeling reveals a universal dynamic underlying the COVID-19 pandemic under social distancing, arXiv:2004.10666, https://arxiv.org/abs/2004.10666
[30] Z. Fodor, S. Katz and T. Kovacs, Why integral equations should be used instead of differential equations to describe the dynamics of epidemics, arXiv:2004.07208, https://arxiv.org/pdf/2004.07208.pdf
[31] A. Das, A. Dhar, S. Goyal and A. Kundu, Covid-19: analysis of a modified SEIR model, a comparison of different intervention strategies and projections for India, arXiv:2005.11511, https://arxiv.org/abs/2005.11511
[32] H. Badr, et al., Association between mobility patterns and COVID-19 transmission in the USA: a mathematical modelling study, The Lancet infectious diseases (July 1, 2020), https://doi.org/10.1016/S1473-3099(20)30553-3
[33] I. Holmdahl, S.M., and Caroline Buckee, D.Phil, Wrong but Useful - What Covid-19 Epidemiologic Models Can and Cannot Tell Us, The New England Journal of Medicine, May 15, 2020, DOI:10.1056/NEJMp2016822
[34] D. Adam, A guide to R - the pandemic's misunderstood metric, Nature, 2020 Jul; 583(7816):346-348, doi:10.1038/d41586-020-02009-w
[35] R.M. Anderson and R.M. May, Infectious diseases of humans, Oxford University Press, Oxford (1991)
[36] O. Diekmann and J. Heesterbeek, Mathematical epidemiology of infectious diseases: Model building, analysis and interpretation, Wiley (2000)
[37] W.O. Kermack and A.G. McKendrick, The Journal of Hygiene (London), 37 (1937), pp. 172-187
[38] Murray J.D., Mathematical Biology, Springer-Verlag Berlin, Heidelberg GmbH (1993).
[39] M. Nowak and R. M. May, Virus Dynamics, Mathematical Principles of Immunology and Virology, Oxford University Press (2000, ISBN: 9780198504177)
[40] Editorial, Infectious Disease Modelling, Volume 1, Issue 1, 2016, Pages 1-2
[41] G. Chowell and C. Viboud, Infectious Disease Modelling, Volume 1, Issue 1, 2016, Pages 71-78
[42] C. Viboud, L. Simonsen and G. Chowell, Epidemics, 15 (2016), pp. 27-37
[43] B. Pell, Y. Kuang, C. Viboud, and G. Chowell, Epidemics 22, 62 (2018).
[44] B. Tang, N. L. Bragazzi, Q. Li, S. Tang, Y. Xiao, and J. Wu, Infectious disease modelling 5, 248 (2020).
[45] D. A. Villela, Infectious Disease Modelling (2020).
[46] Kermack W. O., McKendrick, A. G. (1927), A Contribution to the Mathematical Theory of Epidemics, Proceedings of the Royal Society A. 115 (772), 700-721 (1927), doi:10.1098/rspa.1927.0118
[47] Hethcote H, The Mathematics of Infectious Diseases, SIAM Review. 42 (4): 599-653, doi:10.1137/s0036144500371907
[48] Harko T., Lobo Francisco S. N.; Mak, M. K. (2014), Exact analytical solutions of the Susceptible-Infected-Recovered (SIR) epidemic model and of the SIR model with equal death and birth rates, Applied Mathematics and Computation. 236: 184-194. arXiv:1403.2160, doi:10.1016/j.amc.2014.03.030
[49] H. W. Hethcote, D. W. Tudor, Integral equation models for endemic infectious diseases, Journal of mathematical biology, 37 (1980). https://doi.org/10.1007/BF00276034
[50] H. J. Wearing, P. Rohani, M. J. Keeling, Appropriate models for the management of infectious diseases, PLoS medicine, 621 (2005). https://doi.org/10.1371/journal.pmed.0020174
[51] N. Keyfitz, On Future Population, Journal of the American Statistical Association, 67, 347 (1972), doi:10.1080/01621459.1972.10482386
[52] P.-F. Verhulst, Correspondance Mathématique et Physique. 10: 113-121
[53] F. J. Richards, A Flexible Growth Function for Empirical Use, Journal of Experimental Botany 10, 290 (1959).
[54] Y. H. Hsieh, J.-Y. Lee, and H. L. Chang, SARS epidemiology, logistic-type model, and cumulative case number, Emerging Infectious Diseases, vol. 10, 1165 (2004).
[55] Y.-H. Hsieh and Y.-S. Cheng, Real-time Forecast of Multiphase Outbreak, Emerging Infectious Diseases 12, 122 (2006), doi:10.3201/eid1201.050396
[56] England data (as in the website on July 31, 2020): https://coronavirus.data.gov.uk/
[57] Germany data: https://en.wikipedia.org/wiki/COVID-19_pandemic_in_Germany
[58] France data: https://en.wikipedia.org/wiki/COVID-19_pandemic_in_France
[59] Italy data: https://en.wikipedia.org/wiki/COVID-19_pandemic_in_Italy
[60] Spain data: https://en.wikipedia.org/wiki/COVID-19_pandemic_in_Spain
[61] NYC data: https://en.wikipedia.org/wiki/COVID-19_pandemic_in_New_York_City
[62] Russia data: https://en.wikipedia.org/wiki/COVID-19_pandemic_in_Russia
[63] Brazil data: https://en.wikipedia.org/wiki/COVID-19_pandemic_in_Brazil
[64] India data: https://en.wikipedia.org/wiki/COVID-19_pandemic_in_India
[65] India data: Ministry of Health and Family Welfare, Government of India
[66] India data: Covid19-India-Tracker
[67] Mumbai data: https://arogya.maharashtra.gov.in/1175/Novel--Corona-Virus
[68] Delhi data: https://en.wikipedia.org/wiki/COVID-19_pandemic_in_Delhi , https://delhifightscorona.in/
[69] USA data: https://en.wikipedia.org/wiki/COVID-19_pandemic_in_the_United_States
[70] Worldometer
[71] World population review: https://worldpopulationreview.com/ , https://indiapopulation2020.in/
[72] P. Junnarkar and N. Mathur, Deuteron-like heavy dibaryons from Lattice QCD, Phys. Rev. Lett. 123, 162003 (2019), https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.123.162003
[73] N. Mathur, M. Padmanath and S. Mondal, Precise predictions of charmed-bottom hadrons from lattice QCD, Phys. Rev. Lett. 121, 202002 (2018), https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.121.202002
[74] M. Padmanath and N. Mathur, Quantum Numbers of Recently Discovered $\Omega_c$ Baryons from Lattice QCD, Phys. Rev. Lett. 119, 042001 (2017), https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.119.042001
[75] D. W. Hogg, Jo Bovy and Dustin Lang, Data analysis recipes: Fitting a model to data, arXiv:1008.4686, https://arxiv.org/abs/1008.4686
[76] P. M. Lee, Bayesian Statistics: An Introduction, Arnold. ISBN 0-340-67785-6
VI. APPENDIX
Here we show some of the numerical details, particularly describing the method employed for obtaining the confidence intervals.

For a given data set {days, number of infections and error ≡ $t$, $y(t)$ and $\sigma(t)$, $t = 1, ..., N$} and the model function $f(t)$, where $f(t)$ is the time evolution trajectory as in Eqs. (1) and (13), we calculate the minimum of $\chi^2$, defined as

\begin{equation}
\chi^2(f) = \sum_{t=1}^{N} \frac{\left( y(t) - f(t) \right)^2}{\sigma(t)^2}. \tag{16}
\end{equation}

A non-linear fitting method is utilized for the minimization. The total minimum $\chi^2$ is obtained by fitting all periods simultaneously (by varying $t_e$, $t_m$, $t_s^i$ together). For Eq. (13) this minimization is performed for each trajectory with a given set of parameters, and we choose only those which are within the acceptable minimum $\chi^2$. The non-linear fitting method employed here has been extensively used previously for lattice gauge theory calculations and data analysis [72–74].

To maximize the probability of the observed data given the model, we follow Ref. [75] and calculate the likelihood of the parameters, defined as

\begin{equation}
\mathcal{L} = \prod_{i=1}^{N} p\left( y_i \,|\, x_i, \sigma_{yi}, \{\alpha\} \right). \tag{17}
\end{equation}

Here we use the generic notation $x$ for time ($t$), and $\{\alpha\}$ represents all parameters of the model. The frequency distribution $p(y_i | x_i, \sigma_{yi}, \{\alpha\})$ of each data point is taken to be Gaussian,

\begin{equation}
p\left( y_i \,|\, x_i, \sigma_{yi}, \{\alpha\} \right) = \frac{1}{\sqrt{2\pi\sigma_{yi}^2}} \exp\left( -\frac{\left( y_i - f(i, \{\alpha\}) \right)^2}{2\sigma_{yi}^2} \right). \tag{18}
\end{equation}

We then adopt the Bayesian way of data analysis [76], using

\begin{equation}
p(\mathrm{model} \,|\, \mathrm{data}) = \frac{p(\mathrm{data} \,|\, \mathrm{model})\, p(\mathrm{model})}{p(\mathrm{data})}, \tag{19}
\end{equation}

to generalize the above frequency distribution as

\begin{equation}
p\left( \{\alpha\} \,|\, \{y_i\}_{i=1}^{N}, I \right) = \frac{p\left( \{y_i\}_{i=1}^{N} \,|\, \{\alpha\}, I \right) p\left( \{\alpha\} \,|\, I \right)}{p\left( \{y_i\}_{i=1}^{N} \,|\, I \right)}, \tag{20}
\end{equation}

where
• $I$: all the prior knowledge of the data and the problem.
• $p(\{\alpha\} | I)$: the prior probability distribution for the parameters ($\{\alpha\}$), which represents all knowledge except the data.
• $p(\{\alpha\} | \{y_i\}_{i=1}^{N}, I)$: the posterior probability distribution for the parameters ($\{\alpha\}$) with the given data and the prior knowledge.
• $p(\{y_i\}_{i=1}^{N} | I)$: a normalization constant.

One gets the peak of the posterior probability at the best-fit values of the parameters ($\{\alpha\}$), while its moments provide the uncertainties of these parameters.

Since the above posterior probability distribution, with the given number of parameters of our model, is quite complicated, we adopt a Markov-Chain-Monte-Carlo (MCMC) method to generate this distribution with the given data set and model. The priors for the parameters are chosen to be uniform within the relevant ranges. We use more than 50000 MCMC steps in most cases, after suitably adjusting for the autocorrelation times. The MCMC chains are initialized in a Gaussian ball around the maximum-likelihood result.

In Fig. 18 we show the projections of the posterior probability distributions of the model parameters for the mitigation period. Since this period is the most complicated as well as the most important, and involves a larger number of parameters, we choose it to show results from the above-mentioned analysis method. Fig. 18 is for the data on USA, and the extracted parameters correspond to Fig. 16. The bands represent the one, two and three sigma confidence intervals. In Fig. 19 we show similar results for NY city in its saturation period.

Finally, the fitted error in the cumulative number of infections, particularly for projection purposes, was calculated by bootstrapping the MCMC samples. In Figs. 16 and 17 we show such errors by 68% and 95% confidence intervals.
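The sampling procedure described in this appendix can be illustrated with a short sketch. Everything specific in it is an assumption made for illustration: the power-law `model` is a stand-in and is not Eq. (1) or Eq. (13), the data are synthetic, the prior ranges and step sizes are arbitrary, and a plain random-walk Metropolis sampler is used because the text does not name the sampler that was actually employed. The chain is started at the known true values of the synthetic data in place of a maximum-likelihood fit.

```python
import numpy as np


def model(t, a, b):
    # Stand-in trajectory (power law); NOT the paper's Eq. (1).
    return a * t**b


def log_prior(theta, bounds):
    # Uniform prior: constant inside the allowed box, -inf outside.
    for value, (lo, hi) in zip(theta, bounds):
        if not lo <= value <= hi:
            return -np.inf
    return 0.0


def log_likelihood(theta, t, y, sigma):
    # Gaussian likelihood as in Eqs. (17)-(18): -chi^2/2 plus a constant.
    resid = (y - model(t, *theta)) / sigma
    return -0.5 * np.sum(resid**2) - np.sum(np.log(np.sqrt(2.0 * np.pi) * sigma))


def log_posterior(theta, t, y, sigma, bounds):
    lp = log_prior(theta, bounds)
    if not np.isfinite(lp):
        return -np.inf
    return lp + log_likelihood(theta, t, y, sigma)


def metropolis(theta0, step, n_steps, t, y, sigma, bounds, rng):
    """Random-walk Metropolis sampling of the posterior of Eq. (20)."""
    chain = np.empty((n_steps, len(theta0)))
    current = np.asarray(theta0, dtype=float)
    current_lp = log_posterior(current, t, y, sigma, bounds)
    for i in range(n_steps):
        proposal = current + step * rng.standard_normal(current.size)
        proposal_lp = log_posterior(proposal, t, y, sigma, bounds)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        chain[i] = current
    return chain


# Synthetic data with 5% errors, for illustration only.
rng = np.random.default_rng(1)
t = np.arange(1.0, 41.0)
true_a, true_b = 50.0, 1.5
sigma = 0.05 * model(t, true_a, true_b)
y = model(t, true_a, true_b) + sigma * rng.standard_normal(t.size)

bounds = [(1.0, 200.0), (0.5, 3.0)]  # hypothetical uniform-prior ranges
step = np.array([0.5, 0.005])        # hypothetical proposal widths
chain = metropolis([true_a, true_b], step, 20000, t, y, sigma, bounds, rng)
samples = chain[5000:]               # discard the first steps as burn-in

print("posterior mean:", samples.mean(axis=0))
print("posterior std: ", samples.std(axis=0))
```

Because the log-likelihood differs from $-\chi^2/2$ only by a term independent of the parameters, the maximum-likelihood point coincides with the minimum of Eq. (16) when the priors are uniform, which is why the $\chi^2$ fit is a natural starting point for the chains.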
FIG. 18: Projections of the posterior probability distributions of model parameters, $\alpha_m$, $\lambda$ and $\gamma$, corresponding to the mitigation period (Eq. (1b)) of USA. Data represents the cumulative number of infection for USA [69].

FIG. 19: Projections of the posterior probability distributions of model parameters, $E$ and $\alpha_s$, corresponding to the saturation period (Eq. (1c)) of NY city. Data represents the cumulative number of infection for NY city [61].
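The 68% and 95% projection bands obtained by bootstrapping the MCMC samples, as described in this appendix, can be sketched along the following lines. This is one possible reading of that step, not the authors' code: here "bootstrapping" is taken to mean resampling posterior draws with replacement, the power-law `trajectory` is a stand-in and not Eq. (1), and the posterior draws are synthetic placeholders for a real chain.

```python
import numpy as np


def trajectory(t, a, b):
    # Stand-in power-law trajectory; NOT the paper's Eq. (1).
    return a * t**b


def projection_bands(samples, t_future, n_boot=2000, seed=0):
    """Resample parameter draws with replacement and return the median
    and the 68% and 95% percentile bands of the projected trajectory."""
    rng = np.random.default_rng(seed)
    picks = rng.integers(0, len(samples), size=n_boot)
    curves = np.array([trajectory(t_future, *samples[k]) for k in picks])
    levels = [2.5, 16.0, 50.0, 84.0, 97.5]
    lo95, lo68, median, hi68, hi95 = np.percentile(curves, levels, axis=0)
    return {"median": median, "68": (lo68, hi68), "95": (lo95, hi95)}


# Hypothetical posterior draws standing in for a real MCMC chain.
rng = np.random.default_rng(42)
samples = np.column_stack(
    [rng.normal(50.0, 1.0, 5000), rng.normal(1.5, 0.01, 5000)]
)
t_future = np.arange(41.0, 61.0)
bands = projection_bands(samples, t_future)

print("day 60 median:  ", bands["median"][-1])
print("day 60 68% band:", bands["68"][0][-1], bands["68"][1][-1])
print("day 60 95% band:", bands["95"][0][-1], bands["95"][1][-1])
```

Taking percentiles of the projected curves, rather than of the parameters themselves, carries the correlations between parameters through to the bands.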