A model for the spread of an epidemic from local to global: A case study of COVID-19 in India

Abstract

In this paper we propose an epidemiological model for the spread of COVID-19. The dynamics of the spread are based on four fundamental categories of people in a population: tested and infected, not tested but infected, tested but not infected, and not tested and not infected. The model combines two levels of spreading dynamics in the population: the local level and the global level. Local-level growth is described by data and parameters that include COVID-19 testing statistics, preventive measures such as the nationwide lockdown, and the migration of people across neighboring locations. In the context of India, the local locations are taken to be districts, and the migration or traffic flow across districts is defined by the normalized edge weights of the metapopulation network of districts infected with COVID-19. Based on this local growth, state-level predictions are made for the number of people who test positive for COVID-19. Further, treating states as the local locations, a prediction is made for the country level. The values of the model parameters are determined by grid search, minimizing an error function while training the model on real data. The predictions are based on current testing statistics and on certain linear and log-linear growth of testing at the state and country levels. Finally, it is shown that the spread can be contained if the number of tests is increased linearly or log-linearly by certain factors, along with preventive measures, in the near future. This is also necessary to prevent a sharp growth in the number of infected people and to avoid a second wave of the pandemic.


arXiv [physics.soc-ph], June

Buddhananda Banerjee∗, Pradumn Kumar Pandey†, and Bibhas Adhikari‡

Keywords.

COVID-19, metapopulation network, grid search

∗ Department of Mathematics and Centre for Excellence in Artificial Intelligence, IIT Kharagpur, India
† Department of Computer Science and Engineering, IIT Roorkee, India
‡ Corresponding author, Department of Mathematics and Center for Theoretical Studies, IIT Kharagpur, India

Introduction

COVID-19 is a pandemic that is actively spreading across the world and is an unprecedented challenge for the human race. All countries affected by COVID-19 are struggling to mitigate the spread through various strategies. The disease spreads through inhalation of, or contact with, infected droplets or fomites. It has been observed that successful medical testing, and the resulting detection of people infected with SARS-CoV-2, is one of the crucial strategies for controlling the spread of COVID-19 [1]. For instance, the epidemic curve of South Korea suggests that this control strategy has curtailed the epidemic there. Testing is also linked to tracing the contacts of infected people, and the subsequent self-isolation of those contacts helps against the spread. Proactive testing also contributed to the successful containment of COVID-19 in Taiwan [2].

Given that no effective antiviral vaccine or drug is expected soon, different countries have adopted different prevention strategies, including voluntary or compulsory quarantine, bans on mass gatherings, closure of educational institutions or workplaces, social distancing, and even nationwide lockdown. However, these strategies may be less effective against infected people who are in the pre-symptomatic stage; such people act as invisible spreaders of the disease [3]. Mass medical testing therefore becomes increasingly important for a country. Several researchers around the world are actively building mathematical models of the spread of COVID-19. Here we quote that "model-based predictions can help policy makers make the right decisions in a timely way, even with the uncertainties about COVID-19" [4].

The primary preventive steps adopted by the Government of India fall into five categories: social distancing, movement restrictions, public health measures, social and economic measures, and nationwide lockdown. A few notable decisions by the Government of India are given in Table 7. It should be noted that the complete nationwide lockdown from March 25 till May 13 helped to control the spread of the disease over large distances but failed to prevent it in neighboring districts, as observed in [5]. For example, before the lockdown, infected cases were reported from districts across India that are far apart, whereas during the lockdown new spread was reported in districts neighboring already infected districts. Moreover, due to the lack of a well-planned policy for migrant workers, many of them travelled to their native districts during the lockdown. The unavailability of data on such traffic flows across districts is a crucial obstacle to a precise analysis of the spread. The preventive measures adopted by the Government of India are similar to those adopted in other countries.

Various studies have observed that COVID-19 exhibits significantly different epidemiological attributes from other well-studied epidemics of the past. Thus it is of paramount interest to develop mathematical models that can characterize the inherent dynamics of the spread of COVID-19.
Standard epidemic models such as the SIR model consider human-to-human transmission and describe the diffusion process through three mutually exclusive stages of infection: Susceptible, Infected, and Recovered. These are also called compartmental models [6], since they partition the individuals of a population into compartments according to their state with respect to the epidemic. Such a model can give some insight into the growth of the infection once its parameters are approximated from the available data. However, due to the peculiar growth of COVID-19 in different countries, researchers have extended the SIR model and other existing models, such as the SIS model, in order to gain meaningful insights into the spread of COVID-19 [7]. Importantly, these studies can help us frame control strategies and policies that mitigate the epidemic [8] [9].

One of the first models for the spread of COVID-19 was proposed by Anastassopoulou et al., based on the confirmed cases reported in Hubei province, China, from January 11 until February 10, 2020 [10]. They propose a discrete SIRD (Susceptible-Infected-Recovered-Dead) model and estimate from the data the mean values of the corresponding epidemiological parameters, such as the basic reproduction number and the case fatality and case recovery ratios. This model makes it possible to forecast the spread in the near future. In another attempt [11], the authors study transmission datasets from within and outside Wuhan, China, to estimate how transmission in Wuhan varied between December 2019 and February 2020, and use a stochastic transmission dynamic model to assess the potential for sustained human-to-human transmission in locations outside Wuhan. In [12], a mean-field epidemiological model for the COVID-19 epidemic in Italy is proposed by extending the classical SIR model.
Here, in addition to susceptible (S) and infected (I), the other stages of individuals are diagnosed (D), ailing (A), recognized (R), threatened (T), healed (H), and extinct (E), collectively termed SIDARTHE. In [6], an age-stratified model of COVID-19 is proposed to capture age-dependent dynamics for nowcasting and forecasting in Switzerland. This model incorporates compartments of symptomatic and asymptomatic infected individuals along with susceptible and exposed individuals. In [13], the authors propose a model of COVID-19 epidemic dynamics under quarantine conditions and develop methods to estimate quarantine effectiveness in a country or region affected by COVID-19. In addition, a few models for understanding and predicting the spread of COVID-19 are based on the metapopulation network approach; see [14] [15] [16].

Several mathematical models have also been proposed based on the available COVID-19 data for India, fitting the data to classical epidemic models that incorporate other factors such as the nationwide lockdown and social distancing; see [17] [18] [19] and the references therein. In [20], a mathematical model of the spread of COVID-19 is proposed based on an age-structured SIR model; however, the comparison of this model's predictions with real data is criticized by Dhar in [21]. In [22], the authors perform a state-wise analysis of the data on the infected population based on three models: the Exponential model, the Logistic model, and the SIS model. They also provide state-wise predictions of the number of infected people for the near future. An elementary network-based model for the geographical spread of COVID-19 in India is proposed in [23].
In [24], a model for the spread of COVID-19 in India is proposed that emphasizes the migration of the population over the spatial network of cities and incorporates SIR growth dynamics at the city level.

In this paper, we propose an epidemiological model for the spread of a contagious epidemic in a region or country. The model combines two growth processes of the spread, at the local and the global level. By local we mean the level of a city, town, district, or province, and by global the level of a state or country. First, we develop a new discrete model for the growth dynamics of infected people at the local level as follows. We consider four types of individuals living at a location: individuals who are tested as infected ($X_1$), tested as non-infected ($X_2$), untested but infected (asymptomatic or pre-symptomatic, $X_3$), and untested and non-infected ($X_4$). The total number of such individuals equals the total population living at that location. Given the time series data of these numbers $X_i(t)$, $i = 1, 2, 3, 4$, we define the growth statistic $X_i(t+1) - X_i(t)$ using $X_j(t)$, $j \neq i$, and four other parameters, each of which is related to the spreading pattern of the virus that causes the disease. Note that the standard compartmental models in the literature, based on susceptible, infected, recovered, and deceased individuals, do not capture the effect of such parameters in an epidemic like COVID-19. In our proposed model, the growth dynamics at the local level include the following parameters:

(a) spread due to infected but asymptomatic and pre-symptomatic individuals;
(b) the effect of preventive measures such as lockdown or restricted movement of individuals across locations;
(c) daily testing statistics.

Then we consider the metapopulation network of all the locations at the local level in order to incorporate the transmission dynamics of the disease at the global level. The metapopulation network model is a standard and popular model for analyzing the spread of highly contagious diseases, including the Zika virus [25]; see also [26] and the references therein. In our proposed model, the vertices of the metapopulation network are the locations infected with the disease, and the links connecting them represent possible modes of transportation or a spatial distance, such as the great-circle distance between the latitude and longitude coordinates of the locations at the local level. The weights of these links represent the rate or percentage of the population moving between locations per unit time, such as a day. The final model is then defined by combining the dynamics of the spread at the local and global levels. The values of the model parameters are obtained by a learning technique based on training data and error minimization. In the case of COVID-19, we consider the model parameters at the local level to be the testing statistics, social distancing, and the rate at which an infected but untested (asymptomatic or pre-symptomatic) individual infects other people per unit time.
In the context of India, the locations are taken to be the districts that constitute the states and union territories of India. There are 28 states and 8 union territories in India, with a total of 718 districts. Based on the proposed model, we predict the number of COVID-19 infected people both at the state level and at the country (India) level. The prediction depends on the number of tests performed per day. The results show that the total number of infected people in India will be approximately 0.46 million on July 7, 2020, 1.9 million on November 7, 2020, and 4.6 million on May 7, 2021 when the number of tests is approximately 1,00,000 (100,000) per day at the country level, which is approximately the number of tests as of May 7, 2020. If the number of tests grows linearly (at a certain rate; see Section 3.2), then the number of people testing positive for COVID-19 would be approximately 2 million on July 7, 2020, 59 million on November 7, 2020, and 130 million on May 7, 2021. Finally, if the number of tests grows log-linearly (at a certain rate; see Section 3.2), then the number of COVID-19 infected people in India would be approximately 1.3 million on July 7, 2020, 3.77 million on November 7, 2020, and 8.5 million on May 7, 2021. Note that these predictions assume that no external measure, such as a vaccine or drug discovered in the meantime, is used to control the spread. Further, using numerical simulation, we show that the spread stops when the daily number of tests increases linearly or log-linearly; however, if the number of tests remains approximately the same as on May 7, 2020, the spread need not stop in the near future, say in the year 2021.

The proposed model

Let $\mathcal{V} = \{ l \mid l \text{ is the index of a location} \}$ be the set of locations where persons infected with COVID-19 are likely to stay in or move to on a day $t$. Suppose that $N_l$ is the population size of location $l$. We now introduce the following notation to model the distribution and dynamics of the pandemic. If $T_l(t)$ denotes the number of tested individuals in location $l$ up to time $t$, then $\bar{T}_l(t) = N_l - T_l(t)$ is the number of non-tested individuals up to time $t$. Let $C^+_l(t)$ and $C^-_l(t)$ be the total numbers of people infected and not infected with COVID-19, respectively, in a location $l \in \mathcal{V}$.

| | COVID +ve | COVID −ve | Total |
|---|---|---|---|
| Tested | $X^{[l]}_1(t)$ | $X^{[l]}_2(t)$ | $T_l(t)$ |
| Non-tested | $X^{[l]}_3(t)$ | $X^{[l]}_4(t)$ | $\bar{T}_l(t)$ |
| Total | $C^+_l(t)$ | $C^-_l(t)$ | $N_l$ |

Tab. 1: Distribution of population in location $l$ at time $t$
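The bookkeeping of Table 1 can be made concrete with a small container: the row totals give the tested and non-tested counts, the column totals give the infected and non-infected counts, and both add up to the population of the location. The following is a minimal Python sketch; the class name `LocationState` and the fields `x1` to `x4` are hypothetical names chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class LocationState:
    """Cells of Table 1 for one location l on day t (hypothetical container)."""

    x1: int  # tested, COVID +ve (observed)
    x2: int  # tested, COVID -ve (observed)
    x3: int  # not tested, COVID +ve (latent)
    x4: int  # not tested, COVID -ve (latent)

    @property
    def tested(self) -> int:
        return self.x1 + self.x2  # T_l(t): first row total

    @property
    def untested(self) -> int:
        return self.x3 + self.x4  # bar T_l(t) = N_l - T_l(t): second row total

    @property
    def infected(self) -> int:
        return self.x1 + self.x3  # C+_l(t): first column total

    @property
    def not_infected(self) -> int:
        return self.x2 + self.x4  # C-_l(t): second column total

    @property
    def population(self) -> int:
        return self.tested + self.untested  # N_l


# Illustrative numbers, not data from the paper.
state = LocationState(x1=10, x2=90, x3=50, x4=850)
```

Only `x1` (and, through the testing totals, `x2`) corresponds to observed data; `x3` and `x4` are the latent cells that the model has to simulate.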
These temporal data vary with time $t$, measured in days. For any location $l$ and a given day $t$, we define a random vector

$$X^{[l]}(t) = \begin{bmatrix} X^{[l]}_1(t) & X^{[l]}_2(t) & X^{[l]}_3(t) & X^{[l]}_4(t) \end{bmatrix}^T$$

with four components describing the distribution of the population $N_l$. Based on the above discussion, $X^{[l]}(t)$ can be represented as a $2 \times 2$ table (Table 1) with

$$\sum_{j=1}^{4} X^{[l]}_j(t) = N_l, \quad \forall\, l \in \mathcal{V},$$

the total population of location $l$, although $X^{[l]}_3(t)$ and $X^{[l]}_4(t)$ are unobserved, or latent, random variables. Unlike in the standard epidemic models, the asymptomatic infected people, i.e., those who are infected with COVID-19 but not tested, $X^{[l]}_3(t)$, may influence the number $X^{[l]}_1(t')$ at a future date $t' > t$. Moreover, $C^+_l(t)$ depends strongly on the contact networks of $C^+_l(t'')$ at a previous date $t'' < t$. But only $X^{[l]}_1(t)$ is observed. Thus the number of people tested for COVID-19 on a given day governs the dynamics of $X^{[l]}(t)$ at a location $l$ over time.

Let $\widetilde{T}_l(t+1)$ be a strategic number giving the target quantity of new COVID-19 tests to be performed on day $t+1$ in location $l$. Given the statistic $X^{[l]}(t)$, new tests also depend on the availability of test kits. However, they also depend on $\bar{T}_l(t)$, the number of people at location $l$ not yet tested for the disease. Hence, we define the possible number of tests to be performed at $l$ as

$$\widetilde{T}^*_l(t+1) = \min\{\widetilde{T}_l(t+1),\ \bar{T}_l(t)\}. \tag{1}$$

In Table 2, we introduce some generic notation for the model parameters used to develop the dynamics of the system, and for the hyper-parameters involved in training and updating the model parameters.
All parameters carry subscripts or superscripts indicating the time and location, as appropriate.

| Parameter | Interpretation |
|---|---|
| $\lambda_1$ | Testing-coverage probability among the infected |
| $\lambda_2$ | Infection spreading probability |
| $\lambda_3$ | Probability of population migration among locations |
| $\alpha$ | Average family size |
| $\theta$ | Mobility of individuals |
| $\epsilon$ | Error parameter |

| Hyper-parameter | Interpretation |
|---|---|
| $\alpha_1$ | Changing rate of $\lambda_1$ |
| $\alpha_2$ | Changing rate of $\lambda_1$ for future |
| $\beta$ | Changing rate of $\lambda_2$ |
| $r_1$ | Rate of increment in testing under linear growth |
| $r_2$ | Rate of increment in testing under log-linear growth |

Tab. 2: Model parameters and hyper-parameters

Now we define the dynamics of the change of $X^{[l]}(t)$ for any location $l$:

$$X^{[l]}_1(t+1) - X^{[l]}_1(t) = \Delta_t X^{[l]}_1 = \mathrm{bin}\left(\min\{\widetilde{T}^*_l(t+1),\ X^{[l]}_3(t)\},\ \lambda^{[l]}_1(t+1)\right) \tag{2}$$

$$X^{[l]}_2(t+1) - X^{[l]}_2(t) = \Delta_t X^{[l]}_2 = \min\{\widetilde{T}^*_l(t+1) - \Delta_t X^{[l]}_1,\ X^{[l]}_4(t)\} \tag{3}$$

$$X^{[l]}_3(t+1) - X^{[l]}_3(t) = \Delta_t X^{[l]}_3 = \max\{-\Delta_t X^{[l]}_1 + \min\{a^{[l]}(t+1),\ X^{[l]}_4(t)\},\ -X^{[l]}_3(t)\} \tag{4}$$

$$X^{[l]}_4(t+1) - X^{[l]}_4(t) = \Delta_t X^{[l]}_4 = \max\{-\Delta_t X^{[l]}_2 - \min\{a^{[l]}(t+1),\ X^{[l]}_4(t)\},\ -X^{[l]}_4(t)\} \tag{5}$$

where

$$a^{[l]}(t+1) = \mathrm{bin}\left(\alpha\, \Delta_t X^{[l]}_1,\ \lambda^{[l]}_2(t+1)\right) + \mathrm{Pois}\left(\lambda^{[l]}_3(t+1) \sum_{k=1}^{N} m_{kl}(t)\, X^{[l]}_3(t)\right) + \mathrm{Pois}(\epsilon).$$

Here $\lambda^{[l]}_1(t+1) \in (0,1)$ is the testing-coverage probability among the infected in location $l$ at time $t+1$. Only a fraction of $X^{[l]}_3(t)$ will be identified as $\Delta_t X^{[l]}_1$, so this quantity is modelled with a binomial distribution. $\lambda^{[l]}_2(t+1) \in (0,1)$ is a probability indicating the average spread of infection among the people near a group of infected individuals, so the new spread from identified infected people is also modelled with a binomial random variable, $\mathrm{bin}(\alpha\, \Delta_t X^{[l]}_1, \lambda^{[l]}_2(t+1))$. Next, $\lambda^{[l]}_3(t+1) \in (0,1)$ is a probability close to zero indicating the influence from adjacent locations; consequently, this term is modelled with $\mathrm{Pois}\left(\lambda^{[l]}_3(t+1) \sum_{k=1}^{N} m_{kl}(t)\, X^{[l]}_3(t)\right)$. The parameter $\epsilon > 0$ is the error parameter of Table 2. Note that $X^{[l]}_3(t)$ and $X^{[l]}_4(t)$ are latent variables on a given day $t$. The parameters $\lambda_j(t+1)$, $j = 1, 2, 3$ […]

Now we consider the metapopulation network $G(t)$ with vertex set $\mathcal{V}$ of locations in order to incorporate the effect of transmission of COVID-19 across locations. Let $A(t) = [a_{kl}(t)]$ denote the adjacency matrix associated with $G(t)$, and let $d_{kl}$ denote the distance between $k$ and $l$. We define the weights of the edges of $G(t)$ as

$$w_{kl}(t) \propto \exp\left\{-\frac{d_{kl}}{\theta(t)}\right\} \tag{6}$$

where $\theta(t)$ is the mobility parameter. Here $w_{kl}$ denotes the diffusion weight for the daily human traffic flow between neighboring locations $k$ and $l$. The value of $\theta(t) > 0$ […] $\theta(t)$ may be considered a small value. Now we define the matrix $M(t) = [m_{kl}(t)]$, where

$$m_{kl}(t) = \frac{w_{kl}(t)}{\sum_{l=1}^{|\mathcal{V}|} w_{kl}(t)},$$

which is a row-stochastic matrix. Finally, we propose the following predictive model at the state and country levels for the number of COVID-19 infected people.

The traffic flow between locations influences the value of $X^{[l]}_4(t+1)$ through Eq. (5), which contributes to $X^{[l]}_3(t+1)$ and finally to the number of infected people $X^{[l]}_1(t+1)$. Moreover, the number of nodes in the metapopulation network $G(t)$ varies with time: at time $t$, the nodes of $G(t)$ represent the districts affected by the disease at time $t$. Thus, at the level of a state $S$, which consists of a number of locations, $X^{[S]}_1 = \sum_{l \in S} X^{[l]}_1$ at any time $t$. Further, the number of infected people at the country level is calculated from the proposed dynamics of $X^{[l]}_i$, $i \in \{1, 2, 3, 4\}$, where $l$ is a state. This is done because the traffic flow between neighboring districts may differ from the traffic flow between neighboring states.
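One simulated day at the local level, Eqs. (1) to (6), can be sketched with NumPy as below. This is a minimal illustration under stated assumptions, not the authors' code: the function names `great_circle_km`, `mobility_matrix`, and `step` are hypothetical, all locations are updated together as arrays, and distances are computed with the haversine formula in kilometres. Following the text as written, the migration term multiplies the column sums $\sum_k m_{kl}(t)$ by $X^{[l]}_3(t)$; if the intended inflow is $\sum_k m_{kl}(t) X^{[k]}_3(t)$, that line would become `M.T @ x3`. Rounding $\alpha\, \Delta_t X^{[l]}_1$ to an integer before the binomial draw, and masking the weights by an optional adjacency matrix, are also assumptions.

```python
import numpy as np


def great_circle_km(lat_deg, lon_deg, radius_km=6371.0):
    """Pairwise great-circle distances d_kl via the haversine formula."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))[:, None]
    lon = np.radians(np.asarray(lon_deg, dtype=float))[:, None]
    h = (np.sin((lat - lat.T) / 2) ** 2
         + np.cos(lat) * np.cos(lat.T) * np.sin((lon - lon.T) / 2) ** 2)
    return 2 * radius_km * np.arcsin(np.sqrt(np.clip(h, 0.0, 1.0)))


def mobility_matrix(dist, theta, adjacency=None):
    """Eq. (6) followed by row normalisation, giving the row-stochastic M(t).

    Assumption: if an adjacency matrix A(t) is given, non-adjacent pairs get
    zero weight; every row must then keep at least one non-zero entry.
    """
    w = np.exp(-np.asarray(dist, dtype=float) / theta)
    if adjacency is not None:
        w = w * adjacency
    return w / w.sum(axis=1, keepdims=True)


def step(x, target_tests, lam1, lam2, lam3, alpha, eps, M, rng):
    """One day of Eqs. (1)-(5) for all locations.

    x: int array of shape (n, 4) with columns X_1, X_2, X_3, X_4.
    target_tests: desired new tests per location, tilde T_l(t+1).
    """
    x1, x2, x3, x4 = x.T
    tests = np.minimum(target_tests, x3 + x4)              # Eq. (1)
    dx1 = rng.binomial(np.minimum(tests, x3), lam1)        # Eq. (2)
    dx2 = np.minimum(tests - dx1, x4)                      # Eq. (3)
    migration = M.sum(axis=0) * x3                         # sum_k m_kl(t) X_3^[l](t)
    a = (rng.binomial(np.rint(alpha * dx1).astype(np.int64), lam2)
         + rng.poisson(lam3 * migration)
         + rng.poisson(eps, size=x3.shape))                # a^[l](t+1)
    new_infected = np.minimum(a, x4)
    dx3 = np.maximum(-dx1 + new_infected, -x3)             # Eq. (4)
    dx4 = np.maximum(-dx2 - new_infected, -x4)             # Eq. (5)
    return np.stack([x1 + dx1, x2 + dx2, x3 + dx3, x4 + dx4], axis=1)


# Illustrative use with two hypothetical locations and made-up parameter values.
rng = np.random.default_rng(42)
dist = great_circle_km([22.3, 22.6], [87.3, 88.4])
M = mobility_matrix(dist, theta=50.0)
x_next = step(np.array([[5, 95, 40, 9860], [2, 48, 15, 4935]]), np.array([100, 60]),
              lam1=0.3, lam2=0.05, lam3=0.01, alpha=4.0, eps=1.0, M=M, rng=rng)
```

Working on whole arrays keeps the per-location formulas one line each; running the same `step` with states as locations gives the country-level variant described above.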
Hence, at the country level, say India, denoted by $I$, $X^{[I]}_1 = \sum_{S \in I} X^{[S]}_1$ at any time (day) $t$.

In this section we discuss how to determine the values of the parameters involved in the proposed epidemiological model. The initial values can be chosen judiciously based on characteristics observed in the data; as time passes, the model updates the parameter values from observed and simulated data. Let $[t_0, t_1]$ be the learning period, throughout which real data are available and the model can learn from them to estimate the parameter values. Consequently, growth dynamics of the parameters can be defined that update their values in the future, when real data are not available.

First we consider the parameter $\lambda^{[l]}_1(t)$ and define

$$\lambda^{[l]}_1(t+1) = \lambda^{[l]}_1(t) + \alpha_1 \frac{\widetilde{X}^{[l]}_1(t) - X^{[l]}_1(t)}{m^{[l]}} \tag{7}$$

for some $\alpha_1 \geq 0$, where $m^{[l]} = \max_{t' \leq t} |\widetilde{X}^{[l]}_1(t') - X^{[l]}_1(t')|$, $\widetilde{X}^{[l]}_1(t)$ is the reported number of tested-positive cases in location $l$ at time $t$, and $X^{[l]}_1(t)$ is the number of tested-positive cases obtained from simulation. In Eq. (7), $\lambda^{[l]}_1(t)$ is updated so that if the reported number of tested and infected cases exceeds the simulated value, $\lambda^{[l]}_1$ increases relative to its current value, and vice versa. The value of $\alpha_1$ is the slope of the line along which $\lambda^{[l]}_1(t)$ changes linearly with time.

Eq. (7) is explained pictorially in Fig. 1. Suppose the points connected by black lines correspond to real data, and the points connected by green lines correspond to points simulated with $\lambda^{[l]}_1(t+1) = \lambda^{[l]}_1(t)$, i.e., without updating the parameter. In this scenario the error $\widetilde{X}^{[l]}_1(t) - X^{[l]}_1(t)$ increases. For a better fit we need to update $\lambda^{[l]}_1(t)$ so that $X^{[l]}_1(t+1)$ comes closer to $\widetilde{X}^{[l]}_1(t+1)$.
In the figure, $\widetilde{X}^{[l]}_1(t) > X^{[l]}_1(t)$, where $X^{[l]}_1(t+1)$ is the simulated number of tested-positive cases. If we increase $\lambda^{[l]}_1(t)$, we can bring $X^{[l]}_1(t+1)$ closer to $\widetilde{X}^{[l]}_1(t+1)$; this corrected point $X^{[l]}_1(t+1)$ is connected to the point $X^{[l]}_1(t)$ by the blue line.

For the growth over time of $\lambda^{[l]}_2(t)$, which represents the probability of the spread of the disease at a location $l$, we define

$$\lambda^{[l]}_2(t+1) = \lambda^{[l]}_2(t) - \beta \left( \frac{\widetilde{T}_l(t+1) - \widetilde{T}_l(t)}{\sum_{j=1}^{N} \widetilde{T}_j(t+1)} \right), \tag{8}$$

where $\beta \geq 0$ and $\widetilde{T}_l(t)$ denotes the number of tests performed at location $l$ at time $t$. The intuition behind Eq. (8) is that the probability of spread of the disease depends on the number of tests performed at location $l$: an increase in testing at $l$ lowers $\lambda^{[l]}_2$. We consider constant values of $\lambda^{[l]}_3(t)$ and $\epsilon$ in the current version.

Fig. 1:

Illustration of how the parameter $\lambda^{[l]}_1(t)$ is updated by Eq. (7).
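The two learning-period updates, Eqs. (7) and (8), can be sketched as single-step functions so that they can be interleaved with the simulation of Eqs. (2) to (5). This is a minimal sketch; the function names `next_lambda1` and `next_lambda2` and the constant `EPS` are hypothetical. Two choices are assumptions not stated in the text: when $m^{[l]} = 0$ (a perfect fit so far) the update is skipped to avoid dividing by zero, and both parameters are clipped to stay strictly inside $(0,1)$ so that they remain valid probabilities.

```python
import numpy as np

EPS = 1e-6  # assumed margin keeping the parameters inside (0, 1)


def next_lambda1(lam1, observed, simulated, alpha1, m_prev):
    """Eq. (7) for one location and one day.

    observed, simulated: reported X~_1^[l](t) and simulated X_1^[l](t).
    m_prev: running maximum of |observed - simulated| over earlier days (0.0 at t_0).
    Returns (lambda_1^[l](t+1), m^[l]).
    """
    err = observed - simulated
    m = max(m_prev, abs(err))  # m^[l] = max over t' <= t of |X~_1 - X_1|
    step = alpha1 * err / m if m > 0 else 0.0
    # Assumption, not from the paper: clip so lambda_1 remains a probability.
    return float(np.clip(lam1 + step, EPS, 1 - EPS)), m


def next_lambda2(lam2, tests_next, tests_now, beta):
    """Eq. (8) for all N locations at once (arrays indexed by location)."""
    tests_next = np.asarray(tests_next, dtype=float)
    growth = (tests_next - np.asarray(tests_now, dtype=float)) / tests_next.sum()
    # Assumption, not from the paper: clip so lambda_2 remains a probability.
    return np.clip(np.asarray(lam2, dtype=float) - beta * growth, EPS, 1 - EPS)
```

For example, if 120 cases are reported while 100 are simulated on the first day, `next_lambda1(0.2, 120, 100, alpha1=0.1, m_prev=0.0)` moves $\lambda^{[l]}_1$ up by $\alpha_1 = 0.1$, because the normalised error is exactly 1 on that day.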

Recall that individuals at the asymptomatic and pre-symptomatic stages of infection act as invisible spreaders of the disease. Hence, detecting individuals infected with the virus plays an important role in the growth dynamics of the number of infected individuals at a particular location. Thus one of the control strategies to prevent the spread is to conduct enough tests per day and isolate the infected people. In a country like India, where approximately 1.4 billion people live, conducting enough tests per day can be difficult. Moreover, due to a shortage of test kits and medical facilities, India faces many challenges in performing enough tests per day. The testing data for India, obtained from [27], are plotted in Fig. 2. Note that the data are not available for three consecutive days after the 30th day, and that testing data are not available before March 19, 2020.

Random-sample testing for COVID-19 is not desirable, owing to the scarcity of testing kits and medical support facilities for a large population. Indeed, targeted testing by tracing the social contacts of newly detected individuals with COVID-19 can be more efficient for identifying asymptomatic and pre-symptomatic individuals who are infected with the virus. Hence the daily increment in the number of tests should depend on the testing-coverage probability among the infected individuals at a particular location, that is, $\lambda^{[l]}_1$.

Fig. 2:

Testing performed daily in India, from March 19, 2020 to May 20, 2020 [27].

In this model we incorporate two possible growth patterns of the testing data over time at a location: linear and log-linear. The parameters, which we call the rates of gain in the number of COVID-19 tests, are denoted by $r_1$ and $r_2$ for the linear and log-linear growth equations below, respectively. From the real data it can be observed that the number of tested-positive cases has a positive correlation (0.[…]) […] $\lambda^{[l]}_1(t)$ has a positive dependency on the number of tested-positive cases. Thus, the number of tests performed has a positive relation with $\lambda^{[l]}_1$.

Let $\widetilde{T}_l(t)$ denote the number of tests performed at location $l$ on day $t$. We then define the linear increment of testing,

$$\widetilde{T}_l(t+1) = \left\lceil \widetilde{T}_l(t) + r_1 \lambda^{[l]}_1(t) \right\rceil, \tag{9}$$

and the log-linear increment of testing,

$$\widetilde{T}_l(t+1) = \left\lceil \left( r_2 \lambda^{[l]}_1(t) \right) \widetilde{T}_l(t) \right\rceil. \tag{10}$$

Thus, assigning small values to $\lambda^{[l]}_1(t)$ and $\lambda^{[l]}_2(t)$ at the beginning of the simulation, $\lambda^{[l]}_1(t)$, $\lambda^{[l]}_2(t)$, and $\widetilde{T}_l(t)$ are updated according to Eqs. (7), (8), and (9) or (10), respectively. Further, $\alpha_1$, $\beta$, and $r_1$ can be selected from the interior of the unit cube $(0,1) \times (0,1) \times (0,1)$, whereas $r_2$ can be larger than 1. This search method is known as three-dimensional grid search. Indeed, by fitting the growth given by Eqs. (7), (8), and (9) or (10) to real data, the values of $\alpha_1$, $\lambda^{[l]}_1(t)$, $\beta$, $\lambda^{[l]}_2(t)$, $r_1$ (or $r_2$), and $\widetilde{T}_l(t)$ can be learned such that the total testing $\sum_l \widetilde{T}_l(t')$ and the total tested and infected cases $\sum_l X^{[l]}_1(t')$ at times $t' \leq t$ are close to the real data. This is discussed in detail in the next subsection. These estimated values can be used for training the model.

Suppose that $\alpha_1$, $\lambda^{[l]}_1(t')$, $\beta$, $\lambda^{[l]}_2(t')$, $r'_1$ (or $r'_2$), and $\widetilde{T}_l(t')$ are the values learned from the given data.
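The two testing-growth rules, Eqs. (9) and (10), can be sketched as follows. The function names are hypothetical, and $\lambda^{[l]}_1(t)$ is passed in by the caller, whereas in the full model it is itself updated by Eq. (7) or Eq. (11). When $r_2 \lambda^{[l]}_1$ stays constant and above 1, the log-linear rule multiplies the daily tests by a constant factor, so $\log \widetilde{T}_l(t)$ grows roughly linearly in $t$, which is where the name comes from.

```python
import math


def linear_tests(tests_now: int, r1: float, lam1: float) -> int:
    """Eq. (9): ceil(T~_l(t) + r_1 * lambda_1^[l](t))."""
    return math.ceil(tests_now + r1 * lam1)


def loglinear_tests(tests_now: int, r2: float, lam1: float) -> int:
    """Eq. (10): ceil(r_2 * lambda_1^[l](t) * T~_l(t))."""
    return math.ceil(r2 * lam1 * tests_now)


# Five days of log-linear growth with a fixed lambda_1 (illustrative values only).
tests = 1000
history = [tests]
for _ in range(5):
    tests = loglinear_tests(tests, r2=2.5, lam1=0.5)
    history.append(tests)
# history == [1000, 1250, 1563, 1954, 2443, 3054]
```

The ceiling keeps the number of tests an integer, matching the $\lceil \cdot \rceil$ in both equations.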
However, for any $t > t'$, when real data are not available, the trained model can be used for prediction. We then define the update of $\lambda^{[l]}_1(t)$ as follows:

$$\lambda^{[l]}_1(t+1) = \lambda^{[l]}_1(t) + \alpha_2 \frac{X^{[l]}_1(t) - 2X^{[l]}_1(t-1) + X^{[l]}_1(t-2)}{\sum_{j=1}^{N} \left( X^{[j]}_1(t) - X^{[j]}_1(t-1) \right)}, \tag{11}$$

Let $X^{[l]}_1(t)$ and $\widetilde{X}^{[l]}_1(t)$ be the simulated and observed numbers of people detected as infected with COVID-19 after testing, at location $l$ on day $t$. Consider the time series of real data $\widetilde{X}^{[l]}_1(t)$, $t_0 \leq t \leq t_1$, for a location $l \in \mathcal{V}$, the vertex set of the metapopulation network. The complete observed dataset is $\mathcal{X} = \{\widetilde{X}^{[l]}_1(t) : l \in \mathcal{V},\ t_0 \leq t \leq t_1\}$. To estimate the model parameters that define $X^{[l]}_1(t)$, we divide $\mathcal{X}$ into a training set and a validation set as follows. Let $t' \in (t_0, t_1)$ and define

$$\mathcal{X}_T = \{\widetilde{X}^{[l]}_1(t) : l \in \mathcal{V},\ t_0 \leq t \leq t'\} \quad \text{(training set)} \tag{12}$$

$$\mathcal{X}_V = \{\widetilde{X}^{[l]}_1(t) : l \in \mathcal{V},\ t' < t \leq t_1\} \quad \text{(validation set)} \tag{13}$$

The model parameters are chosen to minimize the error function

$$e = w\, T_e + (1 - w)\, V_e \tag{14}$$

where

$$w = \frac{|\mathcal{X}_V|}{|\mathcal{X}_T| + |\mathcal{X}_V|} \tag{15}$$

$$T_e = \frac{1}{|\mathcal{X}_T|} \sum_{l \in \mathcal{V},\ \widetilde{X}^{[l]}_1(t) \in \mathcal{X}_T} \left| \widetilde{X}^{[l]}_1(t) - X^{[l]}_1(t) \right| \tag{16}$$

$$V_e = \frac{1}{|\mathcal{X}_V|} \sum_{l \in \mathcal{V},\ \widetilde{X}^{[l]}_1(t) \in \mathcal{X}_V} \left| \widetilde{X}^{[l]}_1(t) - X^{[l]}_1(t) \right| \tag{17}$$

Note that $T_e$ and $V_e$ are computed over the two different sets $\mathcal{X}_T$ and $\mathcal{X}_V$, respectively, and the weight $w$ is defined to compensate for the imbalance between their sizes.

Now we discuss how the values of the model parameters estimated from real data at local locations can be used to predict the number of infected people at a global level, such as the state and country levels, in the near future. We train the model in two stages, at the state level and at the country level. Recall that a state in India consists of several districts (locations denoted by $l$), and that India has 28 states and 8 union territories. In this paper we adopt a two-step approach for the prediction.
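The error function of Eqs. (12) to (17) and the three-dimensional grid search over $(\alpha_1, \beta, r_1)$ can be sketched as below. This is an illustration under assumptions: the observed and simulated tested-positive counts are arrays of shape (locations, days) whose first column is day $t_0$, `t_split` is the column index of $t'$, the grid uses evenly spaced interior points of $(0,1)$, and `simulate` is a hypothetical callable standing in for a full run of the model. For the log-linear variant, the third axis would instead range over values of $r_2$, which may exceed 1.

```python
import itertools

import numpy as np


def weighted_error(observed, simulated, t_split):
    """Eqs. (12)-(17): weighted mean absolute error over training and validation days."""
    abs_err = np.abs(np.asarray(observed, dtype=float) - np.asarray(simulated, dtype=float))
    train = abs_err[:, : t_split + 1]        # X_T: t_0 <= t <= t'
    valid = abs_err[:, t_split + 1 :]        # X_V: t' < t <= t_1
    n_train, n_valid = train.size, valid.size
    w = n_valid / (n_train + n_valid)        # Eq. (15)
    t_e = train.mean()                       # Eq. (16)
    v_e = valid.mean() if n_valid else 0.0   # Eq. (17)
    return w * t_e + (1 - w) * v_e           # Eq. (14)


def grid_search(simulate, observed, t_split, steps=9):
    """Exhaustive search over (alpha_1, beta, r_1) on a grid inside (0, 1)^3.

    `simulate(alpha1, beta, r1)` is a hypothetical callable returning simulated
    X_1 counts with the same shape as `observed`.
    """
    grid = np.linspace(0.0, 1.0, steps + 2)[1:-1]  # interior points only
    best_err, best_params = np.inf, None
    for params in itertools.product(grid, repeat=3):
        err = weighted_error(observed, simulate(*params), t_split)
        if err < best_err:
            best_err, best_params = err, params
    return best_err, best_params
```

An exhaustive grid is simple and easy to parallelise; its cost grows as `steps**3` model runs, so a coarse grid followed by a finer one around the best point is a common refinement.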
The global-level parameters include the social mobility parameter $\theta$ and the traffic flow across the local-level locations, given by the edge weights of the metapopulation network.

First, we make the state-level prediction $X^{[S]}_1 = \sum_{l \in S} X^{[l]}_1(t)$, where $X^{[l]}_1$ is computed at the district level $l$ and $S$ is a state of India. The metapopulation network for a state $S$ is formed by the districts belonging to $S$ as vertices, with the traffic flow represented by the weights $w_{kl}$ defined by Eq. (6). The distance $d_{kl}$ between two districts $k$ and $l$ is the great-circle distance between their latitude and longitude coordinates. Next, once the estimates of $X^{[S]}_1$ are obtained for all states $S$ in India, the prediction at the national level is obtained by applying the proposed model with states as the locations. The model parameters are then estimated again by comparison with the real data at the level of states, as described above. The metapopulation network of states is constructed, and the traffic flow is calculated using the weight formula for $w_{kl}$, where the distance between two states $k$ and $l$ is taken as the great-circle distance between their latitude and longitude coordinates.

Prediction with model and real data: a case study of India

[Figure: panels (a) and (b); legend: Data (Learning), Model, Data (Validation)]