[PDF] SUTRA: A Novel Approach to Modelling Pandemics with Asymptomatic and Undetected Patients, and Applications to COVID-19

Abstract

In this paper, we present a new mathematical model for pandemics that have asymptomatic patients many of whom remain undetected, called SUTRA. The acronym stands for Susceptible, Undetected, Tested (positive), and Removed Approach. There are several novel features of our proposed model. First, whereas previous papers have divided the patient population into Asymptomatic and Infected, we have explicitly accounted for the fact that, due to contact tracing and other such protocols, some fraction of asymptomatic patients could also be detected; in addition, there would also be large numbers of undetected asymptomatic patients. Second, we have explicitly taken into account the spatial spread of a pandemic over time, through a parameter called "reach." Third, we present numerically stable methods for estimating the parameters in our model. We have applied our model to predict the progression of the COVID-19 pandemic in several countries. We present our predictions for countries with three quite distinct types of disease progression, namely: (i) countries where nearly all of population still remains outside the reach of the pandemic, (ii) countries where a reasonable fraction of population is both within and outside the reach, and (iii) countries where nearly all of population s within the reach of the pandemic. In all cases, the predictions closely match the actually observed outcomes.

Full PDF

SSUTRA: An Approach to Modelling Pandemicswith Asymptomatic Patients,and Applications to COVID-19

Manindra Agrawal, Madhuri Kanitkar and Mathukumalli Vidyasagar ∗ February 2, 2021

Abstract

In this paper, we present a new mathematical model for pandemics that have asymptomaticpatients, called SUTRA. The acronym stands for Susceptible, Undetected, Tested (positive),and Removed Approach. There are several novel features of our proposed model. First, whereasprevious papers have divided the patient population into Asymptomatic and Infected, we haveexplicitly accounted for the fact that, due to contact tracing and other such protocols, somefraction of asymptomatic patients could also be detected; in addition, there would also be largenumbers of undetected asymptomatic patients. Second, we have explicitly taken into accountthe spatial spread of a pandemic over time, through a parameter called “reach.” Third, wepresent numerically stable methods for estimating the parameters in our model.We have applied our model to predict the progression of the COVID-19 pandemic in severalcountries. Where data on the number of recovered patients is available, we predict the number ofactive cases as a function of time. Where recovery data is not available, we predict the numberof daily new cases. We present our predictions for three countries with quite distinct typesof disease progression, namely: (i) India which has had a smooth rise followed by an equallysmooth fall-oﬀ in the number of active cases, (ii) Italy, which has witnessed multiple peaks inthe number of active cases, and has also witnessed multiple “phases” of the pandemic, and (iii)the USA which has erratic recovery data. In all cases, the predictions closely match the actuallyobserved outcomes.

The COVID-19 pandemic caused by the SARS-CoV-2 virus has by now led to more than ninetymillion cases and nearly two million deaths worldwide [31]. By way of comparison, the infuenzaepidemic of 1957 led to 20,000 deaths in the UK and 80,000 deaths in the USA, while the 1968inﬂuenza pandemic led to 30,000 deaths in the UK and 100,000 deaths in the USA. In contrast,the COVID-19 pandemic has already led to nearly 400,000 deaths in the USA and 85,000 deaths inthe UK [12], and the end is nowhere in sight. Even allowing for the increase in population duringthe past half-century, it is evident that the current pandemic is the most deadly since the “Spanishﬂu” of 1918–19. Among large economies, the USA, UK, Italy, and Spain, have all registered morethan 1,100 deaths per million population [31]. In these countries, the pandemic appeared to haveabated, only to return with a “second wave” and sometimes even a “third wave” [22], each wave ∗ MA is with the Department of Computer Science, Indian Institute of Technology Kanpur, Kanpur, UP 208016;Email: [email protected]. MK is Deputy Chief Integrated Defence Staﬀ (Medical), Headquarters IntegratedDefence Staﬀ, Ministry of Defence, New Delhi; Email: [email protected]. MV is with the Department ofArtiﬁcial Intelligence, Indian Institute of Technology Hyderabad, Kandi, TS 502285; Email: [email protected]. a r X i v : . [ q - b i o . P E ] J a n eing more severe than its predecessor, both in terms of the number of daily new cases and deaths.In comparison, India has escaped relatively unscathed, with just around one hundred deaths permillion population to date [31], and thus far no second wave of infections. However, because of itslarge population, in absolute numbers India has registered the second largest number of cases afterthe USA, and the third highest number of deaths after the USA and Brazil [31].In order to cope with a health crisis of this magnitude, governments everywhere would requireaccurate projections of the progress of the pandemic, both in space and over time. Over the pastcentury or so, various epidemiological models have been developed, as reviewed in the next section.All of these models are based on the premise that the disease spreads when an infected personcomes into contact with a susceptible person. However, a distinctive feature of the COVID-19disease is the presence of a huge number of asymptomatic persons, who are infected and thuscapable of infecting others, but are not explicitly identiﬁed by the health authorities owing to theirnot showing any symptoms. The contributions of the present paper are: (i) the formulation ofa mathematical model for the spatial and temporal evolution of a pandemic with asymptomaticpatients, (ii) a numerically robust method for estimating the parameters in the model, and (iii)validation of the proposed methodology by applying it to several countries around the world withhighly disparate trajectories for the number of cases. Literature Review

There is a vast literature on the modelling of epidemics. A comprehensive review [11] published inthe year 2000 already had 200+ references. Book length treatments are available in [2, 8, 14, 6, 20].According to [21], there are no fewer than thirty six epidemiological models. Historically the ﬁrstepidemiological model is the SIR model introduced in [15], given by˙ S = − βIS, ˙ I = βIS − γI, ˙ R = γI, (1)where S, I, R denote respectively the fraction of the populatioon that is Susceptible, Infected, andRemoved. Note that ˙ S + ˙ I + ˙ R = 0. Consequently S ( t ) + I ( t ) = R ( t ) = 1 for all t . Therefore wecan ignore anyone of the three equations and focus only on the other two. Most authors ignore R and study ˙ S = − βIS, ˙ I = βIS − γI, (2)where β, γ > β is called the “contactparameter” and represents the likelihood that contact between a susceptible individual and aninfected individual leads to a fresh infection, while γ denotes the rate at which infected personsrecover. In principle, there should be a time delay in the above equations (2), in the form˙ S ( t ) = − βI ( t − ∆) S ( t − ∆) , ˙ I ( t ) = βI ( t − ∆) S ( t − ∆) − γI ( t ) , where ∆ denotes the incubation period of the virus in an infected person. However, it is shown in[2, 14] that, other than complicating the solution of the equations, the time delay does not changethe qualitative behavior of the solutions. Therefore practically all researchers do not introduce sucha delay, and neither do we.The ratio R := β/γ is called the basic reproduction ratio . Its signiﬁcance lies in the factthat if R S (0) <

1, then ˙

I < R S (0) > I ( t ) increases initially and reaches its maximum value when ˙ I = 0, or S = γ/β = 1 /R . Since Some authors use the letter R to denote “Recovered,” which presupposes that no one dies. It is more realistic touse the phrase “Removed” that includes both those who recover and those who die. + I + R = 1 at all times, it follows that when I reaches its maximum, the value of I + R equals1 − /R = ( R − /R , a number often referred to as the herd immunity level .While the above SIR model is a good starting point, a more realistic model consists of anintermediate group called E (for Exposed) in-between S and I . The equations for the SEIR model,which are studied in [18, 16] are as follows:˙ S = − βIS, ˙ E = βIS − γE, ˙ I = γE − δI, ˙ R = δI. (3)The above equations mean that when a person from group S comes into contact with a personfrom group I , then the former becomes “exposed” at a rate of β . Note that the transition is outof group S but to group E , and not to group I . The persons in group E become infected at a rate γ , and move to group I . Finally, people in group I move to group R at a rate of δ . Note that thetransition of people is strictly sequential in the order S → E → I → R . Note that there is no termof the form ES in the above equations. Therefore, contact between a susceptible person and anexposed person does not have any consequences. This is precisely the diﬀerence between previousdiseases to which the SEIR model has been applied, and COVID-19.Apparently the ﬁrst paper to identify asymptomatic patients as a separate category is [23].The model proposed there, which might be called the SAIR model, is as follows: The population isdivided into four groups, denoted as Susceptible ( S ), Asymptomatic ( A ), Infected ( I ), and Removed( R ). ˙ S = − β A AS − β I IS, ˙ A = β A AS + β I IS − γ A A − δA, ˙ I = δA − γ I I, ˙ R = γ A A + γ I I. (4)In contrast with the SEIR model of (3), in the SAIR model, interactions between susceptiblepersons ( S ) on one side, and either asymptomatic ( A ) or infected ( I ) persons on the other side,can lead to fresh infections, at rates of β A and β I respectively. The newly infected persons initiallyenter the asymptomatic group A . The asymptomatic persons in the A group move to the I groupand become symptomatic at a rate of δ , while others recover by moving to the group R at a rateof γ A . directly to the R group rate at another rate. Finally, symptomatic persons in the I groupget removed at a rate f γ I .Now we study the stability of the various models, namely SIR, SEIR, and SAIR. The ﬁrst twoare well-studied in the literature [18, 16]. However, the stability analysis of the SAIR model isinitiated in [23] and completed in [3]. The approach proposed in [3] is based on an extension ofthe well-known Krasovskii-LaSalle invariance theory for studying nonlinear diﬀerential equations,and results in very simple proofs. Moreover, the approach in [3] is applicable to quite generalproblems, and not just epidemiological models. To state the theorem concisely, we introduce theset of equilibria for each class of models. As shown in [11], the set E SIR := { S ∈ [0 , , I = 0 } is the set of equilibria for the SIR model of (2), while E SEIR := { S ∈ [0 , , E = 0 , I = 0 } is the set of equilibria for the SEIR model of (3). Similarly, the set E SAIR := { S ∈ [0 , , A = 0 , I = 0 }

3s the set of equilibria for the SAIR model of (4). It is shown in [3, Theorem 6] that, as t → ∞ , thetrajectories of each system approach the corresponding set of equilibria. However, the introductionof births and deaths, knows as “vital dynamics,” results in each system having only two equilibria.Vital dynamics are not discussed in this paper, but a thorough discussion can be found in [3]. Parameter Estimation in the Simpliﬁed SAIR Model

While the literature in epidemiology is quite rich in the formulation and the stability analysis ofvarious models, there is rather less discussion on estimating the parameters of the model, that is,inferring the values of the various parameters in the model on the basis of observations. That is thetopic of the present section. We begin by formulating a simpliﬁed version of the SAIR model thatis appicable to the COVID-19 pandemic. Then we present a method for estimating the parametersof this model. We conclude the section by showing that one of the parameters is very diﬃcult toestimate accurately. This observation provides the motivation for the SUTRA model, which is themain contribution of the paper and is presented in the next section.The SAIR model in (4) can be simpliﬁed by assuming that β A = β I = β, γ A = γ I = γ. (5)This leads to ˙ S = − βAS − βIS, ˙ A = βAS + βIS − γA − δA, ˙ I = δA − γI. (6)Thus there are only three parameters in this model, which might be called the simpliﬁed SAIRmodel, as opposed to ﬁve in the full SAIR model of (4). The justiﬁcation for these simplifyingassumptions is given next.1. The assumption that β A = β I = β means that the likelihood of fresh infection is the same,whether the contact is between A and S , or between I and S . After the onset of the COVID-19 pandemic, several papers in the literature have studied “viral shedding” by both asymp-tomatic and infected patients, and conclude that there is no discernible diﬀerence betweenthe two; see for example [30, 19, 17].2. The assumption that γ A = γ I = γ means that persons in both groups A and I move tothe “Removed” group R at the same rate γ . It is observed that almost all asymptomaticCOVID-19 patients recover within a span of about ten to twelve days. Thus one can take γ A to be in the interval [1 / , / R includes both those who recover as well as those who die, the timeconstant for removal from the I group is the same as for the A group.With these justiﬁcations, we now study the simpliﬁed SAIR model (6). Let us deﬁne M := A + I ,so that A = M − I . Note that M is the total number of infected persons, though it cannot bemeasured directly. Then (6) can be rewritten as˙ S = − βSM, ˙ M = βSM − γM, ˙ I = δM − ( γ + δ ) I. (7)Note that the ﬁrst two equations do not contain I . In fact these two equations represent just thestandard SIR model of (1) with M playing the role of I . The objective is to estimate these threeparameters β, γ, δ based only on data that can be measured. This consists of the daily totals of4ymptomatically infected patients I , and the subset of those who recover; this can be denoted as R I and satisﬁes ˙ R I = γI . It is also reasonable to assume that, when the pandemic starts, the initialconditions are S (0) = 1 − A (0) , I (0) = 0 . (8)In other words, the pandemic is seeded by a small number of asymptomatic patients, and thatthere are no symptomatic patients at the outset. Moreover, A (0) (cid:28) t is very small, we have that S ( t ) ≈

1. Therefore we can rewrite the secondequation in (7) as ˙ M ≈ ( β − γ ) M, or M ( t ) ≈ M (0) exp[( β − γ ) t ] . (9)Observe that, unless β > γ , the so-called basic reproduction ratio β/γ is less than one, and thepandemic does not take oﬀ [11]. Hence it can be assumed that β > γ . Next, one can substitutefrom (9) into the third equation in (8), as follows:˙ I = δM (0) exp[( β − γ ) t ] − ( γ + δ ) I, I (0) = 0 . (10)The solution of (10) is I ( t ) = δβ + δ [exp[( β − γ ) t ] − exp( − ( γ + δ ) t )] . (11)In turn (11) can be rewritten as I ( t ) = C exp[( β − γ ) t ] { − exp[( β + δ ) t ] } , (12)where C = δ/ ( β + δ ). Next, we can compute the logarithm of I ( t ) and note thatln I ( t ) = ln C + ( β − γ ) t + ln { − exp[( β + δ ) t ] } . (13)Of course these equations are only approximate.Based on these approximations, it is possible to estimate all three constants. First, note thatboth I and R I can be measured, and satisfy ˙ R I = γI . Therefore, for any ﬁxed time width T , wecan write R ( t + T ) − R ( t ) = γ (cid:90) t + Tt I ( s ) ds. (14)The parameter T is up to us to choose. In principle we could choose T = 1 and compute the dailyrecovery totals. However, this would not be very reliable, due to the vagaries in reporting recoverydata. Since there is usually weekly cyclicity in the reports, a good choice is T = 7 (days). One cangenerate the vector [ R ( t + T ) − R ( t )] for various values of t , and also I ( t ) for various t , and simplyﬁnd the best ﬁt for the slope; this gives γ . Next, it can be seen from (11) that I ( · ) is the sum ofa growing exponential and a decaying exponential. Therefore ln I ( t ) looks linear once the initialpart of the curve is ignored. By plotting ln I ( t ) as a function of t , it is possible to infer the valueof β − γ , and since we already know γ , we can in turn in turn infer β . Finally, once β is known, itis possible to compute the residual term ln { − exp[( β + δ ) t ] } , from which δ can be inferred.Figure 1 shows the plot of ln I ( t ) versus t for an exact solution of (7) (i.e, no approximations),with β = 0 . , γ = 0 . , δ = 0 . γ using (14) is not shown, as it is veryrobust, and the correct value of γ = 0 .

08 is recovered (corresponding to a mean recovery periodof 12.5 days). From the ﬁgure it can be seen that the graph does indeed follow a straight-linepattern, leading to the estimate ˆ β = 0 . Time -14-13-12-11-10-9-8 T r ueand F i tt ed l og I True vs. Fitted log I, beta = 0.2 gamma = 0.08 delta = 0.001

TrueFitted

Figure 1: Plot of ln I ( t ) versus t for the simpliﬁed SAIR model δ is 0 .

01, which is oﬀ by a factor of ten. This error is both inevitable and undesirable. First, theerror is inevitable because (13) gives an estimate of β + δ , and not δ alone, and β (cid:29) δ ; there isno way to overcome this. The error is also undesirable because it follows from (6) that when theinfected population is at its maximum, we have that ˙ I = 0 which implies that A = ( γ/δ ) I . Thus,when the infection peaks (which can be inferred from available measurements), the ratio betweenasymptomatic and symptomatic patients is γ/δ , and errors in estimating δ lead to poor estimatesof the number of asymptomatic patients. The SUTRA Model

While the SAIR model formulated in [23] is the ﬁrst one to make a clear distinction betweenasymptomatic and symptomatic patients, it does make one unrealistic assumption, namely: that all persons in group I are symptomatic. The logic in [23] is that persons with symptoms would presentthemselves to the health authorities, while asymptomatic persons would not. Over time, someasymptomatic patients would develop symptoms, at which time they too would present themselvesto the health authorities. However, this is not how matters have evolved during the COVID-19pandemic. Instead of the A and I groups, it more realistic to have groups U for Undetected butinfected, and T for Tested Positive. In most countries, once a person tests positive (i.e., infected)for the SARS-CoV-2 virus, contact tracing begins, whereby family members, and anyone else whomight have come into contact the person who tested positive are themselves tested. Some of thesetested persons would be found to be positive, while others would test negative. Those who testnegative need not concern us, as they belong to the Susceptible group S . However, among thosetest positive, which we call T , it is possible to make a further subgrouping into T A (tested positiveand asymptomatic) and T S (tested positive and symptomatic). In contrast, those in group U areinfected but asymptomatic, and thus are not detected. The point is that, due to contact tracing,some fraction (however small) of asymptomatic patients are also identiﬁed. To compare with theSAIR model, we can deﬁne A = U + T A , while I = T S . Moreover, it is believed that the group A greatly outnumbers I = T S . In [7], it is proposed that 75% of patients are asymptomatic, i.e.,that A/I ≈

3. However, experience in India indicates that this is a vast underestimate. Evenwithin the group T who test positive, about 80% to 85% turn out to be asymptomatic, so that6 A /T S ≈

5. Thus

A/I = ( U + T A ) /T S would be much higher. Indeed, estimating this ratio is oneof the contributions of the present paper.Let us now construct a model for the evolution of the pandemic. At present, in most countries,persons in the group T (whether symptomatic or not) are quarantined, and it can be assumed thatthey do not come into contact with the Susceptible population S . Therefore persons in group S getinfected only through contact with group U of undetected infected patients, with a likelihood of β .Finally, it is assumed that all infected persons are initially asymptomatic, and thus enter group U .In turn some part of U , call it x , moves to T , while the others move towards recovery. This leadsto ˙ S = − βSU, ˙ U = βSU − x − γU, where x is the as yet unspeciﬁed transfer rate, and γ is the rate of recovery. In turn the people inthe T group get removed at the same rate γ as those in the U group. It is easier to accept thatboth groups have same removal rates under the new interpretation than with SAIR model sincemost of T consists of T A which is same as U (the only diﬀerence being that asymptomatics in T A get detected). Thus we can write ˙ T = x − γT, ˙ R U = γU, ˙ R T = γT. Thus the model formulation is complete once we specify x , the transfer rate from group U to group T . One possibility is to assume that everyone from U migrates to T at a ﬁxed rate δ , so that x = δU . This would lead to ˙ U = βSU − δU − γU, ˙ T = δU − γT. This is the same as the simpliﬁed SAIR model of (6), with A and I replaced by U and T respectively.Thus the above model would suﬀer from the same diﬃculties in parameter estimation as that in(6). Thus an alternate approach is needed.In the above described process, it can be assumed that many of the people who tested positivecontracted the infection after the person originally found positive, and triggered the contact tracing.Also, most of the symptomatic cases show symptoms within a week of getting infected. Hence,the chances of a person in U getting detected are far higher for those who were infected in pastone week than those who were infected earlier. The fraction of infected cases in the past few dayscan be taken to be proportional to βSU , the fraction of persons who got infected most recently, asthe number of cases do not change signiﬁcantly over a window of few days. Therefore we choose x = (cid:15)βSU , with (cid:15) being another parameter of the model. With this assumption, the full SUTRAmodel becomes ˙ S = βSU, (15)˙ U = βSU − (cid:15)βSU − γU, ˙ T = (cid:15)βSU − γT, (16)˙ R U = γU, ˙ R T = γT. (17)The acronym SUTRA stands for Susceptible, Undetected, Tested (positive), and Removed (recov-ered or dead) Approach. Susceptible, The word Sutra also means an aphorism. Sutras are a genreof ancient and medieval Hindu texts, and depict a code strung together by a genre. It is possibleto introduce another parameter D denoting deaths, and write it as ˙ D = ηT . However, it is quiteeasy to estimate η as the ratio between the incremental death totals and the increase in cumulativepositive test cases, as in (14). Hence that relationship is not shown as a part of the SUTRA model.7 nalyzing Model Equations Deﬁning M = U + T , R = R U + R T , we get from equations (16) and (17) that˙ M + ˙ R = βSU = 1 (cid:15) ( ˙ T + ˙ R T ) , (18)resulting in M + R = 1 (cid:15) ( T + R T ) + c (19)for an appropriate constant of integration c . Adding equations (16) gives˙ M = βSU − γM = 1 (cid:15) ( ˙ T + γT ) − γM, or d ( M e γt ) dt = 1 (cid:15) d ( T e γt ) dt , (20)resulting in M = 1 (cid:15) T + de − γt (21)for some constant d . Since e − γt is a decaying exponential, it follows that, except for an initialtransient period, the relationship M = (1 /(cid:15) ) I holds. This in turn implies that U = M − T =((1 /(cid:15) ) − T . Note that 1 /γ is the expected recovery period for a patient. Hence, for COVID-19, γ ≈ . T + ˙ R T = (cid:15)βSU = β (1 − (cid:15) ) ST = β (1 − (cid:15) )(1 − ( M + R )) T = β (1 − (cid:15) )(1 − (cid:15) ( T + R T ) − c ) T = β (1 − (cid:15) )(1 − c ) T − β (1 − (cid:15) ) (cid:15) ( T + R T ) T (22)Eq. (22) is the fundamental equation governing the pandemic. It establishes a relationshipbetween ˙ T + ˙ R T , T , and ( T + R T ) T , which are all observable quantities. This equation allows us toestimate the parameters in the model, and to establish the correctness of the model with respectto COVID-19 by observing that the proposed relationship indeed holds. Parameter Estimation in the SUTRA Model

The progression of a pandemic is typically reported as three daily time series giving the totalnumber of detected infections, total number of recoveries, and the total number of deaths. Thesequantities equal P · ( T + R T ), P · R T and P · D respsectively, where P is the population withinthe reach of the pandemic. An added complication is that, when modeling the growth of pandemicduring its initial stages, the value of P keeps increasing as the pandemic spreads over a geographicalregion. We model it by writing P = ρP where P is the total population of the region under study,and ρ is a monotonically increasing parameter between 0 and 1, which we call “reach.”8etting T = ρP T , R T = ρP R T , and substituting in Eq. (22) gives T = 1 β (1 − (cid:15) )(1 − c ) ( ˙ T + ˙ R T ) + 1 (cid:15)ρ (1 − c ) T + R T P T = 1˜ β ( ˙ T + ˙ R T ) + 1˜ (cid:15) T + R T P T , (23)for ˜ β = β (1 − (cid:15) )(1 − c ) and ˜ (cid:15) = (cid:15)ρ (1 − c ) over a time-interval when ρ remains constant.We now describe methods for estimating the parameter values. The ﬁrst observation is that theparameter values change over time, either naturally ( ρ keeps increasing for example) or throughadministrative intervention (lockdowns reduce value of β , higher testing increases (cid:15) , and so on).It is reasonable to expect that the changes will either be sudden (lockdowns), or will be gradual(increase in reach). Hence we assume the parameters to be piecewise-constant. The entire timelineof the pandemic is divided into phases such that within each phase the parameters are constant.Further, at the beginning of each phase, the parameter values drift slowly for a brief period of timebefore settling down to their new values. Detecting Phase Boundaries

Equation (23) is impacted by all parameters of the pandemic except the death rate η , which canbe estimated separately from the rest. So we deﬁne a phase change as a time instant where theequation breaks down due to signiﬁcant errors. We start a new phase from that point. Note thatthis breakdown may continue for some time thereafter, since the parameters may take some timeto stabilize. We call this the drift period of the new phase. Estimating η and γ Parameters γ and η can be computed by using equations ˙ R T = γ T and ˙ D = η T , where D = ρP D .In order to eliminate errors in reported data, we integrate the equations over seven days and usemultiple time instants to calculate values of γ and η respectively that minimize the error. Theequations are analogous to (14). Estimating ˜ β and ˜ (cid:15) Equation (23) has ˜ β and ˜ (cid:15) as unknowns, so we can estimate their values by using a similar approachas above. However, estimating two parameters simultaneously becomes diﬃcult with relatively fewdata points (which happens when the duration of the phase is short), or the data has signiﬁcanterrors. In such situations we use a diﬀerent method for estimation that is more tolerant to errorsas described below.To reduce errors, we integrate equation (23) over seven days and take several time instants, say m , in a phase. From the input time series, we can extract the corresponding m values of T , ˙ T + ˙ R T ,and T ( T + R T ) /P . Let these be represented by m -dimensional vectors u , v , and w respectively.Then there are two possible methods for estimating ˜ β and ˜ (cid:15) , as described next. Standard Method

The standard way is to ﬁnd values for ˜ β and ˜ (cid:15) that maximize the R -value given by R = 1 − | u − β v − (cid:15) w | | u | , | · | denotes the Euclidean norm of a vector. When there are signiﬁcant errors in the data orthe duration of a phase is small, this method can give a non-physical solution whereby either ˜ β or˜ (cid:15) are negative! In such situations we use an alternate method. Alternate Method

Let R β = 1 − | u − β v − (cid:15) w | | u − (cid:15) w | R (cid:15) = 1 − | u − β v − (cid:15) w | | u − β v | Find values of ˜ β > (cid:15) > R β · R (cid:15) . This choice ensures that both ˜ β and ˜ (cid:15) play almost equally signiﬁcant roles in minimizing the error. Further, the desired maximumof R β R (cid:15) is guaranteed to exist since R β R (cid:15) = (2 v T u − (cid:15) v T w − β v T v )(2 w T u − β w T v − (cid:15) w T w )˜ β ˜ (cid:15) | u − (cid:15) w | | u − β v | (24)The denominator of equation (24) is always positive (under the requirement that ˜ β, ˜ (cid:15) > / ˜ β and 1 / ˜ (cid:15) . Therefore the valueof R β R (cid:15) is positive inside the polygon deﬁned by 1 / ˜ β ≥ / ˜ (cid:15) ≥

0, along with2 v T u − (cid:15) v T w − β v T v ≥ , w T u − β w T v − (cid:15) w T w ≥ , and is zero on the boundaries. This guarantees that there exists at least one maximum inside thepolygon. Estimating ρ and c Suppose all the parameters are known for the previous phase. Using these, we can compute thetime evolution of M = ρP M , R = ρP R , T , and R T up to the end of the phase. As shown above,we can also compute the values of η , γ , ˜ β , and ˜ (cid:15) for the current phase. Using the last two values,the evolution of T and R T for the current phase can be computed.Deﬁne a function f : [ − , × [0 , (cid:55)→ [ − , × [0 ,

1] as per the algorithm below. The ﬁrstcomponent is the range of possible values of the parameter c and second of the parameter ρ .Input: ( a, b ). Let c = a, ρ = b, (cid:15) = ˜ (cid:15)b (1 − a ) , β = ˜ β (1 − (cid:15) )(1 − a ) . Using these, compute the evolution of M and R for the current phase. Fit the valuesof M + R and T + R T for diﬀerent time instants on a line and let 1 /e and a (cid:48) be theslope and intercept respectively. Let b (cid:48) = ˜ (cid:15)e (1 − a (cid:48) ) and output ( a (cid:48) , b (cid:48) ).10n the above, ( a, b ) is the current guess for ( c, ρ ), and ( a (cid:48) , b (cid:48) ) the value of ( c, ρ ) obtained fromthe simulation done using the currently guessed value. Hence the correct value of ( c, ρ ) will be aﬁxed point of the map f : ( a, b ) (cid:55)→ ( a (cid:48) , b (cid:48) ) . In all our simulations we have found that:1. There is a unique ﬁxed point of f , and2. Staring from a random ( a, b ) and iterating f ﬁfteen times almost always converges to theﬁxed point value.For example, in simulations for India, all starting points converge. For Italy, except for phases 2–4,all points converge, and for phases 2–4, more than 90% of points converge. And for US, all pointsconverge for phases 3, 6, 7, 10, and for the rest more than 90% of points converge.Using the above algorithm, we can estimate values of all parameters for the current phase.The only exception is the ﬁrst phase when there is no previous phase, and as a result, there is nostarting point to compute evolution of M and R as required by the function f . The ﬁrst phasewould typically be at the start of pandemic and so the value of c can be taken to be zero. Thisleaves only the parameter (cid:15) to be estimated. The above analysis provides no method of estimatingthis parameter. Thus at least one external measurement is required to estimate (cid:15) in the ﬁrstphase. This is possible if sero-survey data is available at some point of time during the evolutionof the pandemic (not necessarily during the ﬁrst phase), as it can be shown that increasing initial (cid:15) decreases R for all time thereafter, except when the pandemic is nearing its end. Parameter values during drift period

The above calculations give us values of all parameters post the drift period of every phase. Forthe drift period, we take each parameter value to be a weighted average of its values for the postdrift periods of previous and current phases. Two of the six parameters of the model ( β and ρ ) aredetermined by the behavior of the population at large, while the remaining four can be controlledby administrative decisions. We have chosen to use the geometric mean for the ﬁrst two parameters,and the arithmetical mean for remaining four parameters.Speciﬁcally, suppose d is the number of days in drift period, and γ and γ are the computedvalues of parameter γ in the previous and the current phases. Then its value for the i th day in thedrift period of current phase is taken to be γ + id · ( γ − γ ) . On the other hand, the value of β for the same day is taken to be β · (cid:18) β β (cid:19) i/d f or ≤ i ≤ d. Herd Immunity

In classical epidemiology, the phrase “herd immunity” refers to the level of susceptible populationin the population at which the number of infections reaches its maximum. Interestingly, the phrase “herd immunity” [27] predates the introduction of the SIR model for a pandemic [15]. However, a11athematical formula for the onset of herd immunity was derived only much later [25, 9]. In thetraditional SIR model (1), the herd immunity level can be determined by setting ˙ I = 0, which gives βIS = γS = ⇒ S = βγ = 1 R . Thus, when herd immunity level is reached, the total number of infected and removed is given by I + R = 1 − S = R − R . In more elaborate models such as the SAIR model of (4), it is not easy to obtain an explicit formulafor the onset of herd immunity. However, in the case of the SUTRA model, it follows that the triplet(

S, M, R ) satisﬁes the SIR equations (1) with I replaced by M and β with β (1 − (cid:15) ). Therefore,once the parameters of the SUTRA model are estimated, it follows that the maximum values of M and T respectively are given by M max = R − R , I max = R − (cid:15)R , where R = β (1 − (cid:15) ) /γ .The conclusion therefore is that, so long as the model parameters are constant , the number ofinfections shows a unimodal behavior: It rises to a maximum and then decreases to zero. Furtherdetails can be found in [10, 11]. However, when the model parameters themselves change, thisobservation is no longer valid. Indeed, several countries have witnessed multiple peaks of thenumber of infections. Now we analyze how all the peaks of the infections can be determined.Every peak value of infections is characterized by0 = ˙ T = β (1 − (cid:15) ) ST − γT = β (1 − (cid:15) )(1 − c − (cid:15)ρP ( T + R T )) T − γT = ( ˜ β − γ − ˜ β ˜ (cid:15)P ( T + R T )) T Since T (cid:54) = 0, we get the condition: T + R T = ˜ (cid:15)P (cid:18) − γ ˜ β (cid:19) . (25)This gives a condition for T peaking in a given phase: T + R T < ˜ (cid:15)P (1 − γ/ ˜ β ) at the beginningof phase and T + R T > ˜ (cid:15)P (1 − γ/ ˜ β ) at the end of the phase. In order to predict whether therewould be another peak beyond the end of the current phase, we need to estimate the maximumpossible value of ˜ (cid:15) and minimum possible value of γ/ ˜ β . In our simulations, we have found γ/ ˜ β ≥ . T + R T > . (cid:15)P , then it can be predicted that there would be no peaks of T higher than thecurrent peak in future, or that herd immunity has been reached. However, this number is dependenton the value of (cid:15) which is a function of testing strategy adopted. Our simulations show (see tablesbelow) that value of (cid:15) tends to stablize after some time. So letting (cid:15) S denote the stable value of (cid:15) ,which also implies that c ≈ (cid:15) ≤ (cid:15) S , we can conclude that if T + R T (cid:29) . (cid:15) S P , then there will be no future peaks assuming that testing strategy does not change dramatically toincrease (cid:15) well beyond (cid:15) S . 12igure 2: Italy: Phase 1 Model Validation

Our model predicts an unexpected relationship between the observed values T and R T (see equa-tion (23). It essentially states that the values of T and R T follow the trajectory of an SIR modelwith contact parameter ˜ β and population ˜ (cid:15)P . If this relationship is observed in actual data, thenit would provide strong evidence that COVID-19 dynamics are well-approximated by the SUTRAmodel.It is straightforward to see that Equation (23) holds if and only if, for some value b >

0, thepoints (cid:18)(cid:90) tt − T ds − b (cid:90) tt − d ( T + R T ) , (cid:90) tt − ( T + R T ) T ds (cid:19) , lie on a straight line passing through the origin after an initial interval denoting the drift periodof a phase, and in addition, the slope of the line is 1 / ˜ (cid:15) . We use this criterion also to detect whena new phase has started, namely: when the points start drifting away from the straight line as t increases.In our analysis of twenty-ﬁve countries and twenty-eight states of India, we have found thepoints to be very close to a line: only rarely is the R -value of the ﬁt less than .

9. We demonstratethis phenomenon by using data from Italy – it is observed to have low noise and so the ﬁt is better( R -values are always more than 0 . Assessment of Strategies Adopted by Various Countries

The contents of the previous section provide strong evidence that SUTRA is the right model forCOVID-19. This is further validated by comparing trajectories for T computed through the modelwith the actual numbers for several countries in this section. Moreover, armed with the values ofvarious parameters for diﬀerent phases, particularly (cid:15) , β , and ρ , we can analyze the response ofa country to the pandemic and its impact. In this section, we analyze the strategies followed bycountries and how successful they were.There are two broad strategies for limiting the damage of the pandemic in the population:13igure 3: Italy: Phase 2Figure 4: Italy: Phase 3Figure 5: Italy: Phase 414igure 6: Italy: Phase 5Figure 7: Italy: Phase 6Figure 8: Italy: Phase 715 inimize Reach: Minimize the spread (through containment zones, isolation, extensive testingetc) so that few people get infected, and wait for the arrival of a vaccine. This correspondsto minimization of the parameter ρ . Maximize Reach:

Aim for herd immunity by letting the reach increase in a controlled manner,so that the medical infrastructure does not get overwhelmed. This is the strategy advocatedby the UK for example [29].It goes without saying that succesfully implementing either is very diﬃcult for any country. Bothstrategies have been followed by various countries, with some countries switching from one strategyto the other midway through the pandemic. We discuss the pros and cons of these strategies alongwith give examples of countries that implemented them and their success.As observed during discussion on estimation of parameters, at least one sero-survey data pointis needed to calibrate the model for a country. Such data points are available for many countries;see for example [4, 5]. In case where multiple data points are available for a country, we have usedthe earliest one. Serological surveys detect the presence of antibodies, levels of which may diminishwith time [28]. However all those who were infected with COVID-19 (either asymptotically or moreseverely) retain some form of immunity, for example through memory B cells and both CD4+ andCD8+ T-cell mediation, even after antibodies have reduced [24]. Thus we believe that sero-surveydata taken early during 630 the pandemic leads to more reliable estimates of 1 /(cid:15) .with the associated 95% conﬁdence intervals ,are presented in a tabuar form, with the initialestimate of 1 /(cid:15) (obtained through calibration) marked in red color.

Minimizing Reach

If the pandemic can be limited to a small section of population, the human and economic costsare minimized. However, there is always a risk of major outbreak and so the system needs to bealert and agile all the time. Signiﬁcant discipline from people is also required to keep the risk ofinfection low. Due to these reasons, this strategy has better chances of success in countries withsparse or homogeneous and disciplined population. Indeed, amongst the countries analyzed, thesuccessful ones satisfy this criteria. Of these, South Korea is most successful with reach limited to0 .

5% of population even in January 2021! Australia (1 . . . β γ η /(cid:15) ρ . ± .

01 0 . ± .

001 0 . ± . . . ±

02 18 Mar 5 0 . ± .

01 0 . ± .

003 0 . ± . . ± . . ±

03 01 Jun 5 0 . ± .

02 0 . ± .

001 0 . ± . ± . ±

04 26 Jun 10 0 . ± .

02 0 . ± .

002 0 . ± . . ± . . ±

05 11 Aug 10 0 . ± .

03 0 . ± .

003 0 . ± ± . ±

06 09 Sep 0 0 . ± .

01 0 . ± .

005 0 . ± . . ± . . ±

07 19 Oct 6 0 . ± . ± .

002 0 . ± . . ± . ±

08 12 Nov 3 0 . ± .

01 0 . ± .

001 0 . ± . . ± . . ±

09 08 Jan 0 0 . ± .

02 0 . ± .

005 0 . ± . ± . ± β γ η /(cid:15) ρ . ± .

01 0 . ± .

002 0 . ± . ±

02 11-04-2020 10 0 . ± . ± .

009 0 . ± . . ± . . ±

03 04-05-2020 0 0 . ± .

01 0 . ± .

004 0 . ± . . ± . . ±

04 16-06-2020 10 0 . ± .

01 0 . ± .

001 0 . ± . . ± . . ± . . ± .

01 0 . ± .

007 0 . ± . . ± . ±

06 26-09-2020 10 0 . ± .

03 0 . ± .

001 0 . ± . . ± . . ± . Maximizing Reach

Herd immunity is a laudable objective if it can be achieved, but the passage to it is fraught withdanger. Excessive control may limit the reach of the pandemic, thus eliminating the possibilityof achieving herd immunity. Therefore, calibrated control (through lockdowns, distancing etc.) isvery important. Since T + R T = ˜ (cid:15)P (1 − γ ˜ β ) ≈ (cid:15)ρP (1 − γβ ) , in order to avoid a rapid increase in total infections, one should initially reduce β (through lockdownsand personal protection measures), and then allow ρ to increase to close to its maximum value of 1.At this point, if the eﬀective reproduction ratio, deﬁned as the product SR , is substantially smallerthan one, then β can be permitted to increase slowly, through a relaxation of various lockdownmeasures.From publicly available data [31], it appears that India is perhaps the only major economythat has managed to get the strategy right. Other large countries have witnessed multiple peaksin their infection numbers. These multiple peaks can be explained by a phase-wise analysis of theparameters of the SUTRA model for each of these countries. However, relating the variations ofthese model parameters to policy decisions taken is outside the scope of the present paper.In the case of India however, it is easy to establish why the decisions taken have led to theavoidance of multiple peaks. Indeed, in our earlier paper [1], we have established that the impositionof the ﬁrst lockdown in March 2020 decreased the peak infection level by a factor of fourteen, andpushed the peak out in time by four months. In other words, “India ﬂattened the curve.” Moredetails are given next.The parameter β reduced signiﬁcantly due to lockdown and stayed low at ≈ . ρ had increased to ≈ .

9. Over the next ﬁve months, β slowly increased to itspresent value of ≈ .

18, while ρ has increased slightly to ≈ .

96. (Indeed ρ cannot increase beyondone.) There was only one peak in September, ignoring very minor peak in November. For India,as on 27th January, 0 . (cid:15) S P ≈ . T + R T ≈ . γ/ ˜ β seems to have stabilized to ≈ .

5, signiﬁcantly higher than 0 .

3, suggesting that threshold for last peak is lower ( ≈ . (cid:15) S P ≈ . ρ > > . (cid:15) S P ≈ T + R T ≈ . ρ > > . (cid:15) S P ≈ .

40 million, T + R T ≈ .

37 million). For UK, the recovery data isnot reported; therefore, we ﬁx γ = 0 . T + ˙ R T with actual numbers. For17able 3: India: Parameter TablePh No Start Drift β γ η /(cid:15) ρ . ± .

01 0 . ± .

001 0 . ± . . ± . . ± .

01 0 . ± .

001 0 . ± . . ± . . ± .

013 21-06-2020 30 0 . ± .

01 0 . ± .

002 0 . ± . ± . . ± . . ± .

01 0 . ± .

001 0 . ± . ± . . ± . . ± . ± .

001 0 . ± . ± . . ± . . ± .

01 0 . ± .

001 0 . ± . ± . . ± . . ± .

01 0 . ± .

002 0 . ± . ± . . ± . ≈ / β γ η /(cid:15) ρ . ± .

01 0 . . ± . . ± . . ± . . ± . . ± . . ± . . ± .

01 0 . . ± . . ± . . ± . . ± .

01 0 . . ± . ± . . ± . . ± .

01 0 . . ± . . ± . . ± . . ± .

02 0 . . ± . . ± . . ± . . ± .

03 0 . . ± . . ± . . ± . ρ is between 30% to 70%, indicating the possibilityof future peaks. We present two examples: Italy and US.For US, the recovery data is not reported regularly; therefore, as in the case of UK, we ﬁx γ = 0 . T + ˙ R T with actual numbers.Figure 9: South Korea: Detected Active Infections ( T )18able 5: Japan: Parameter TablePh No Start Drift β γ η /(cid:15) ρ . ± .

02 0 . ± .

003 0 . ± . . ±

02 07-03-2020 5 0 . ± .

01 0 . ± .

002 0 . ± . . ± . . ± . . ± .

01 0 . ± .

002 0 . ± . ± . ± . . ± .

02 0 . ± .

007 0 . ± . . ± . . ± . . ± .

17 0 . ± .

003 0 . ± . . ± . . ± . . ± .

01 0 . ± .

001 0 . ± . ± . . ± . . ± .

01 0 . ± .

001 0 . ± . . ± . ± . . ± . ± .

001 0 . ± . ± . . ± . . ± .

01 0 . ± .

005 0 . ± . ± . . ± . . ± . ± .

002 0 . ± . . ± . . ± . . ± .

04 0 . ± .

003 0 . ± . . ± . . ± . β γ η /(cid:15) ρ . ± .

01 0 . ± .

001 0 . ± . . ± . . ± .

01 0 . ± .

001 0 . ± . . ± . . ±

03 09-05-2020 3 0 . ± .

01 0 . ± .

001 0 . ± . . ± . ± . . ± .

02 0 . ± .

001 0 . ± . . ± . . ± . . ± . ± .

001 0 . ± . ± . . ± . . ± .

01 0 . ± .

001 0 . ± ± . . ± . . ± .

01 0 . ± .

001 0 . ± . ± . ± . Acknowledgements

The work of MA and MV was supported by the Science and Engineering Research Board, India.

References [1] Manindra Agrawal, Madhuri Kanitkar, and M. Vidyasagar. Modelling the spread of SARS-CoV-2 pandemic - impact of lockdowns & interventions.

Indian Journal of Medical Research ,2020.[2] R.M. Anderson and R.M. May.

Infectious Diseases of Humans: Dynamics and Control . OxfordUniversity Press, 1991.[3] Santosh Ansumali, Shaurya Kaushal, AlokeKumar, Meher K. Prakash, and M.Vidyasagar.Modelling a pandemic with asymptomatic patients, impact of lockdown and herd immunity,with applications to SARS-CoV-2.

Annual Reviews in Control , 50:432–447, 2020.[4] Rahul K. Arora, Abel Joseph, Jordan Van Wyk, et al. SeroTracker: a global SARS-CoV-2seroprevalence dashboard.

Lancet Infectious Diseases , pages 1–2, August 4 2020.[5] Rahul K. Arora, Abel Joseph, Jordan Van Wyk, et al. Serotracker. https://serotracker.com/en,January 2021. 19able 7: US: Parameter TablePh No Start Drift β γ η /(cid:15) ρ . ± .

02 0 . . ± . . . ± . . ± . . ± . . ± . . ± . . ± .

01 0 . . ± . . ± . ± . . ± .

01 0 . . ± . . ± . . ± . . ± . . ± . ± . ± . . ± .

01 0 . . ± . . ± . . ± . . ± .

01 0 . . ± . ± . . ± . . ± .

03 0 . . ± . ± . . ± . . ± .

01 0 . . ± . ± . . ± . . ± .

03 0 . . ± . ± . . ± . T )[6] F. Brauer, P. van den Driessche, and J. Wu (Eds.). Mathematical Epidemiology . Springer,2008.[7] Michael Day. Covid-19: identifying and isolating asymptomatic people helped eliminate virusin italian village.

The BMJ , 165:1–1, 23 March 2020.[8] O. Diekmann and J. A. P. Heesterbeek.

Mathematical epidemiology of infectious diseases .Wiley, 2000.[9] Klaus Dietz. Transmission and control of arbovirus diseases. In D. Ludwig and K. L. Cooke,editors,

Epidemiology , pages 104–121. Society for Industrial and Applied Mathematics (SIAM),1975.[10] Herbert W. Hethcote. Qualitative analyses of communicable disease models.

MathematicalBiosciences , 28(3-4):335–356, 1976.[11] Herbert W. Hethcote. The mathematics of infectious diseases.

SIAM Review , 42(4):399–453,2000.[12] Mark Honigsbaum. Revisiting the 1957 and 1968 inﬂuenza pandemics.

The Lancet ,395(10240):1824–1826, 13 June 2020. 20igure 11: India: Detected Active Infections ( T )Figure 12: UK: Detected New Infections ( ˙ T + ˙ R T )[13] J Hilton J and M J Keeling. Estimation of country-level basic reproductive ratios for novelcoronavirus (SARS-CoV-2/COVID-19) using synthetic contact matrices. PLoS Comput Biol ,16, 2020.[14] M. Keeling and P. Rohani.

Modelling Infectious Diseases in Humans and Animals . PrincetonUniversity Press, 2008.[15] William Ogilvy Kermack and A. G. McKendrick. A contribution to the mathematical theoryof epidemics.

Proceedings of The Royal Society A , 117(772):700–721, 1927.[16] Michael Y. Li and James S. Muldowney. Global stability for the SEIR model in epidemiology.

Mathematical Biology , 125:155–164, 1995.[17] Ruiyun Li, Sen Pei, Bin Chen, et al. Substantial undocumented infection facilitates the rapiddissemination of novel coronavirus (SARS-CoV-2).

Science , 368:489–493, 1 May 2020.[18] Wei-Min Liu, Herbert W. Hethcote, and Simon A. Levin. Dynamical behavior of epidemio-logical models with nonlinear incidence rates.

Journal of Mathematical Biology , 25:359–380,1987.[19] Yang Liu, Li-Meng Yan, Lagen Wan, et al. Viral dynamics in mild and severe cases of COVID-19.

The Lancet , 20(6):656–657, June 2020. 21igure 13: Japan: Detected Active Infections ( T )Figure 14: Italy: Detected Active Infections ( T )[20] M. Martcheva. An introduction to mathematical epidemiology , volume 61. Springer, 2015.[21] Gemma Massonis, Julio R. Banga, and Alejandro F. Villaverde. Structural identiﬁability andobservability of compartmental models of the covid-19 pandemic. arXiv:2006.14295

Mathematical Biosciences , 243(2):163–177, 2013.[24] Alessandro Sette and Shane Crotty. Adaptive immunity to SARS-CoV-2 and COVID-19.

Cell ,184:1–20, February 18 2021.[25] C. E. G. Smith. Prospects for the control of infectious disease.

Proceedings of the Royal Societyof Medicine , 63:1181–1190, 1970.[26] Statistica Portal. Cumulative number of people undergoing polymerase chain reaction(PCR) tests for coronavirus (COVID-19) in Japan as of January 8, 2021, by type of pa-22igure 15: US: Detected New Infections ( ˙ T + ˙ R T Journal of Hygiene , 21:243–249, 1923.[28] Nicolas Vabret, Graham J. Britton, Conor Gruber, others, and The Sinai Immunology ReviewProject. Immunology of COVID-19: Current State of the Science.

Immunity