Epidemic parameters for COVID-19 in several regions of India

Abstract

Bayesian analysis of publicly available time series of cases and fatalities in different geographical regions of India during April 2020 is reported. It is found that the initial apparent rapid growth in infections could be partly due to confounding factors such as the initial rapid ramp-up of disease surveillance. A brief discussion is given of the fallacies which arise if this possibility is neglected. The growth after April 10 is consistent with a time-independent but region-dependent exponential. From this, $R_0$ is extracted using both known cases and fatalities. The two estimates are seen to agree in many cases; for these the CFR is reported. It is seen that CFR and $R_0$ increase together. Some public health implications of this observation are discussed, including a target doubling interval if medical facilities are to remain adequate.


arXiv [q-bio.PE], May 2020

Sourendu Gupta

Department of Theoretical Physics, Tata Institute of Fundamental Research, Homi Bhabha Road, Mumbai 400005, India.

TIFR/TH/20-16

I. INTRODUCTION

SARS-CoV-2 is a virus which has newly entered the global human population [1]. As this host-parasite system evolves towards an equilibrium, its epidemiology has been studied extensively, but with some conflicting results [2, 3]. The true extent of its penetration into the population is as yet open to question [4], since testing is fairly restricted in most countries. Nor is the progression of the disease, COVID-19, or its method of spread completely clear [5]. Since the virus is already so widely established, it seems unlikely that it will be totally eliminated soon. So it is important to extract basic epidemiological parameters as cleanly as possible.

India has managed to geographically contain the spread of the COVID-19 epidemic with the nation-wide lock-down which started on 24 March, 2020. At the end of April the proportion of identified cases in India as a whole was a few tens per million, with 1–2 orders of magnitude more in hot spots. Even if this were wrong by an order of magnitude, it would still mean that the epidemic remains at an early stage in India. This, combined with the lock-down, presents an opportunity to examine the growth of the epidemic in multiple isolated regions which implement essentially the same policy with regard to testing. This study examines the heterogeneity in the growth rate of the disease in several ways. First, the doubling intervals, $\tau$, of the cumulative number of identified cases, $C(t)$, and the cumulative number of fatalities, $D(t)$, are examined. From $\tau$ it is possible to extract the basic reproduction rate, $R_0$, within epidemic models. Marked heterogeneity is observed. After this the correlation between the case fatality ratio, CFR, and $R_0$ is studied.

Epidemic data, especially at the beginning, is never clean. The public health system has to gear up for disease surveillance. The continuing recurrence of Cholera epidemics [6], the spread of Dengue and Chikungunya [7], and the successful surveillance and elimination of Nipah [8] and Zika [9] show that India has a mixed record on epidemic surveillance. In addition to a possible lag between the beginning of the epidemic and its surveillance, there could be a problem of incomplete surveillance during the time the health service ramps up. Any examination of data has to allow for the identification of confounding factors such as these.

For the COVID-19 surveillance data, there are further cautionary remarks. The ICMR guidelines for testing [10] specify that only symptomatic cases should be tested using rRT-PCR. This part of the policy has been unchanged since the middle of March. Depending on the fraction of cases which are symptomatic, this could miss the actual prevalence of the disease in the population. Estimates of the fraction of asymptomatic infections range as high as 80% [11], implying that, in this extreme case, the testing policy can never reveal more than 20% of the cases. The social stigma attached to COVID-19 [12] also means that some fraction of infections may be cryptic.

There are uncertainties in the statistics of fatalities also. It has been reported that in Europe and the US the number of fatalities due to COVID-19 may have been underestimated by a factor of 2–3. Indian cities have fairly complete registries of deaths, so miscounting of COVID-19 fatalities could come mainly from mistaken or incomplete reports of the cause of death. For larger regions, say districts and whole states, where most deaths happen at home and death certificates are not common [13], the errors in counting fatalities may be significantly larger, and hard to estimate at this time.

One point about the quality test that is developed here is that absolute numbers are not as important for it as the check that fatalities and identified cases are independently tracing the same rate of growth of the epidemic. This is expected at the beginning of the epidemic, when all epidemic models become linear, and the growth of generic measures is driven by the maximum eigenvalue of the linearized models. However, in the extraction of the CFR, the absolute counts do matter. In spite of the uncertainties, the correlation of CFR and $R_0$ holds important lessons for public health in the inevitable later stages of this epidemic in India, and in the middle and low income countries of the world.

II. METHOD

A. Data

Data has been extracted from official sources where possible. For Ahmedabad city, the data is made publicly available by the municipal council of the city [14]. This well-organized site corrects data retrospectively for up to about 10 days. For Chennai city, the data has been collected from daily tweets by the municipal council [15]. For Indore city, the data was collected from daily bulletins of the Chief Medical and Health Officer and the collection is available for public use [16]. For Mumbai city the data has been collated from the daily tweets by MCGM [17] into a publicly available site [18]. For Pune district the data was collated from the daily tweets by the district authority [19] and the collection is publicly available [20]. For Delhi and all other states, the data was taken from the publicly available collection at Covid19India [21]. This site corrects data retrospectively for over a week. Only data on the cumulative number of identified cases, $C(t)$, and the cumulative number of known fatalities, $D(t)$, are used in this analysis. For this work data collection stopped on May 1, 2020, and retrospective corrections made after this are not included.

The unquantifiable parts of the errors in the counts of cases and fatalities due to COVID-19 were discussed in the previous section, along with the reasons why their estimates need not be included in this analysis. However, there is another part of the errors in the daily counts of cases and fatalities which comes from backlogs of tests or hospital records. These shuffle a fraction of the numbers from one day to another, and therefore cause errors in the daily counts. As long as the number of facilities keeps pace with the growth of the epidemic, these errors remain proportional to the number of cases and fatalities. Since the wait time for hospital beds for COVID-19 cases has remained roughly constant during the period of study, this argument is expected to hold. In view of this, 20% of the reported values of $C(t)$ and $D(t)$ are assigned as errors. The specific fraction, 20%, was chosen in order to cover the long range fluctuations visible in the time series (for example, those visible on days 3 and 13 of Figure 1). It has been seen that official reports and independent estimates of these numbers are generally within this range.

B. The doubling interval and $R_0$

At such an early stage in the infection, it is reasonable to assume exponential growth, i.e., doubling every $\tau$ days. Within this assumption one can check how well the lock-down is working by letting the doubling interval become time dependent. The simplest function to try is a linear change in $\tau$, i.e.,

$$C(t) = C_0\, 2^{t/(\tau_0 + t\tau_1)}, \tag{1}$$

and a similar set of three parameters for $D(t)$. Note that $\tau_0$ has dimensions of time, whereas $\tau_1$ is dimensionless. A fitting form with constant doubling interval was also used; this is denoted $\tau$, dropping the subscript.

The fitting procedure follows the methods of [23], with gamma distributions used as prior probability distribution functions (PDFs) for $\tau_0$ and $C_0$. The additional parameter $\tau_1$ is allowed to take positive and negative values, by letting the prior PDF be a Gaussian. For all these distributions, the widths are taken large enough that the posterior distribution is insensitive to the choice of priors.

The appendix contains details of the relation between a time varying doubling interval and time variation of the basic reproductive rate $R$. This requires choosing a model of the epidemic. Using the SEIR model, and the median interval between the appearance of symptoms and the time of fatalities, $t_1 = 17.8$ days, one obtains

$$R_0 = 1 + \frac{t_1 \ln 2}{\tau_0}, \quad\text{and}\quad \frac{R_1}{T} = -\frac{2\, t_1 \tau_1 \ln 2}{\tau_0^2}. \tag{2}$$

When a constant $\tau$ is used, one can set $\tau_1 = 0$ in the above formulae, so that $R_1 = 0$, and write $\tau$ instead of $\tau_0$. Exactly the same procedure is followed for fits to the time series for $D(t)$.
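The conversion in eq. (2) from a fitted doubling interval to the reproduction rate can be sketched as follows. This is a minimal illustration, not the fitting code used in the analysis: the function names are hypothetical, $t_1 = 17.8$ days is the value quoted in the text, and the factor of 2 in $R_1/T$ is an assumption taken from integrating the linear term of the expansion in the appendix. With these assumptions the sketch reproduces, for example, the Ahmedabad entries of Table I ($\tau_0 = 1.63$, $\tau_1 = 0.17$) and Table II ($\tau = 4.32$) after rounding.

```python
import math

# Median interval from symptom onset to death, in days, as quoted in the text.
T1_DAYS = 17.8


def r0_from_doubling(tau0_days, t1_days=T1_DAYS):
    """Return R_0 = 1 + t_1 ln 2 / tau_0 for a doubling interval in days."""
    if tau0_days <= 0:
        raise ValueError("doubling interval must be positive")
    return 1.0 + t1_days * math.log(2) / tau0_days


def r1_over_T(tau0_days, tau1, t1_days=T1_DAYS):
    """Return R_1/T = -2 t_1 tau_1 ln 2 / tau_0^2 (per day).

    The factor 2 is an assumption of this sketch (see the lead-in).
    """
    if tau0_days <= 0:
        raise ValueError("doubling interval must be positive")
    return -2.0 * t1_days * tau1 * math.log(2) / tau0_days ** 2


if __name__ == "__main__":
    # Constant doubling interval: median tau for cases in Ahmedabad (Table II).
    print("R_0 (constant tau):", round(r0_from_doubling(4.32), 2))
    # Linearly varying doubling interval: Ahmedabad row of Table I.
    print("R_0 (initial):", round(r0_from_doubling(1.63), 1))
    print("-R_1/T:", round(-r1_over_T(1.63, 0.17), 1))
```

Because the map from $\tau$ to $R_0$ is monotonic, applying it to the median and to the interval end points of $\tau$ gives a quick cross-check of the tabulated ranges, although small differences from the posterior summaries are to be expected.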
Estimates of the median values of the parameters, along with interquartile ranges (IQR) and 95% credible intervals (CrI), are quoted for the doubling intervals as well as $R_0$.

[Figure 1: four panels showing $C(t)$ and $D(t)$ plotted against $t$.]

FIG. 1: The time series of $C(t)$ and $D(t)$ for Delhi shown in logarithmic plot; the red dots show the data. The panels at the top show that both sets of data indicate a change of slope around day 11, i.e., about 18 days after the beginning of the national lock-down. This requires explanation. The blue lines in these panels show the most likely exponential models. The lower panels show the most likely model with changing doubling interval (gray curves) and with a constant doubling interval after day 11 (black lines).

C. The case fatality ratio

The analysis of the time series for $C(t)$ and $D(t)$ leads quite naturally to the case fatality ratio, CFR. This is defined as the ratio

$$\mathrm{CFR} = \frac{D(t)}{C(t)}. \tag{3}$$

If $C(t)$ is underestimated, then CFR is overestimated, and conversely, when $D(t)$ is underestimated, then CFR is also underestimated. This was regulated using a Bayesian estimator. Since the outcome is binomial, the prior PDF used is a beta distribution

$$P(\mathrm{CFR}) = \frac{1}{B(\alpha, \beta)}\, \mathrm{CFR}^{\alpha-1} (1 - \mathrm{CFR})^{\beta-1}, \tag{4}$$

with $\alpha = 1$ and $\beta = 2$. These choices make the posterior distribution insensitive to doubling or halving the values of the priors. The posterior distribution is of the same form with $\alpha = 1 + D(t)$ and $\beta = 2 + C(t) - D(t)$, with $t$ taken to be the final day of the analysis. Since $C(t)$ and $D(t)$ are both large, the following approximations for the median, $\mu$, and standard deviation, $\sigma$, may be used:

$$\mu[\mathrm{CFR}] \approx \frac{\alpha - 1/3}{\alpha + \beta - 2/3}, \quad\text{and}\quad \sigma^2[\mathrm{CFR}] = \frac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}. \tag{5}$$

III. RESULTS

A. On $R_0$

The time series of $C(t)$ and $D(t)$ are shown for the example of Delhi in Figure 1. Of the regions that we analysed, most cities show an initial rapid growth followed by a tempered growth. The exceptions are Ahmedabad and Chennai among cities, and the states of Gujarat, Kerala, and West Bengal.

| Urb | Dates | $\tau_0$ | $\tau_1$ | $R_0$ | $-R_1/T$ |
|---|---|---|---|---|---|
| Ahmedabad | 08/4–28/4 | 1.63 | 0.17 | 8.6 | 1.6 |
| Chennai | 03/4–19/4 | 4.94 | 0.19 | 3.5 | 0.2 |
| Delhi | 31/3–28/4 | 1.12 | 0.11 | 12.0 | 2.2 |
| Indore | 31/3–28/4 | 1.06 | 0.13 | 12.6 | 2.9 |
| Mumbai | 31/3–28/4 | 1.64 | 0.13 | 8.5 | 1.2 |
| Pune | 04/4–28/4 | 0.58 | 0.12 | 22.3 | 8.8 |

TABLE I: Parameters of the most likely model describing the growth of fatalities in several urban regions (cities, except for the district of Pune). The most likely initial doubling interval $\tau_0$ and the linear coefficient of change $\tau_1$, along with estimates of the initial $R_0$ and its rate of change $R_1$. Note the extremely large values of $R_0$.

| Data set | City | Dates | $\tau$ (days): med | $\tau$: IQR | $\tau$: 95% CrI | $R_0$: med | $R_0$: IQR | $R_0$: 95% CrI |
|---|---|---|---|---|---|---|---|---|
| Cases | Ahmedabad | 11/04/20 | 4.32 | 4.24:4.42 | 4.11:4.55 | 3.86 | 3.81:3.92 | 3.71:4.01 |
| Cases | Chennai | 10/04/20 | 7.52 | 6.87:8.25 | 5.92:10.2 | 2.70 | 2.55:2.85 | 2.27:3.13 |
| Cases | Delhi | 11/04/20 | 11.7 | 10.7:13.0 | 9.00:17.0 | 2.10 | 1.99:2.21 | 1.80:2.42 |
| Cases | Indore | 11/04/20 | 7.20 | 6.85:7.61 | 6.20:8.43 | 2.74 | 2.65:2.83 | 2.49:2.99 |
| Cases | Mumbai | 10/04/20 | 7.35 | 7.00:7.72 | 6.45:8.47 | 2.70 | 2.62:2.78 | 2.46:2.94 |
| Cases | Pune | 10/04/20 | 6.68 | 6.36:7.00 | 5.85:7.75 | 2.87 | 2.78:2.95 | 2.62:3.12 |
| Cases | Gujarat | 11/04/20 | 5.30 | 5.06:5.55 | 4.68:6.09 | 3.35 | 3.24:3.46 | 3.05:3.65 |
| Cases | Kerala | 31/03/20 | 34.5 | 29.7:41.0 | 24.0:65.0 | 1.39 | 1.34:1.45 | 1.24:1.55 |
| Cases | W. Bengal | 11/04/20 | 6.91 | 6.52:7.42 | 5.83:8.51 | 2.82 | 2.71:2.93 | 2.49:3.15 |
| Fatalities | Ahmedabad | 11/04/20 | 4.85 | 4.56:5.15 | 4.15:5.78 | 3.59 | 3.44:3.74 | 3.15:4.03 |
| Fatalities | Chennai | 10/04/20 | 8.80 | 8.11:9.52 | 7.00:11.5 | 2.38 | 2.29:2.48 | 2.12:2.65 |
| Fatalities | Delhi | 11/04/20 | 12.4 | 11.4:13.6 | 9.78:16.5 | 2.03 | 1.94:2.12 | 1.78:2.28 |
| Fatalities | Indore | 11/04/20 | 13.7 | 12.3:15.4 | 10.3:20.1 | 1.95 | 1.85:2.05 | 1.67:2.24 |
| Fatalities | Mumbai | 10/04/20 | 9.60 | 8.89:10.4 | 7.78:12.2 | 2.32 | 2.22:2.42 | 2.04:2.61 |
| Fatalities | Pune | 10/04/20 | 11.1 | 10.3:12.0 | 9.12:14.0 | 2.14 | 2.06:2.22 | 1.91:2.38 |
| Fatalities | Gujarat | 11/04/20 | 5.11 | 4.89:5.37 | 4.50:5.89 | 3.44 | 3.32:3.55 | 3.12:3.66 |
| Fatalities | Kerala | 31/03/20 | 24.4 | 22.0:27.5 | 18.0:36.0 | 1.81 | 1.78:1.84 | 1.73:1.89 |
| Fatalities | W. Bengal | 11/04/20 | 8.01 | 7.28:8.73 | 6.30:10.5 | 2.62 | 2.48:2.75 | 2.22:3.01 |

TABLE II: The stable doubling interval during the lock-down, $\tau$, and the inferred basic reproduction rate, $R_0$, for different geographical locations in India. Shown are the median value, the inter-quartile range (IQR), and the 95% credible interval (95% CrI).

Note that day one is taken to be March 31, 2020, which is 7 days after the beginning of the national lock-down. Since $t_1 = 17.8$ days, $D(t)$ could then show an initial exponential growth, tempered after day 11. The initial data for fatalities in Delhi, Indore, Mumbai and Pune can indeed be described by an exponential.
However, the doubling interval in Pune turns out to be half of that in Mumbai, although the average population density of Mumbai is about 6 times larger than the average in Pune city.

The ansatz of eq. (1), i.e., a linearly varying doubling interval, was also examined for urban regions. The results are collected in Table I. In most locations the initial doubling interval seems to be between half a day and two days. When converted to $R_0$, one obtains extremely high values, far in excess of what has been quoted in the literature. Certainly $R_0$ could vary from place to place, since it depends on the infectivity of the virus as well as the social networks in each location, and the latter may change from one place to another. However, $\tau_0$ for Pune is one third that of Mumbai, when Mumbai has six times the average population density. The wrong dependence of the doubling time for fatalities on population density, together with the observation that $C(t)$ shows a growth till the same date, supports the idea that there could be a more parsimonious explanation for this common period of growth. This is discussed in the next section. At the moment, any statistical evidence for a gradual slowing of the growth rate of the epidemic is hidden due to some confounding factors.

[Figure 2: scatter plot of $R_0$ (Test Positives) against $R_0$ (Fatalities).]

FIG. 2: The values of $R_0$ obtained from fatalities and test positives are plotted against each other. The diagonal (thin black) line marks the locus of equal values obtained from the two series. The yellow dots mark the medians inferred from data, the thick blue bars the IQR, and the black lines the 95% CrI. The code along the bottom edge identifies each state (DL for Delhi, KL for Kerala, WB for West Bengal) or city (ad for Ahmedabad, cn for Chennai, id for Indore, mb for Mumbai, and pn for Pune). The tag is placed at the same $R_0$ as the corresponding city. When two cities nearly overlap, the tags are displaced vertically in the same order as the results.

In view of this, the analysis was continued with a constant doubling interval, $\tau$, applying it to the period after day 10 or 11. For this part of the analysis data from three states, namely Gujarat, Kerala, and West Bengal, were also used. From Figure 1, one sees that this simpler model provides as good a description of the data as the model of eq. (1). Furthermore, this yields more realistic values of $R_0$. This implies that during the lock-down each of these places has seen a location dependent constant doubling interval. The values of $\tau$, along with inferred values of $R_0$, are collected in Table II. These are the primary results of this analysis.

It was noted that the number of known cases, $C(t)$, is definitely missing cases among those who have not been tested. This could include a possibly large fraction of asymptomatic and non-critical or pre-symptomatic cases [24, 25]. However, India's disease surveillance mechanism has concentrated on identifying critical cases and contact tracing, which could be a good tracer of the growth of epidemics. If this reasoning is correct, then, during the early growth of the epidemic, one should be able to obtain reliable doubling intervals from the cumulative counts of test positives [23]. The results of this analysis are also given in Table II. The two independent estimates of $R_0$ agree well enough that a closer look reveals interesting patterns.

The scatter plot in Figure 2 of $R_0$ obtained in two different ways shows several interesting patterns. First, there seem to be two groups of outbreaks. Most regions have $R_0$ below 3. Among the regions that we studied, Ahmedabad and Gujarat were a separate group, which saw a faster epidemic growth, with $R_0$ above 3. Finally there is Kerala, a different outlier, whose doubling intervals are longer than $t_1$, and which therefore has very low values of $R_0$. The case of Kerala merits a separate remark. The cumulative number of fatalities reached 4 at the end of the period of study. With such low counts of fatalities the assumption of exponential growth cannot be well tested. The counts of total infections were larger, and supported the hypothesis of exponential growth over the period studied.

Second, one sees that most estimates lie close to the diagonal line. If the data were perfect, and the epidemic grew steadily, the estimates would lie exactly on this line. With this requirement we can separate the regions into two groups. The first, consisting of Ahmedabad, Chennai, Delhi, Gujarat, and West Bengal, is on this line within statistical uncertainties. The second group, with Indore, Kerala, Mumbai, and Pune, is not. This could indicate some issues with the data.

On the other hand, if the data is as good as that for the other regions, then the fact that they are off the diagonal line should be understood. Kerala, which is the only region which lies below the diagonal, is perhaps seeing a lower growth in new cases than fatalities, which could be indicative of a gradual slowing down of the epidemic. Due to the lag by $t_1$, fatalities would see the slowing down later. Conversely, the regions which lie above the diagonal (namely Indore, Mumbai, and Pune, and, possibly, Chennai) could be seeing an increased growth in infections, not yet visible in fatalities because of the same time lag. Whether these scenarios are true, or the data quality is not dependable, should be known to the health agencies now, and would become visible to the public later.

B. On CFR

When there is a statistically significant difference between the doubling intervals determined by $C(t)$ and $D(t)$, then the ratio gives a time-dependent CFR. This is usually understood to be a transient phenomenon. In view of this, the analysis was restricted to Ahmedabad, Chennai, Delhi, Gujarat, and West Bengal, i.e., the regions which lie on the diagonal line of Figure 2, and therefore are seeing a steady growth of identified cases as well as fatalities. The case fatality ratios for these regions are plotted against the $R_0$ inferred from $D(t)$ in Figure 3.

[Figure 3: CFR (%) plotted against $R_0$.]

FIG. 3: The values of CFR on the last day of the analysis period, plotted against $R_0$ obtained from fatalities. Errors on CFR are given by the standard deviation, and the error bars on $R_0$ indicate the IQR. One of the trend lines joins the data for cities, the other for states. The tag along the bottom identifies each state (GJ for Gujarat, WB for West Bengal) or city (DL for Delhi, ad for Ahmedabad, cn for Chennai). It is placed directly below the symbol for the corresponding region.

The most obvious trend is that for the group of three cities there is an overall trend towards smaller CFR with decreasing $R_0$. This is also true of the two states. However, the CFR for states is displaced upwards from that for cities. Both trends have strong implications for the public health outlook and will be discussed further in the next section.

IV. DISCUSSION

Counts of known cases and fatalities of COVID-19 from five cities (Ahmedabad, Chennai, Delhi, Indore, and Mumbai), one district (Pune), and three states (Gujarat, Kerala and West Bengal) were investigated in this work. In two of the groups, there was one case each where the epidemic was not severe at the end of April (Chennai among cities, and Kerala among states). The others were known hot spots. Kerala is special because the number of fatalities is too low for statistical tests to be meaningful. There is strong regional heterogeneity in the course of the epidemic, indicating the necessity of looking at its spread at extremely local scales in order to check and control it.

A. Understanding the rapid initial growth

The time series of both known cases and fatalities in four out of the six urban centers showed a rapid rise for about 18 days after the lock-down. This was followed by a much slower growth. Since fatalities track cases with a delay of 17.8 days on the average, the early part of this data could track the growth in the time before the lock-down. However, it turns out that the data grows faster in less dense urban areas. Moreover, this hypothesis is not tenable for the growth in the number of known cases.

A possibility which resolves these difficulties is that this rapid rise of numbers in the early days tracks the rapid improvement of disease surveillance rather than the epidemic. The fact that the positive cases in Kerala do not show such a rapid initial growth is consistent with reports that the state activated disease surveillance after the first infections came from abroad [26]. This could also be true of Ahmedabad and Gujarat, two other centers which show no such initial increase, since the state had passed through the surveillance challenge of Zika virus in recent years [9]. Due to this confounding factor, it is not possible to use the data until April 10 or 11 to make any statistically valid measurement of the growth of the epidemic before the lock-down. Neglecting this leads to multiple fallacies, which I remark on next.
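The confounding effect described above can be illustrated with a toy calculation. In this minimal sketch the true cumulative count doubles at a fixed interval, while the fraction of cases that surveillance detects ramps up linearly and then saturates. All numbers (the 8 day doubling interval, the detection fractions, the 10 day ramp) and all function names are hypothetical choices made for illustration, and multiplying the cumulative count by a detection fraction is a deliberately crude model of surveillance.

```python
import math


def true_cumulative(day, c0=100.0, tau_days=8.0):
    """True cumulative count, doubling every tau_days (hypothetical values)."""
    return c0 * 2.0 ** (day / tau_days)


def detection_fraction(day, f_start=0.05, f_end=0.5, ramp_days=10.0):
    """Fraction of cases seen by surveillance: linear ramp-up, then constant."""
    if day >= ramp_days:
        return f_end
    return f_start + (f_end - f_start) * day / ramp_days


def observed_cumulative(day):
    """Reported cumulative count under the crude detection model."""
    return detection_fraction(day) * true_cumulative(day)


def apparent_doubling_interval(day_a, day_b):
    """Doubling interval inferred from the reported counts on two days."""
    ratio = observed_cumulative(day_b) / observed_cumulative(day_a)
    return (day_b - day_a) * math.log(2) / math.log(ratio)


if __name__ == "__main__":
    early = apparent_doubling_interval(0, 10)   # while surveillance ramps up
    late = apparent_doubling_interval(10, 20)   # after detection has saturated
    print(f"apparent doubling interval, days 0-10:  {early:.2f}")
    print(f"apparent doubling interval, days 10-20: {late:.2f}")
```

In this toy model the epidemic itself never changes its growth rate, yet the reported counts show a short apparent doubling interval during the ramp-up followed by the true interval afterwards, which is the qualitative pattern of a change of slope in the reported data.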

B. Fallacies

The apparent slowing down of the growth in later stages may be falsely interpreted as a transition to polynomial growth. As shown in eq. (A5) and eq. (A7), this is equivalent to a time dependent doubling interval. It has been discussed in the previous subsection that this leads to highly unlikely properties of the COVID-19 epidemic.

The same apparent slowing down of the growth rate in India has also been interpreted within the homogeneous SIR model with constant, time invariant, parameters [27]. In such a simple model the time dependence can only come from early evolution towards herd immunity. This gives rise to the unlikely conclusion that herd immunity will be reached for COVID-19 while 99% of the population remains susceptible.

A misrepresentation of data also arises when "instantaneous doubling intervals" or similar measures of exponential growth are constructed using $C(t)$ for one day, or averaged over small windows of time [28]. This shows a spurious gradual slowing of growth during the first three weeks of the lock-down. In later weeks these estimates are also plagued by spurious effects which result when delayed reports are dumped into cumulative numbers on one day instead of being assigned to correct past dates. These appear as evidence of local spurts or slumps in growth. Evidence of retroactive corrections from [14] shows that delays of as much as ten days may occur. When these artifacts are averaged over a moving window, this gives the mistaken appearance of peaks and troughs, and may put erroneous pressure to change policies.

C. Doubling time and $R_0$

Due to the reasons discussed in the previous subsections, the period after April 10 or 11 constitutes the base data for the main part of this analysis. As shown in Figure 1, a constant growth rate in each locality during the lock-down models the data as well as a growth rate which changes linearly with time. This is also the most parsimonious hypothesis about the growth of the epidemic.

The observed doubling interval, and the derived quantity $R_0$, fall into three groups (see Table II and Figure 2). Several geographical regions have $R_0$ less than 3. Kerala has a much lower $R_0$ (its doubling intervals are longer than $t_1$, the interval from the emergence of symptoms to death). Gujarat and Ahmedabad have $R_0$ higher than 3. Since this is the growth rate during the lock-down, population density effects are unlikely to be the major determinant of $R_0$. It would be worthwhile to consider the role of individuals with an extremely large number of contacts in this context, or a significant tail of the distribution with a small number of contacts, but still above three.

Five regions pass the following data quality test: the values of $R_0$ obtained from the growth of fatalities and cases are equal. This does not mean that the number of cases is correctly counted. Rather it indicates that the effort to find the cases requiring critical care, and tracing their contacts, has successfully resulted in tracking a constant fraction of all infected persons. It may miss, for example, a large fraction of asymptomatic cases.

D. Chances of fatality

For the five geographical regions which pass the quality test described in the previous section, a further study was performed. The dependence of the case fatality ratio, CFR (i.e., the ratio of the observed number of fatalities and cases), on $R_0$ was investigated. Although the number of cases identified may be much smaller than the actual number of cases, the chance that cases are identified in these five regions is expected to be similar, since the rate of testing is about the same. A positive correlation between $R_0$ and CFR is observed.

One possible reason for this is that with lower $R_0$ the number of critical cases grows slower, giving medical practitioners time to figure out good practices which prevent critical care patients from progressing to fatality. Deeper studies of this factor, comparing case data from different regions, are called for in future. It is possible that this is one of the most positive, and least discussed, outcomes of the lock-down.

Another possibility may also be conjectured. Careful maintenance of social distancing, necessary to reduce $R_0$, results in evolutionary pressure on the virus. Lock-down and similar methods force the virus to evolve in a direction which maximizes its ability to reproduce, which it can do if the disease becomes less critical or asymptomatic, and the chances of fatality decrease. It would be interesting to compare different regions across the globe for changes in the serial time and CFR.

E. Public health implications

At the observed rate of growth, and with the current rate of testing, more than 0.5% of the population in hot spots will begin to test positive for infections in about a month. A constant rate of growth of infections means that the number of hospital beds required will also grow at the same rate, for as long as the epidemic is growing. Even if the rate is slowed down heavily, as it is already in Delhi, Mumbai, and Chennai, the demand for hospital facilities will keep on growing, as long as the epidemic grows.

This demand is already beginning to outstrip resources in the larger cities. The mean interval between the start of symptoms and discharge was estimated to be 24.7 days [22]. This means that unless the doubling interval is kept above 35 days ($= 24.7/\ln 2$), the demand on hospitals will keep rising. Of the places we studied, only Kerala has begun to approach this break-even point.

CFR is currently small, partly because medical facilities have been able to cope with the rate of growth. If the number of cases exceeds the capacity of the medical system, cases which might have recovered will be harder to treat. Inevitably in such cases CFR will climb. It is useful to note that in Figure 3 the statewide figures for CFR are higher than those for cities. This is a reflection of the relative paucity of medical services outside cities, and is a pointer to what might happen when the number of infections rises beyond the sustainable capacity of hospitals.

V. ACKNOWLEDGEMENTS

I thank Rahul Banerjee, Prahlad Harsha, D. Indumathi, and R. Shankar for sharing collated data on various cities. I thank Jayasree Subramanian for providing me with the reference [14].

Appendix A: Time varying $R$ and doubling interval

In this appendix the unit of time will be taken to be the inverse of the mean rate of fatality of the infected. In these units, $R$ is the average number of new infections caused by an infected person. $R$ depends on the infectivity of the virus, as well as an average degree of the contact network. As a result, it may be affected by public health policies, such as a lock-down. Say a policy measure has a time scale $T$. Due to this, $R$ may become time-dependent, and one may write a Taylor series expansion

$$R = R_0 + R_1 \frac{t}{T} + \frac{1}{2} R_2 \left(\frac{t}{T}\right)^2 + \cdots \tag{A1}$$

One may introduce this into a typical epidemic model equation, $dI/dt = (R - 1) I$, to obtain

$$\log\left[\frac{I(t)}{I(0)}\right] = t \left[ R_0 - 1 + \frac{1}{2} R_1 \left(\frac{t}{T}\right) + \frac{1}{3!} R_2 \left(\frac{t}{T}\right)^2 + \cdots \right]. \tag{A2}$$

In this form of the equation time is measured in units of the case resolution time. This equation assumes that the fraction of susceptible persons is close to unity, and the fraction of persons in any other compartment is very small. As argued before, this is a reasonable assumption to make. The cumulative number of infections is then found by integration. There is no closed form result for the general case. If only the linear term in the expansion of $R$ is retained, then

$$I(t) = I(0) \sqrt{\frac{\pi T}{2 R_1}}\; e^{-T (R_0 - 1)^2 / (2 R_1)}\; \mathrm{Erfi}\left[ \frac{T (R_0 - 1) + R_1 t}{\sqrt{2 T R_1}} \right]. \tag{A3}$$

The function Erfi is defined through the integral

$$\mathrm{Erfi}(z) = \frac{2}{\sqrt{\pi}} \int_0^z dx\, e^{x^2}. \tag{A4}$$

It is possible to use an expansion for $t \ll T$, which gives the form

$$I(t) = I(0)\, \frac{1}{\lambda}\, e^{\lambda t} \left[ 1 + \epsilon \left( 1 - \lambda t + \tfrac{1}{2} \lambda^2 t^2 \right) + O(\epsilon^2) \right], \tag{A5}$$

where the notation $\lambda = R_0 - 1$ and $\epsilon = R_1 / (\lambda^2 T)$ is introduced. The imaginary part vanishes exponentially. This is easy to match to the phenomenological form

$$I(t) = I(0)\, 2^{t/(\tau_0 + \tau_1 t / T')} = I(0)\, 2^{t/\tau_0} \left[ 1 - \left( \frac{\ln 2\; \tau_1}{\tau_0^2\, T'} \right) t^2 + O\left( \frac{1}{T'^2} \right) \right], \tag{A6}$$

where an artificial expansion parameter $T'$ is introduced. It is set to unity after expansion. Matching these two expansions is accurate only when $\lambda t$ is large. Then

$$R_0 = 1 + \frac{\ln 2}{\tau_0}, \quad\text{and}\quad \frac{R_1}{T} = -\frac{2\, \tau_1 \ln 2}{\tau_0^2}. \tag{A7}$$

The phenomenological parametrization of eq. (A2) can be connected to the parameters of (non-autonomous) evolution equations for the epidemic. Note that $T$ and $T'$ are both regularization scales, in the sense of a renormalization group, whose numerical value need not be specified.

In order to change units of time to days, it is necessary to choose a model of the epidemic. If one uses the SEIR model, then the unit of time would be the median interval between the appearance of symptoms and the time of fatality or recovery, whichever is earlier. This quantity, $t_1 = 17.8$ days […] $t_0 + t_1$, where $t_0$ is the median pre-symptomatic period, $t_0 = 5.$[…]

$$R_0 = 1 + \frac{t_1 \ln 2}{\tau_0}, \quad\text{and}\quad \frac{R_1}{T} = -\frac{2\, t_1 \tau_1 \ln 2}{\tau_0^2}. \tag{A8}$$

When a constant $\tau$ is used, one can set $\tau_1 = 0$ in the above formulae, so that $R_1 = 0$, and write $\tau$ instead of $\tau_0$.

[1] M. G. Boni, P. Lemey, X. Jiang, et al., Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19 pandemic, bioRxiv preprint [https://doi.org/10.1101/2020.03.30.015008]
[2] J. Zhang, M. Litvinova, Y. Liang, et al., Changes in contact patterns shape the dynamics of the COVID-19 outbreak in China, Science (2020) [https://doi.org/10.1126/science.abb8001]
[3] S. Lai, N. W. Ruktanonchai, L. Zhou, et al., Effect of non-pharmaceutical interventions to contain COVID-19 in China, Nature (2020) [https://doi.org/10.1038/s41586-020-2293-x]
[4] G. Vogel,

First antibody surveys draw fire for quality, bias, Science 368, 350–351 (2020) [DOI: 10.1126/science.368.6489.350]
[5] H. Harapan, N. Itoh, A. Yufika, et al., Coronavirus disease 2019 (COVID-19): A literature review, J. Inf. Pub. Health 13, 667–673 (2020) [https://doi.org/10.1016/j.jiph.2020.03.019]
[6] M. Ali, S. Sen Gupta, N. Arora, et al., Identification of burden hotspots and risk factors for cholera in India: An observational study, PLoS ONE 12(8): e0183100 (2017) [https://doi.org/10.1371/journal.pone.0183100]
[7] WHO, Chikungunya in India
Nipah Virus – India
Zika Virus Infection – India
Strategy for COVID19 testing in India
Global Covid-19 Case Fatality Rates
Addressing Social Stigma Associated with COVID-19
et al., Nationwide mortality studies to quantify causes of death: relevant lessons from India’s Million Death Study, Health Affairs 36, No. 11 (2017) [doi:10.1377/hlthaff.2017.0635]
[14] Amdavad Municipal Corporation, COVID-19 website [http://covid19.ahmedabadcity.gov.in/Chart]
[15] Greater Chennai Corporation, Official Twitter page [https://twitter.com/chennaicorp]
[16] R. Banerjee, Collection of Press releases by Praveen Jadiya, Chief Medical and Health Officer, Indore [https://drive.google.com/file/d/1ZLYBpaGRi GgFCIH8CJ6XqAmszmZ3gDv/view]
[17] Municipal Commission of Greater Mumbai, Health Department, Official handle [https://twitter.com/mybmchealthdept?lang=en]
[18] P. Harsha, data set available at https://docs.google.com/spreadsheets/d/1-OYukZzMlRcRKfMqh-pAle0JYAjLjUy08N9V6S51sq0/
[19] District Information Office, Pune, Official Twitter account [https://twitter.com/info Pune]
[20] D. Indumathi, data set available at https://docs.google.com/spreadsheets/d/1BDkE4S82dYSk1ZyvkKpk79a3dZy4HiOLuLp4eUZxBds/
[21] Covid-19 India API [https://api.covid19india.org/csv/]
[22] R. Verity, et al., Estimates of the severity of Coronavirus disease 2019: a model based analysis, Lancet Infectious Diseases (2020) [https://doi.org/10.1016/S1473-3099(20)30243-7]
[23] S. Gupta, Inferring epidemic parameters for COVID-19 from fatality counts in Mumbai, preprint arXiv:2004.11677 (April 2020) [https://arxiv.org/abs/2004.11677]
[24] K. Mizumoto, K. Kagaya, A. Zarebski and G. Chowell, Estimating the asymptomatic proportion of coronavirus disease 2019 (COVID-19) cases on board the Diamond Princess cruise ship, Yokohama, Japan, 2020, Euro Surveill. 2020 Mar 12; 25(10): 2000180 [doi: 10.2807/1560-7917.ES.2020.25.10.2000180]
[25] H. Nishiura, T. Kobayashi, A. Suzuki, et al., Estimation of the asymptomatic ratio of novel coronavirus infections (COVID-19), Int J Infect Dis, Published Online First: 13 March 2020 [doi:10.1016/j.ijid.2020.03.020]
[26] C. Maya, Coronavirus: Surveillance is the key, Kerala shows the way
Predictive Monitoring of COVID-19 (Version of April 26) [https://ddi.sutd.edu.sg/]
[28] M. Roser, H. Ritchie, E. Ortiz-Ospina and J. Hasell,