A scaling approach to estimate the COVID-19 infection fatality ratio from incomplete data

Abstract

SARS-CoV-2 has disrupted the lives of billions of people around the world since the first outbreak was officially declared in China at the beginning of 2020. Yet important questions, such as how deadly it is or how far it has spread within different countries, remain unanswered. In this work, we exploit the 'universal' growth of the mortality rate with age observed in different countries since the beginning of their respective outbreaks, combined with the results of the antibody prevalence tests in the population of Spain, to unveil both unknowns. We validate these results with an analogous antibody rate survey in the canton of Geneva, Switzerland. We also argue that the official number of deaths over 70 years old is substantially underestimated in most countries, and we use the comparison between the official records and the number of deaths mentioning COVID-19 in the death certificates to quantify by how much. Using this information, we estimate the infection fatality ratio (IFR) for the different age segments and the fraction of the population infected in different countries, assuming a uniform exposure to the virus in all age segments. We also give estimates of the non-uniform IFR using the sero-epidemiological results of Spain, which show a very similar growth of the fatality ratio with age. For Spain only, we estimate the probability (if infected) of being identified as a case, being hospitalized or being admitted to an intensive care unit as a function of age. In general, we observe a nearly exponential growth of the fatality ratio with age, which anticipates large differences in total IFR between countries with different demographic distributions, with numbers that range from 1.82% in Italy to 0.62% in China or even 0.14% in middle Africa.

Full PDF

A SCALING APPROACH TO ESTIMATE THE COVID-19 INFECTION FATALITY RATIO FROM INCOMPLETE DATA

Beatriz Seoane

Sorbonne Université, CNRS, IBPS, LCQB - UMR 7238, and ISCD, 4 place Jussieu, 75005 Paris, France.

June 5, 2020

arXiv [q-bio.PE], PREPRINT - JUNE 5, 2020

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has quickly spread around the world since it was first noticed in December of 2019. The pandemic of the disease caused by this virus, the coronavirus disease 2019 (COVID-19), has, at the moment of this writing, claimed more than 380 thousand lives. Many countries have imposed different levels of population confinement to try to minimize the number of new infections and to prevent the collapse of their health systems. As the first wave of the outbreak starts to be controlled, the question of how to proceed next arises. The daily number of deaths is progressively decreasing in Europe and, with it, the majority of the countries are starting to lift their national lock-downs. The design of future strategies will rest on the evolution of the official statistics, and the problem is that these statistics are very defective and incomplete. On the one hand, the total number of official cases is strongly limited by each country's screening capacity, which means that only a small fraction of the total infections is correctly identified (typically those presenting symptoms above a certain level of severity fixed by each country's policy). On the other hand, the shortage of screening tests and an overwhelmed health system also lead to an underestimation of the number of deaths in the official records. The actual degree of under-counting for both measures is unknown and most likely country dependent, which results in largely irreconcilable case fatality ratios all over the world.

Efforts have been made to determine the clinical severity of the virus [1, 2, 3], but determining precisely how deadly this virus is remains hard [4]. Many different solutions using the available data have been proposed to extract the correct case fatality ratio [5, 6], estimate the number of infections [7, 8] or the infection fatality ratio [9, 10, 11, 12]. Even the results of some early sero-epidemiological tests sampling the degree of immunity of the population have been strongly controversial [13, 14]. Probably the most reliable estimations for the infection fatality ratio (IFR, the probability of dying once infected) as a function of the patient's age were proposed by Verity et al. in Ref. [10] using the data from 4999 individual cases in mainland China and exported cases outside China. The ratios obtained were further validated with the reported cases on the Diamond Princess cruise ship. Yet these estimations were based on two assumptions: first, a perfect detection of all the infections among people in their fifties, a debatable hypothesis given how elusive the detection of this virus is; and second, that the virus had spread uniformly within the population of all ages, which is rather improbable in their case because they were analyzing mainly infections among travelers (who tend to be younger). Nevertheless, the picture is clear: the lethality of the virus increases sharply with the patient's age, being particularly deadly for elderly people and mild for children.

In the absence of a reliable number of confirmed infections, most of the statistics have focused on the number of deaths, which are expected to be a fraction of the infections. But deaths are much less common than infections, which means that, in order to estimate correctly the number of infections in a country, one needs a very accurate death count.
In this sense, it is widely accepted that the number of real deaths linked to COVID-19 is noticeably larger than what official statistics say [15, 16], but estimating precisely by how much is hard and will likely depend strongly on each country's data collection policy and capacity. One can try to estimate the size of this discrepancy from the excess mortality observed since the beginning of the pandemic in the public death records. This approach, though apparently infallible, is not without difficulties. Indeed, in most countries the epidemic peak coincided with the lock-down measures, which means that, on the one hand, the mortality from other causes (not COVID-19) has decreased and, on the other hand, with the health system under a lot of stress, the mortality linked to other diseases has increased. Correcting for these effects in the reference mortality trend requires a careful and dedicated analysis. We reason in an alternative way.

The under-counting of deaths comes mainly from two sources: (i) only the deaths that can be directly linked to COVID-19 (typically by means of a positive PCR test) are included in the official count, and (ii) countries mostly count only the deaths that occurred within hospital facilities. Source (i) tells us that all the patients who die before being tested are invisible. This can happen at all ages, but since old patients are more prone to develop severe symptoms and have more difficulty seeking immediate medical attention, this situation will be far more common among the elderly. Source (ii) also mainly affects old people because, with hospitals crowded, the oldest patients have often been treated in retirement/care homes or in their own homes. For these reasons, we expect a significantly more accurate reporting of the deaths of younger patients (in particular, those under 70 years old).
It is possible to quantify this idea.

According to the Office for National Statistics in the United Kingdom, among deaths mentioning COVID-19 in the death certificate (in England and Wales by the 22nd of May), 64% took place in hospitals, 29% in care homes and 5% at home [17]. Analogous data published by the government of the Community of Madrid (which accounts for more than 1/3 of the official deaths in Spain) report similar ratios: 61% in hospitals, 32% in socio-sanitary centers and 6% at home. France counts separately the deaths occurring in hospitals and in care homes, the latter being almost 60% of the former. Deaths occurring in care homes are a large portion of the total in all countries, which means that an incomplete count there notably modifies the overall statistics. However, once we look at the mortality per age group, such under-counting only affects patients above a certain age. In fact, we can compare the number of deaths having COVID-19 mentioned in the death certificate (even if it is only a suspicion, which most probably represents an over-counting of the real deaths) with the official count of deaths linked to COVID-19. In Fig. 1, we show the excess of the former relative to the latter (that is, suspected deaths divided by official deaths, minus 1) for England and Wales and for the Community of Madrid. In both places, the under-counting is relatively age independent below 70-80 years old and rather important above, especially for patients above 90 years old, where the real numbers may be double the official count. Furthermore, this mismatch is getting worse as records in England and Wales are updated (in Madrid it seems to have stabilized). Details on the data used to generate these plots are given in the Methods and Dataset section.

In summary, we expect a small mismatch between the real and the official number of deaths among patients under 70 years old (the ∼30% of under-counting is probably too large because deaths caused by other diseases are probably also included in this count), and a much higher systematic under-counting for the older segments. The actual numbers will depend on each country's capacity to detect infections quickly, but also on the particular details concerning the counting of official deaths (which establishments are considered). We give these details, together with the last date used for each country, in the Methods and Dataset section.
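The relative excess plotted in Fig. 1 (deaths mentioning COVID-19 in the certificate divided by official deaths, minus 1) can be sketched in a few lines of Python. This is only an illustration under assumed inputs: the function name `relative_excess` and all the death counts are hypothetical and are not taken from the ONS or Madrid records.

```python
def relative_excess(certificate_deaths, official_deaths):
    """Return certificate/official - 1 for every age group in official_deaths."""
    return {
        age: certificate_deaths[age] / official_deaths[age] - 1.0
        for age in official_deaths
    }


# Hypothetical counts per age group (invented for the example).
official = {"60-79": 1000, "80+": 2000}
certificates = {"60-79": 1300, "80+": 3600}

excess = relative_excess(certificates, official)
for age, value in excess.items():
    print(f"{age}: {value:.2f}")
```

A value of 0.30 means that the death certificates contain 30% more deaths than the official count for that age group; a value of 1 would mean that the real number doubles the official one.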

[Figure 1, panels A and B. Horizontal axis: age group; vertical axis: excess deaths in registers / official COVID deaths. Panel A: England and Wales (E&W), data up to 2020-04-17, 2020-04-24, 2020-05-01, 2020-05-08, 2020-05-15 and 2020-05-22. Panel B: Madrid, data up to 2020-05-14 and 2020-05-22.]

Figure 1: Under-counting of deaths per age group. We show the excess deaths, per age group, observed when comparing the number of death certificates where COVID-19 was mentioned, either confirmed or suspected, with the official deaths attributed to COVID-19, relative to this second number, for England and Wales in A and for the Community of Madrid in B.

In this work, we attempt to estimate the IFR as a function of age using scaling arguments relating the accumulated number of deaths reported in different countries and age groups. We define all our variables in Section 2. In Section 3.1, we establish a direct correspondence between the mortality rates in patients below 70 years old (where the official count is more accurate) published in different countries around the world (but mostly in Europe). This good correspondence allows us to make predictions about the degree of spread of the virus in different populations, or the global IFR of one country as compared to another. We also observe that the collapse of the mortality rate with age in different countries is compatible with a pure exponential growth of the IFR with age (assuming a uniform attack rate). The scale of total infections is then consistently fixed in Section 3.2 from the rate of immunity obtained via blood tests of a statistical sample of the citizens of Spain (and compared to seroprevalence tests in Geneva, Switzerland, and New York City, United States). This scale allows us to compute the IFR as a function of age and the number of current infections in each country, which are given in Table 2. In addition, in Section 3.3 we estimate the probability of being detected as an official case, needing hospitalization and needing intensive care (if infected) as a function of age in Spain.
All these rates are obtained under the assumption of a uniform attack rate, an assumption that seems fairly reasonable in view of the immunity measurements of the Spanish survey; once these measurements are taken into account, they do not qualitatively change the results discussed so far (see Section 4.1). Finally, we estimate the extent of the under-counting of deaths among the elderly in the different countries and give estimations for the overall lethality of the virus in Section 4.2. We relegate all the details concerning the databases and dates used in the data analysis to Section 6.

Statistical offices and health institutions of many countries have been regularly publishing the age distribution of the accumulated number of deaths that occurred in their territory since the beginning of the outbreak. We have combined national data from Denmark, England & Wales, France, Germany, Italy, South Korea, the Netherlands, Norway, Portugal and Spain, regional data from Geneva (Switzerland) and Madrid (Spain), and city data from New York City (United States of America). Unless otherwise mentioned, we consider 10 age groups, each gathering together the deaths of patients with ages in the same decade (with the exception of patients over 90 years old, who are grouped together). Since the different age segments are not uniformly populated, and this distribution can change significantly from one country to another, we always discuss the number of deaths normalized by the number density of people $x_\alpha(C)$ in each age group $\alpha$ and country $C$, that is,

$$\hat{D}_\alpha(t; C) \equiv \frac{D_\alpha(t; C)}{x_\alpha(C)}, \qquad (1)$$

where $D_\alpha(t; C)$ is the accumulated number of deaths at time $t$. This variable $\hat{D}_\alpha$, normalized by the country's population and multiplied by 1000 (by convention), is the mortality rate per age group for the time elapsed since the beginning of the outbreak. In the following we omit the country variable $C$ unless explicitly needed. We show in Fig.
2–A the evolution of $\hat{D}_\alpha(t)$ in France for our ten age groups. As shown, once the effects of the demographic pyramid are removed (the fact that there are many more people in their fifties than in their nineties in any population, for example), the mortality spans almost five orders of magnitude between children and elderly people.

[Figure 2, panels A and B. Panel A: $\hat{D}_\alpha(t)$ versus date, one curve per age group. Panel B: $\hat{D}_\alpha(t)/\hat{D}_{50-59}(t)$ versus age group.]

Figure 2: Normalized number of deaths occurred in French hospitals as a function of age. A: We show the evolution with time of the accumulated number of deaths normalized by the number density of individuals in age group $\alpha$ (i.e. $\hat{D}_\alpha(t)$ in Eq. (1)). B: We show $\hat{D}_\alpha(t)/\hat{D}_{50-59}(t)$ as a function of the age group, for all the times in A (the darker the color, the more recent the measurement; we give some dates in the legend). This quotient is essentially time-independent, as discussed around Eq. (6), and it lets us estimate the quotient between the UIFRs of the two age groups, that is, $\hat{f}_\alpha/\hat{f}_{50-59}$.

Asymptotically, the accumulated number of deaths in each group $\alpha$ at a given $t$ will be a fixed fraction of the accumulated number of infected individuals in that group, $I_\alpha$, at a previous date $t - \Delta$, where $\Delta$ is an effective time related to the time elapsed between infection and death (estimated to be, on average, around 20 days [18, 19]). Then,

$$D_\alpha(t) = f_\alpha I_\alpha(t - \Delta) + O\left(\sqrt{f_\alpha I_\alpha}\right), \qquad (2)$$

where the proportionality factor, $f_\alpha$, is the infection fatality ratio (IFR) for the age group. The fluctuations $O$ are the expected error of an un-normalized histogram. Assigning a unique delay to all the cases is, of course, an oversimplification, but one that works quite well as the number of infections becomes large. We show, for instance, the perfect match in time between the accumulated number of cases and the deaths at a later time in Spain in Fig. S1.

In general, we know neither the total number of infections $I = \sum_\alpha I_\alpha$ nor the number of infections in a particular age segment $I_\alpha$, but we know the latter should be an (essentially constant) fraction of the total number of infections, plus fluctuations, that is,

$$I_\alpha(t) = r_\alpha x_\alpha I(t) + O\left(\sqrt{r_\alpha x_\alpha I}\right), \qquad (3)$$

with $r_\alpha$ being the relative risk of infection for group $\alpha$ (so that $r_\alpha I/N$ is the standard attack rate for the group, with $N$ the total country population). Clearly, $\sum_\alpha r_\alpha x_\alpha = 1$. Recent results analyzing the spread of the virus within close contacts in the outbreak in China suggest a uniform exposure across the population [20], meaning that $r_\alpha = 1$ for all the groups (quite different from the patterns observed for the seasonal flu [21, 22]).
There is, however, an important debate about whether the low fatality observed in patients below 20 years old is related to a low risk of death or a low risk of infection. For the moment we keep this variable free and we will discuss it at the end of the paper. The risk of infection $r_\alpha$ could, in principle, vary with time, but we do not observe a systematic change. This will become clearer with the discussion around Fig. 2–B for the accumulated deaths, or with the analogous figure for the daily measures (which should be more sensitive to a change in $r_\alpha$) in Supplemental Fig. S2.

The combination of Eqs. (2) and (3) tells us that

$$\hat{D}_\alpha(t) = \hat{f}_\alpha I(t - \Delta) + O\left(\sqrt{\hat{f}_\alpha I / x_\alpha}\right), \qquad (4)$$

where

$$\hat{f}_\alpha = r_\alpha f_\alpha \qquad (5)$$

would be the probability of dying at age $\alpha$ if the virus attacked all ages within the population uniformly. In other words, it is the "apparent" fatality (what we perceive from the daily news) without knowing if all ages have the same chances of getting infected. For this reason, we refer to $\hat{f}_\alpha$ as the uniform infection fatality rate (UIFR), as compared to $f_\alpha$, which is the real (potentially non-uniform) IFR associated with the disease. Both measures are equal only if $r_\alpha = 1$ for all $\alpha$.

Footnote (on the delay $\Delta$): In general, $\Delta$ depends on $\alpha$, but we omit this dependence here for simplicity because the differences are extremely subtle at this time of the outbreak.

Altogether, for all age segments, $\hat{D}_\alpha(t; C)$ is expected to be proportional to the total number of infections at a previous date, $I(t - \Delta)$. Alternatively, the quotient between the mortality rates of two distinct age groups,

$$\frac{\hat{D}_\alpha(t)}{\hat{D}_\beta(t)} = \frac{\hat{f}_\alpha}{\hat{f}_\beta} + O\left(\sqrt{\hat{f}_\alpha I / x_\alpha}\right) + O\left(\sqrt{\hat{f}_\beta I / x_\beta}\right), \qquad (6)$$

is expected to become independent of time (as long as the expected number of deaths for each group is large enough), and equal to the quotient between the UIFRs of the two groups. This is precisely what we observe for the deaths that occurred in French hospitals (see Fig. 2–B), where we show the quotient between each $\hat{D}_\alpha(t)$ and the deaths among patients in their fifties, $\hat{D}_{50-59}(t)$, for all daily reports since the 22nd of March of 2020 (the darker the color, the more recent the measurement). The other countries considered show qualitatively the same behavior; we decided to show France because it has been reporting age statistics (on a daily basis) for the entire number of deaths occurred up to that date. Thus, with this kind of analysis, even if we do not know the exact mortality associated with the virus, we can determine how much deadlier it is, at least apparently, for one age group as compared to another.
We say apparent because, up to here, we cannot distinguish whether the virus seems less aggressive for an age segment because the lethality is low or because so few individuals of that age got infected.

The same kind of argument applies to data from different countries at a fixed time. Indeed, one expects that the IFR, $f_\alpha$, should not vary too much from country to country (at least among countries with comparable health systems). However, the relative attack risk $r_\alpha$ may vary from country to country. Yet, if the differences in $r_\alpha$ between populations are not large, then $\hat{f}_\alpha$ should also be country independent. In that case, Eq. (4) tells us that the different $\hat{D}_\alpha(C)$ essentially differ by a multiplicative constant proportional to the total number of infections $I(C)$ in each country. We show in Fig. 3–A the values of $\hat{D}_\alpha$ by the 22nd of May of 2020 for the different countries where we found information about the profile of deaths by decades of age (see the Database section for details) as a function of $\alpha$. Some countries publish these data only for a fraction of all the deaths; in that case, we renormalized the numbers with the total number of deaths reported by each country by the 22nd of May of 2020.

As argued, the different countries' curves are essentially parallel in logarithmic scale, with the exception of the Netherlands, where the mortality grows even more steeply with age than in the rest of the countries (maybe related to a significantly different $r_\alpha$). In other words, we can extract both the number of total infections and the UIFR by age (up to a multiplicative constant common to all the countries, or all the ages, respectively) from the collapse of these curves. We show this collapse in Fig. 3–B (where the Netherlands was excluded); it works extremely well for all the countries in the age region between 30 and 69 years old (despite the different orders of magnitude of $\hat{D}_\alpha(C)$). Deaths below 30 are very rare, so strong fluctuations between countries are expected.
The collapse is less satisfying above 70 years old but, as discussed, we believe this is mostly related to a different degree of under-counting of deaths for these age segments (though differences related to a relative under-representation of elderly people among the infected in some countries are also possible). We believe it is mostly under-reporting because, for instance, the French curve would quickly match the rest of the countries if one added (for the segment over 80 years old) the official deaths occurring in care homes to the hospital deaths shown here. We will try to estimate the extent of this under-reporting in each country below.

One can now exploit this similarity in the growth of mortality with age between countries to remove the statistical fluctuations. Thus, the country average of this collapse gives us the UIFR (up to an unknown proportionality constant $\hat{f}$ common to all age segments). We give the values of this average in Table 1 (errors are obtained using the bootstrap method at 95% confidence). The data obtained are compatible with an exponential growth of the mortality rate with age (as shown in Fig. 3–C). In fact, a fit of the data to

$$\hat{f}_\alpha \propto \exp(A \times \mathrm{age}_\alpha) \qquad (7)$$

with $A = 0.$[…] has a $\chi^2/\mathrm{d.o.f.} = 3.$[…]/[…]. This strong dependence of the fatality on age anticipates that the global UIFR ($\sum_\alpha x_\alpha \hat{f}_\alpha$) should vary a lot from one country to another due to the different demographic distributions. We will discuss this below.

Furthermore, the collapsing constant is essentially the relative number of infected people with respect to our reference country, that is, $I(C)/I(\mathrm{Spain})$. This is not entirely true due to the different country policies concerning the death count but, as discussed, we estimate that the unreported fraction under 70 years old is below 30%, and the quotient of the underestimation of the two countries would, in general, be much smaller. We show these collapsing constants in Table 1.
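As a rough illustration of the collapse and of the fit of Eq. (7), the following Python sketch works on synthetic data. It assumes (this is not stated in the text) that the relative number of infections is estimated as the geometric mean of the ratio between a country's curve and the reference curve over the 30-69 region, and that the exponential fit is a straight-line fit in log space. The age mid-points, the growth rate 0.1, the prefactor and the infection counts are all invented for the example.

```python
import numpy as np

# Assumed mid-points of the groups 0-9, ..., 80-89, and 95 for 90+.
ages = np.array([5, 15, 25, 35, 45, 55, 65, 75, 85, 95], dtype=float)
reliable = (ages >= 30) & (ages < 70)  # region where the collapse works well


def collapse_constant(d_hat, d_hat_ref, mask):
    """Geometric mean of d_hat / d_hat_ref, an estimate of I(C) / I(ref)."""
    return float(np.exp(np.mean(np.log(d_hat[mask] / d_hat_ref[mask]))))


def fit_exponential(age_values, f_rel):
    """Fit f_rel proportional to exp(A * age) in log space and return A."""
    slope, _intercept = np.polyfit(age_values, np.log(f_rel), 1)
    return float(slope)


# Synthetic illustration only: Eq. (4) without fluctuations.
A_true = 0.1
uifr = 1e-5 * np.exp(A_true * ages)
infections = {"ref": 2.0e6, "other": 5.0e5}
d_hat = {c: uifr * n for c, n in infections.items()}

ratio = collapse_constant(d_hat["other"], d_hat["ref"], reliable)
# Dividing by the ratio puts the second curve on top of the reference one.
collapsed = np.mean([d_hat["ref"], d_hat["other"] / ratio], axis=0)
A_fit = fit_exponential(ages[reliable], collapsed[reliable])
print(ratio, A_fit)
```

With noiseless synthetic curves the sketch recovers the input ratio of infections and the input growth rate; with real data the fluctuations of Eq. (4) would have to be propagated, for instance with the bootstrap mentioned above.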

[Figure 3, panels A, B and C. Panel A: $\hat{D}_\alpha$ versus age group for Spain, Portugal, Norway, Netherlands, Korea, Italy, Germany, France, England and Denmark. Panel B: collapsed curves and their average versus age group. Panel C: exponential fit of the average.]

Figure 3: Normalized number of deaths in different countries as a function of age. A: We show the normalized number of deaths per age group for a selection of countries affected by the COVID-19 epidemic to very different degrees. B: The same data (excluding the Netherlands), where every country has been multiplied by a constant $D(C)$ so that it collapses onto the Spanish curve in the age region between 30 and 69 years old. The values of these constants are given in Table 1. In black, we show the country average for each age segment, and in C the fit of this average to a pure exponential.

Scaling relations

age group    ∝ $\hat{f}_\alpha$
[…]
30-39        60(10) ×
40-49        18(3) ×
50-59        63(8) ×
60-69        21(1.6) ×
70-79        69(7) ×
80-89        22(3) ×
90+          5.6(16) ×

Country      $D(C) = I_C/I(\mathrm{Spain})$
[…]
Norway       0.0076(4)
Korea        0.014(2)
Italy        1.1(2)
Germany      0.27(2)
France       0.96(8)
England      1.48(2)
Denmark      0.02(-)

Table 1: Collapse of the mortality rate in different countries. We give the values extracted from the collapse of Fig. 3–B: the growth of the mortality with age (proportional to the uniform fatality ratio $\hat{f}_\alpha$) and the number of infections in each country with respect to the number of infections in Spain, $I_C/I_{\mathrm{Spain}}$, equal to the collapsing constant $D(C)$. The relative scaling of the mortality above 70 years old is expected to be significantly underestimated. The errors of $D(C)$ are only the statistical errors extracted from the data collapse; they do not include the systematic error associated with the different death-counting policies of the different countries, which would be much larger. We try to give a better estimate below.

Up to this point, we have only obtained the number of infections by country relative to the number of total infections in Spain, and something proportional to the UIFR by age. In both cases, the proportionality constants (though related) are unknown. In order to fix the scale, we can look at the statistical studies of the prevalence of antibodies against SARS-CoV-2 in different populations. In particular, we refer to the preliminary results of the sero-epidemiological study of the Spanish population (inferred from 60983 participants) made public by the Spanish Health Ministry on the 13th of May of 2020 [23], which estimates that only 5.0% (95% confidence interval (CI): 4.7%-5.4%) of the Spanish population had been infected (using blood samples drawn between 27/04 and 11/05/2020). Also, as an independent control of the scale, we use the results of an analogous seroprevalence survey of the residents of Geneva, Switzerland (from 1335 participants) [24].
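The way the seroprevalence fixes the scale can be sketched as follows: the Spanish prevalence gives $I(\mathrm{Spain})$, and the collapsing constants $D(C) = I(C)/I(\mathrm{Spain})$ of Table 1 then give the number of infections in each country. In this minimal Python sketch the function name `infected_fraction` is hypothetical and the populations are rounded values assumed for the illustration; only the constants $D(C)$ and the 5.0% prevalence come from the text.

```python
def infected_fraction(d_const, pop_country, pop_ref, prevalence_ref):
    """Fraction of a country's population infected, given I(C) / I(ref)."""
    infections_ref = prevalence_ref * pop_ref  # I(ref) from the serosurvey
    infections_country = d_const * infections_ref  # I(C) = D(C) * I(ref)
    return infections_country / pop_country


PREVALENCE_SPAIN = 0.05  # 5.0% from the Spanish survey [23]

# Rounded populations: assumptions made for this example only.
POPULATION = {"Spain": 47e6, "Italy": 60e6, "Germany": 83e6}
# Collapsing constants D(C) as listed in Table 1.
D_CONST = {"Italy": 1.1, "Germany": 0.27}

italy = infected_fraction(
    D_CONST["Italy"], POPULATION["Italy"], POPULATION["Spain"], PREVALENCE_SPAIN
)
germany = infected_fraction(
    D_CONST["Germany"], POPULATION["Germany"], POPULATION["Spain"], PREVALENCE_SPAIN
)
print(f"Italy: {100 * italy:.1f}%, Germany: {100 * germany:.1f}%")
```

Under these assumed populations the sketch gives values close to the percentages listed for Italy and Germany in Table 2, which suggests that $D(C)$ is a ratio of absolute numbers of infections rather than of prevalences.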

Uniform infection fatality rate

age group    with under-counting    estimation without under-counting
0-9          0.0012%(4)             0.00118%(0.00082-0.0016)
10-19        0.0021%(7)             0.00211%(0.0014-0.0028)
20-29        0.009%(23)             0.00878%(0.0065-0.012)
30-39        0.024%(5)              0.0241%(0.019-0.032)
40-49        0.072%(18)             0.0722%(0.056-0.097)
50-59        0.26%(5)               0.256%(0.21-0.35)
60-69        0.84%(0.14)            0.839%(0.71-1.1)
70-79        2.8%(5)                3.47%(2.9-4.7)
80-89        8.9%(18)               12.7%(11.-17.)
90+          23.%(7)                42.1%(34.-57.)

% Population infected

Country      % infected
Spain        5.0%(4)
Portugal     1.0%(4)
Norway       0.33%(12)
Korea        0.06%(2)
Italy        4.3%(16)
Germany      0.8%(3)
France       3.4%(12)
England      6%(2)
Denmark      0.9%(3)

Table 2: Estimations assuming a uniform attack rate. We show our estimation for the uniform fatality rate before and after quantifying the effects of the systematic under-counting of deaths. We estimate the percentage of the population infected in each country. Errors include the statistical error ($\pm\sigma$, the standard deviation obtained through error propagation of the results in Table 2, and the uncertainty of the prevalence survey in Spain) and a systematic error of 35% for possible under-counting of deaths.

The sampled rate of immunity in the Spanish population allows us to fix $I(\mathrm{Spain})$ in Table 1 directly and, with it, to estimate the number of infections in each of the countries of Fig. 3 (see Table 2). The results obtained are lower than, but compatible with, the independent estimations by Phipps et al. [8] or by Salje et al. for France [25]. As shown, the rates of infection (for the entire country) are rather low, in particular compared to the 60-70% herd immunity threshold (even if it is lowered by other effects [26]). Yet it is important to stress that the propagation of the virus has been rather heterogeneous across the territory, with contagion high in certain regions and insignificant in others. Take France, for example, where the age distribution of the COVID-19 deaths is available for all the departments. Using also the data up to the 22nd of May, we estimate that the percentage of the population infected has reached 12% in Île-de-France (the Paris region), 7% in Grand Est, 2.5% in Hauts-de-France, and 1% or less in the rest of the country.

Furthermore, the total number of infections lets us compute the UIFR as a function of age in Spain just by dividing our $\hat{D}_\alpha$ by this number, that is, using Eq. (2),

$$\hat{f}_\alpha(\mathrm{Spain}) \sim \frac{\hat{D}_\alpha}{I(\mathrm{Spain})}. \qquad (8)$$

We show the values obtained in Fig. 4–A. Then, we can extract $\hat{f}$ from the comparison of $\hat{f}_\alpha(\mathrm{Spain})$ with the values proportional to $\hat{f}_\alpha$ in Table 2, in the age regions where we believe that the counting of deaths is reliable (the region where the collapse of Fig.
3–B was good). We use the group 50-59 to fix this constant ($\hat{f}$ is the ratio between the two values of $\hat{f}_{50-59}$), which allows us to reconstruct entirely our estimate for the averaged UIFR (we show these values in Fig. 4–A and Table 2). This determination of the UIFR is expected to underestimate the fatality ratio for the oldest segments; we will try to correct this bias in the next section. We also include this second estimation in Table 2.

We can test the accuracy of the IFRs estimated by this method using another independent sero-epidemiological survey. In particular, we use the work by Stringhini et al. [24], which measures the degree of seroprevalence in the canton of Geneva (Switzerland) from samples of 1335 participants. Up to the 24th of May of 2020, the canton's authorities had reported 277 deaths, all but one from patients above 50 years old. We can use the age distribution of these deaths and our estimation of the IFR in Table 2 to estimate the fraction of the population that has been infected so far using Eq. (2). We show in Fig. 4–B the quotient $D_\alpha / (x_\alpha \hat{f}_\alpha N)$, with $N$ the total population of the canton of Geneva. If our $\hat{f}_\alpha$ is indeed a good estimation of the real IFR, this quotient should give us the fraction of the population infected in that age group, which was estimated to be very similar above 50 years old and equal to 3.7% (95% CI 0.99-6.0), and about 8.5% (95% CI 4.99-11.7) between 20 and 49 years old [24]. As shown, our predictions are in very good agreement with the survey estimation (especially once the systematic under-counting of deaths in the estimation of the IFR is corrected, see Section 4.2).

The convergence of the results in Spain and Switzerland lends great confidence to the ratio between deaths and infections, but let us stress that these estimations might only be valid for similar health systems, and for hospitals not too overwhelmed during the worst moments of the epidemic peak. In fact, if we use the IFR of Table
2 to estimate the percentage of infections in New York City from the distribution of deaths with age published by NYC Health at different dates (we show the results per age in the Supplemental Fig. S3), we obtain predictions for the overall antibody prevalence that evolve in time from 27% (data from the 15th of April), 48% (the 1st of May), 57% (the 15th of May), to 63% (the 2nd of June), which would indicate that herd immunity had already been reached in the city. However, there is evidence that this is not the case. Indeed, the presence of antibodies among NYC's citizens was randomly sampled, at a certain point at the end of April (details have not been published), on the basis of a survey of 15000 people in all of New York State. The results announced by the Governor in a press conference on the 2nd of May 2020 reported that only 19.9% of those tested had antibodies. If we move forward ∼20 days in time to see this reflected in the deaths [18, 19], we estimate 3 times more infections, which inevitably suggests that the infection fatality rate has been much higher in New York City than it was in Spain or in Geneva, unless there are issues in the sero-prevalence study, something hard to assess because technical details of the survey have not been published so far (to our knowledge).

PREPRINT - JUNE 5, 2020

[Figure 4. Panel A: $\hat{f}$ versus age group; legend: unreliable region, antibody test Spain, estimation China (Verity et al., 2020), CFR South Korea, average-country estimation, under-counting corrected. Panel B: seroprevalence in Geneva [%] versus age group.]

Figure 4: Probabilities assuming a uniform attack rate. A: We use the measurements of the number of infected in Spain to estimate the uniform fatality rate using Eq. (2) in both regions. We fix the constant $\hat{f}$ in Table 1 using the estimation of the fatality in Spain for the age group 50-59 to infer the average UIFR. We show this estimation in red, and in green we show our estimation of the UIFR after the under-counting of deaths in the old segments has been corrected. We compare these results with the estimation by Verity et al. [10] and the case fatality ratio (i.e. the probability of dying for confirmed COVID-19 cases, not the IFR) by age obtained in South Korea. In B, we use the IFR estimations from A and Table 2 to predict the seroprevalence of anti-SARS-CoV-2 antibodies in the population of Geneva, Switzerland, from the official distribution of deaths per age of a total of 277 deceased. The predicted fraction of infections is given as dots (in green if we use the bare estimation of Eq. (8), in violet if we include the corrections linked to under-counting). The horizontal lines (with the 95% confidence interval as a gray shadow) show the actual values measured in the survey of Ref. [24] for participants of different age groups.

We can also compare our IFR with previous estimations. Our numbers are smaller than the estimation by Verity et al. [10] for all the age segments except those that concern the elderly patients (though still compatible with their confidence interval for most of the age groups), and about three times smaller than the case fatality ratio (the probability of dying among the confirmed cases) per age group measured in South Korea (where a massive number of screening tests has been made). This difference could be explained, in both cases, by an under-estimation of the total number of infections. On the one hand, the IFR in Ref. [10] was estimated from the case fatality rate and the statistical prevalence of antibodies among the travelers returning home on repatriation flights (which represents a much smaller sample than the one considered in the Spanish survey). On the other hand, Korea has been very successful in identifying new infections by tracking the social contacts of the infected, but it is very unlikely that they are able to trace all the infections.

Before ending this Section, we would like to warn about the limitations of the current sero-epidemiological surveys, which will probably affect our results (even though we would like to stress that the Spanish survey has been praised for its robustness [27]). In fact, extracting accurate results from them is challenging for different reasons. Firstly, because the study must be well designed to avoid undesirable bias in the recruitment of the participants. Secondly, because the probability of detecting the antibodies changes with time [28] (an effect that must be taken into account [29]). Thirdly, because available tests are not very accurate [30], which means that statistical adjustments must be included in the analysis to avoid mistaking the false positive rate for the antibody rate [31]. And finally, because the spread of the virus has been very heterogeneous in space (as we illustrated for France above), which means that very large samples are necessary to get the correct picture of a country.
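As a minimal sketch of the consistency check described in this section, the following Python snippet applies Eq. (8) to obtain the uniform fatality ratio from the deaths of a reference region and then evaluates the quotient $D_\alpha/(x_\alpha \hat{f}_\alpha N)$ for a second region. The function names and all the numbers are hypothetical placeholders chosen for illustration; they are not the values of Table 2 or of the surveys.

```python
def uniform_ifr(deaths, pop_fraction, infections):
    """Eq. (8): f_hat_alpha = D_hat_alpha / I, with D_hat_alpha = D_alpha / x_alpha."""
    return {age: deaths[age] / pop_fraction[age] / infections for age in deaths}


def implied_infected_fraction(deaths, pop_fraction, ifr, population):
    """Quotient D_alpha / (x_alpha * f_hat_alpha * N), evaluated per age group."""
    return {
        age: deaths[age] / (pop_fraction[age] * ifr[age] * population)
        for age in deaths
    }


# Hypothetical reference region (NOT real data): deaths D_alpha, population
# fractions x_alpha and total number of infections I.
ref_deaths = {"50-59": 100, "60-69": 300}
ref_fraction = {"50-59": 0.2, "60-69": 0.1}
ifr = uniform_ifr(ref_deaths, ref_fraction, infections=1_000_000)

# Hypothetical second region with N inhabitants and the same age structure.
other_deaths = {"50-59": 10, "60-69": 30}
fraction_infected = implied_infected_fraction(
    other_deaths, ref_fraction, ifr, population=1_000_000
)
for age, value in fraction_infected.items():
    print(f"{age}: {100 * value:.1f}% infected")
```

If the fatality ratios transfer well between the two regions, the quotient should be roughly the same in every age group where the death counts are reliable, which is the behavior tested against the Geneva survey.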

[Figure 5. Panel A: rates in Spain ($\hat{f}^C$, $\hat{f}^H$, $\hat{f}^S$, $\hat{f}^D$) versus age group, in 10-year segments. Panel B: rates in Spain versus age group, with narrower segments for the youngest ages.]

Figure 5: Other probabilities assuming a uniform attack rate. In A we show the probability of being classified as an official case, $\hat{f}^C$, being hospitalized, $\hat{f}^H$, admitted to intensive care, $\hat{f}^S$, and dying, $\hat{f}^D$, in Spain, as a function of age using age segments of 10 years. In B, we show the same data but with the children's information grouped in smaller age segments, evidencing the severity of the cases in patients under 2 years old. A is generated using the data from the Spanish Health Ministry up to the 22nd of May and B with the data published by the RENAVE.

Spain also gives age-distributed data (for groups of patients with ages in the same decades) for the accumulated number of cases, $C_\alpha$, new hospitalizations, $H_\alpha$, and new admissions to intensive care units, $S_\alpha$. Due to the shortage of screening tests, for most of the age groups, the number of cases gives us a measure of the number of patients with symptoms severe enough to visit an emergency room. For the oldest groups, this might not be the case because care homes with confirmed cases have been tested more systematically than the rest of the population. We then apply the same reasoning used to compute the UIFR to these indicators, which allows us to estimate the probability of being included in each of the other three categories. Unlike the deaths, policies concerning who gets tested, hospitalized and/or admitted to an intensive care unit probably depend strongly on the country, which means that these probabilities might not be directly extrapolated to other countries.

Equation (4) reads, for a general observable $X$ ($X = C, H, S,$ or $D$),

$$\hat{X}_\alpha(t) = \hat{f}^X_\alpha\, I(t-\Delta_X) + O\!\left(\sqrt{\hat{f}^X_\alpha I / x_\alpha}\right), \qquad (9)$$

which means that we can directly extract the probability of being included in the $X$ category, $\hat{f}^X$ (again assuming a uniform attack rate), using the measure $I(\text{Spain})$ from the antibody prevalence study [23]. Note that knowing the precise value of $\Delta_X$ is not crucial here because the propagation of the disease has been mostly interrupted in Spain during the last month, and $I(t-\Delta_X)$ is roughly constant at this point. We show the estimations of these probabilities per age group in Fig. 5.

We see that, between 20 and 80 years old, the probability of being confirmed as a case does not depend much on age, and it stays around 1 in every 10 infections. The probability is higher for older segments and much smaller for people below 20 years old. For the other indicators, we observe a strong dependence of the severity on age. For the intensive care unit admissions, however, above 70 years old, one sees clearly the effects of the policies regulating the access to intensive care with age, an access that becomes rare over 80 years old, a situation which certainly contributes to slightly increasing the mortality rate for the oldest age groups. We show in Fig. 5–B narrower age groups for the youngest patients. This second figure tells us that the severity of COVID-19 in children is rather heterogeneous in age, being particularly dangerous for children below 2 years old (an age segment for which admissions to intensive care are more common than for patients above 40 years old). Furthermore, these probabilities might be underestimated by the uniform attack rate assumption, since one expects a significantly lower exposure to the virus at these low ages (we will see this confirmed in the data shown in Fig. 6).

[Figure 6. Panel A: relative risk of infection $r$ versus age group. Panel B: rates in Spain versus age group; legend: IFR Spain, UIFR Spain, CFR Spain.]

Figure 6: Uniform versus non-uniform IFR. A: We show the relative risk of infection for each age segment, $r_\alpha$, taken from the sero-epidemiological study of the Spanish population [23], using Eq. (3). While the youngest segments of the population seem to be less hit by the virus, the distribution of the infections is rather similar to that of a uniform attack rate, indicated here by the dashed line $r_\alpha = 1$. The 95% confidence interval for $r_\alpha$ is indicated by the red shadow. B: We show the estimated uniform and non-uniform IFR for Spain and compare them with the case fatality ratio as a function of age. The error for the non-uniform IFR is shown by a red shadow.
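The following Python sketch illustrates the two operations behind Figs. 5 and 6: the leading term of Eq. (9), $\hat{f}^X_\alpha \approx \hat{X}_\alpha / I$ with $\hat{X}_\alpha = X_\alpha / x_\alpha$, and the conversion from a uniform to a non-uniform probability through the relation $f_\alpha = \hat{f}_\alpha / r_\alpha$. The function names and the numbers are hypothetical and serve only as an illustration; the delay $\Delta_X$ is ignored, which assumes that $I$ is roughly constant.

```python
def category_probability(counts, pop_fraction, infections):
    """Leading term of Eq. (9): f_hat^X_alpha ~ (X_alpha / x_alpha) / I."""
    return {age: counts[age] / pop_fraction[age] / infections for age in counts}


def correct_for_attack_rate(uniform_prob, relative_risk):
    """f_alpha = f_hat_alpha / r_alpha (a uniform attack rate means r_alpha = 1)."""
    return {age: uniform_prob[age] / relative_risk[age] for age in uniform_prob}


# Hypothetical inputs (NOT real data).
pop_fraction = {"0-9": 0.10, "40-49": 0.16}   # x_alpha
hospitalized = {"0-9": 50, "40-49": 4000}     # H_alpha
infections = 2_000_000                        # I
relative_risk = {"0-9": 0.5, "40-49": 1.0}    # r_alpha

f_hat_H = category_probability(hospitalized, pop_fraction, infections)
f_H = correct_for_attack_rate(f_hat_H, relative_risk)
for age in f_hat_H:
    print(f"{age}: uniform {f_hat_H[age]:.2e}, non-uniform {f_H[age]:.2e}")
```

In this toy example the youngest group is exposed half as much as the average ($r_\alpha = 0.5$), so its probability doubles once the non-uniform attack rate is taken into account, while the group with $r_\alpha = 1$ is unchanged.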

Our indicator for the fatality ratio $\hat{f}_\alpha$ (and the probabilities of presenting different degrees of severity) measures how much more probable it is to die at one age rather than at another in a population, which is not necessarily the true IFR (that is, the probability of dying once infected, our $f_\alpha$ in Eq. (2)). The two observables are only equal if the contagion is uniform among all age segments of the population (we recall that, in our definition, $f_\alpha = \hat{f}_\alpha / r_\alpha$, and a uniform attack rate implies $r_\alpha = 1$). In other words, with our approach we are not able to distinguish whether the mortality is low in a particular age segment because (i) the disease is mild at these ages (low $f_\alpha$) or (ii) because this age segment is rarely infected (low exposure, $r_\alpha \ll 1$ in Eq. (3)). Previous studies estimating the IFR per age group, for instance Ref. [10], assumed a uniform spread of the virus, something that seems justified by contagion dynamics studies [20].

The sero-epidemiological study [23] also gives some clues about this point, because it also estimates the attack rate for different age groups. We can extract our $r_\alpha$ from this attack rate (we recall that the attack rate of an age group is $r_\alpha I/N$, with $N$ the country's population). We show the values we obtain in Fig. 6–A. The measurements only report a significantly lower spread among children (which might be related to the closure of schools during the lock-down), but for the rest of the ages the distribution is not far from the uniform attack rate. In any case, no exponentially increasing attack rate with age is found that could account for the strong increase of the fatality with age. However, the much lower exposure of children to the virus tells us that the probability estimation of Fig. 5–B might be underestimated in that age segment, something that could change the overall picture of the severity of COVID-19 in babies, which might be similar to that of adults. The change of tendency of the severity with age in the case of infants could be related to the suspected connection between COVID-19 and Kawasaki disease [32, 33, 34].

We can nevertheless compute the real (non-uniform) IFR using these values of $r_\alpha$ for the Spanish data, and compare it with our previous estimation. We show the results in Fig. 6–B. As shown, both estimations are essentially compatible for all the age segments, which lends confidence to our previous results. The real fatalities will change slightly once the effect of the non-uniform attack rate is included, but we do not expect these non-uniform fatalities to change drastically with respect to the uniform estimations we gave above.

As discussed above, we expect the number of deaths associated with COVID-19 to be underestimated in the official statistics, especially for elderly people. Now we try to estimate by how much. The collapse of Fig. 3–B shows us that Norway reports a noticeably higher number of deaths in the age segment above 70 years old with respect to other age groups. We believe that their counting is more accurate than for the rest for two reasons. First, because the Norwegian authorities report the deaths (of patients tested positive for COVID-19) occurring everywhere: hospitals (38%), care and retirement homes (59%) and homes (2%). And second, because the country has been much less affected than the rest (Norway has reported only 235 deaths so far), which means that they are much better equipped to properly detect and treat infections. For this reason, we can use the Norwegian measures to give a quantitative estimate of our under-determination of the mortality among the elderly (70-79: 22%, 80-89: 40% and 90+: 86%). We show in Fig. 4–B that this simple correction allows us to predict correctly the measured prevalence of antibodies within the population in the oldest age segments in the canton of Geneva (Switzerland) [24].

Yet, from this comparison with the Norwegian data we can only argue in terms of the scaling of one age segment with respect to another, but not about the factor common to all age segments. For this, we can use the comparison between our estimation of the UIFR based on official COVID-19 deaths and the one extracted using the counting of deaths having COVID-19 mentioned in the death certificate. The sero-prevalence study [23] estimated that 11.3% of the population of the Community of Madrid had been infected, so we can use this number of infected to estimate the fatality of the region. Such a fatality ratio has to be regarded as an upper limit of the real fatality ratio, because “suspicion of COVID-19” probably encompasses many other respiratory diseases. We show this fatality compared to our previous estimation, and the estimation after correcting the under-counting of the oldest segments (using the Norwegian death data), in Fig. S4. We observe, firstly, that the correction introduced for the elderly segments is in perfect agreement with the scaling observed in the Madrid data, which lends confidence to this correction, and secondly, that the Madrid estimation is around 35% larger than our previous estimation for the other ages. This estimation gives us an upper limit of the real mortality, which means that it allows us to estimate the maximum error of the predictions given up to now (since the real mortality must lie between that value and this over-estimated one). We show these estimations in Table 2 and Fig. 4–A after taking these effects of under-counting into account.

We can use these corrections to estimate the number of unreported deaths for each of the countries considered, and the values of the UIFR per age to compute the global IFR of each country. We show these data in Table 3. Considering that a lower diffusion of the virus among the elderly would also result in a lower apparent mortality in these groups, we also give the expected total IFR if the current counting were perfect (left side of the parenthesis), and if a constant 35% under-counting were present in all the age groups (right side of the parenthesis).

Country     % of missing deaths    % total IFR
Spain       38% (0-86)             1.6% (1.1-2.1)
Portugal    9.1% (0-47)            1.3% (1.2-1.8)
Norway      0% (0-33)              1.2% (1.2-1.6)
Korea       16% (0-57)             0.87% (0.70-1.2)
Italy       61% (0-120)            1.8% (0.98-2.4)
Germany     32% (0-78)             1.6% (1.1-2.1)
France*

Table 3: Country-dependent estimates. We estimate the percentage of unreported deaths for each country together with the expected fatality ratio once these estimated missing deaths are included. In parentheses we include the expected values if the current death counting were perfect (no missing deaths, left side of the parenthesis) and if heavy under-counting were present, such as the one observed when comparing with the number of deaths with COVID-19 in the death certificate (right side of the parenthesis). *The numbers for France were computed using only the deaths occurring in hospital facilities, which means that a 58% under-counting is already confirmed by the counting of deaths occurring in care homes. We cannot correct the minimum IFR because we do not have the age profile of these deaths.

The values of Table 3 show us that the global fatality of the disease depends strongly on the demographic pyramid of each country, which is a consequence of the nearly exponential dependence of the UIFR on age. In fact, we can use the average values given in Table 2 to explore how the global IFR would change in different parts of the world just because of a different distribution of the number of citizens with age (that is, leaving aside the differences related to the different health systems). While for Italy the IFR is expected to be 1.8%, the same age profile predicts a 0.62% IFR in China (extremely similar to the one estimated in Ref. [10]) or 0.14% in middle Africa, which could partially explain why the outbreaks are significantly less important there than in Europe (where the overall IFR would be 1.38%).
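To illustrate how the demographic pyramid alone changes the global figure, here is a minimal Python sketch that computes a total IFR as the population-weighted sum $\sum_\alpha x_\alpha \hat{f}_\alpha$. The weighted-sum form is an assumption that follows from a uniform attack rate; the age bins, fatality ratios and population fractions below are hypothetical placeholders, not the values of Table 2.

```python
def total_ifr(pop_fraction, ifr_by_age):
    """Population-weighted IFR: sum over age groups of x_alpha * f_alpha."""
    total_weight = sum(pop_fraction.values())
    if abs(total_weight - 1.0) > 1e-6:
        raise ValueError("population fractions must sum to 1")
    return sum(pop_fraction[age] * ifr_by_age[age] for age in pop_fraction)


# Hypothetical age-specific fatality ratios (NOT the values of Table 2).
ifr_by_age = {"0-39": 0.0001, "40-69": 0.005, "70+": 0.05}

# Two hypothetical demographic pyramids sharing the same fatality ratios.
older_country = {"0-39": 0.4, "40-69": 0.4, "70+": 0.2}
younger_country = {"0-39": 0.7, "40-69": 0.25, "70+": 0.05}

print(f"older:   {100 * total_ifr(older_country, ifr_by_age):.2f}%")
print(f"younger: {100 * total_ifr(younger_country, ifr_by_age):.2f}%")
```

Because the fatality ratio grows steeply with age, the total is dominated by the weight of the oldest groups, so two populations with identical age-specific ratios can differ several-fold in their global IFR.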


We have studied the scaling of the accumulated number of deaths related to COVID-19 with age in different countries. After normalizing these numbers by the fraction of people of that age in the entire population, we observe that the lethality of the disease grows (almost) exponentially with age, spanning almost 5 orders of magnitude between the 0-9 and 90+ age segments. In addition, we show that this scaling with age is essentially country independent for ages under 70 years old. We estimate that the differences observed above this age are mostly related to different levels of under-counting of deaths of elderly people. The collapse of the mortality data allows us to establish direct correspondences between the accumulated numbers of infections that occurred in each country since the beginning of the outbreak.

At a second stage, we use the Spanish survey of the sero-prevalence of anti-SARS-CoV-2 antibodies in the Spanish population [23] to fix the scale between the number of infections and the number of deaths, which allows us to estimate the COVID-19 infection fatality ratio as a function of age (under the assumption of a uniform attack rate). We validate these numbers with an analogous prevalence survey of the canton of Geneva [24]. We also show that, when applied to the COVID-19 death profile of New York City, our predictions are not compatible with the antibody rates estimated by New York State [35]. This observation suggests that either the real immunity rate is much higher (and has reached herd immunity levels) or the fatality ratio has been significantly higher in New York City than in Spain or Geneva, a discrepancy that might be related to a different health system or a collapse of the health system during the worst moments of the epidemic. The scale of the number of infections allows us to compute as well the probability (if infected) of being classified as a case, hospitalized, admitted to intensive care units or dying in Spain. The results show a clear growth of all degrees of severity with age, with the notable exception of the infections in patients below 2 years old, which lead to many more complications than for older children, a situation that could be aggravated by the low exposure of this population to the virus during the lock-down measures.

We further discuss the validity of the uniform attack rate hypothesis using the age distribution of the antibody rates in the Spanish sero-epidemiological study, concluding that even if differences in exposure to the virus between ages are observed, they do not qualitatively change our estimations for the infection fatality ratio. However, the low attack rate measured among babies warns us that the fatality rate below 2 years old might be significantly underestimated.

We use information concerning the number of death certificates where COVID-19 was mentioned as a possible cause of death to show that the under-counting of deaths is a problem that mostly concerns the deaths of old patients. We use the scaling of the mortality with age in Norway to estimate the real fatality ratio of the elderly age segments (in other words, to reverse the under-counting). We then validate these estimations with the age profile of deaths in the canton of Geneva and of the death certificates in the Community of Madrid.

Finally, our analysis relies exclusively on public statistical data and can easily be updated as more accurate information becomes available (for instance regarding the attack rates in different countries or better estimations of the total number of infections). In addition, if consolidated, the probabilities and the approach explained here can easily be used to estimate the degree of penetration of SARS-CoV-2 in different cities, regions, or countries, and to track the evolution of the pandemic. Furthermore, we only analyzed the changes of the total mortality with age, but the socio-economic environment of the patients also plays an important role. This study could be generalized to include such variables.

The information about the distribution of the number of deaths associated with COVID-19 with age in different countries is taken from the database prepared by the “Institut national d’études démographiques (Ined)” (France), freely available for scientific use at the website https://dc-covid.site.ined.fr/fr/donnees/. For the rest of the epidemic's measures in Spain (cases, hospitalizations, entries to intensive care units), we used the COVID-19 datadista database [36]. In both cases, these databases collect the official information published by each country's health authorities. The data used correspond to the 21st and 22nd of May. Some countries do not give the age profile of the total number of deaths, only of a sub-group of the total. If this is the case, we assume a uniform sampling of the ages in all the age segments, and we renormalize all the $D_\alpha$ so that the sum of the deaths over all the age groups matches the total number of deaths published by each country on the 22nd of May 2020. For the distribution of COVID-19 deaths with age by department in France, we used the data furnished by Santé Publique France, in particular the “donnees-hospitalieres-classe-age” available at the Données hospitalières relatives à l’épidémie de COVID-19 website.

The information about the COVID-19 deaths in the canton of Geneva is taken from the “N. 5 - 18 au 24 mai 2020” report on the République et canton de Genève website. The information about the deaths in New York City is taken from the “Total Deaths” reports of the NYC Health website.

The data concerning deaths mentioning COVID-19 in the death certificate were taken from the “up to week ending the 22nd of May” report on the ONS website (England and Wales) and the “Informe de situación 22 de mayo 2020” from the Comunidad de Madrid website. The age distribution of the official data (to generate Fig. 1) is taken (for England only) from the Ined database (which is extracted from the daily report of the National Health Service, which includes only deaths of patients who tested positive for COVID-19 that occurred in hospitals). In order to account for the deaths in Wales, we multiplied the English distribution by 1.05 (Wales deaths represent 5% of the sum of the deaths of Wales and England in the ONS report). In order to estimate the official age distribution of deaths in Madrid, we renormalized the national age distribution of accumulated deaths by the official accumulated number for Madrid on the 14th and 22nd of May. This is a reasonable approximation considering that almost a third of the total COVID-19 deaths in Spain occurred in Madrid.
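The renormalization of partial age profiles described above can be sketched in Python as follows. The function name and the numbers are hypothetical; the only assumption is the one stated in the text, namely that the reported sub-group samples all age segments uniformly, so that every $D_\alpha$ is rescaled by the same factor.

```python
def renormalize(deaths_by_age, official_total):
    """Rescale every D_alpha so that their sum matches the official total."""
    partial_total = sum(deaths_by_age.values())
    if partial_total <= 0:
        raise ValueError("the age profile must contain at least one death")
    scale = official_total / partial_total
    return {age: count * scale for age, count in deaths_by_age.items()}


# Hypothetical example (NOT real data): the age profile is known for 40 deaths
# only, while the official total is 80.
profile = {"60-69": 10, "70+": 30}
rescaled = renormalize(profile, official_total=80)
print(rescaled)
```

The same operation covers the extension from England to England and Wales: multiplying the English distribution by 1.05 amounts to renormalizing it to 1.05 times the English total.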

For the demographic distribution, we used the data available in the Ined database, which correspond to the last distribution published by each country's official statistics agency, and the database from the ‘World Population Prospects’ of the United Nations, https://population.un.org/wpp/Download/Standard/Population/, for the discussions about the demographic distribution in other parts of the world. The demographics of the canton of Geneva were extracted from Statistiques cantonales on the République et canton de Genève website. For the demographics of New York City we used the data published on the NYCdata website from 2016.

I would like to thank Aurélien Decelle, Luca Leuzzi, Enzo Marinari, Giorgio Parisi, Federico Ricci-Tersenghi, Riccardo Spezia and Francesco Zamponi for useful and interesting discussions, and Elisabeth Agoritsas, Ada Altieri, Marco Baity-Jesi and David Yllanes for a critical and constructive reading of the manuscript.

I also thank the Ministerio de Economía, Industria y Competitividad (MINECO) (Spain) for partial financial support through Grant PGC2018-094684-B-C21 (also partly funded by the EU through the FEDER program).

References

[1] Wei-jie Guan, Zheng-yi Ni, Yu Hu, Wen-hua Liang, Chun-quan Ou, Jian-xing He, Lei Liu, Hong Shan, Chun-liang Lei, David SC Hui, et al. Clinical characteristics of coronavirus disease 2019 in china. New England journal of medicine, 382(18):1708–1720, 2020.
[2] Joseph T Wu, Kathy Leung, Mary Bushman, Nishant Kishore, Rene Niehus, Pablo M de Salazar, Benjamin J Cowling, Marc Lipsitch, and Gabriel M Leung. Estimating clinical severity of covid-19 from the transmission dynamics in wuhan, china. Nature Medicine, 26(4):506–510, 2020.
[3] Vital Surveillances. The epidemiological characteristics of an outbreak of 2019 novel coronavirus diseases (covid-19)—china, 2020. China CDC Weekly, 2(8):113–122, 2020.
[4] Shigui Ruan. Likelihood of survival of coronavirus disease 2019. The Lancet Infectious Diseases, 2020.
[5] Lucas Böttcher, Mingtao Xia, and Tom Chou. Why estimating population-based case fatality rates during epidemics may be misleading. arXiv preprint arXiv:2003.12032, 2020.
[6] Anastasios Nikolas Angelopoulos, Reese Pathak, Rohit Varma, and Michael I Jordan. Identifying and correcting bias from time- and severity-dependent reporting rates in the estimation of the covid-19 case fatality rate. arXiv preprint arXiv:2003.08592, 2020.
[7] Seth Flaxman, Swapnil Mishra, Axel Gandy, H Juliette T Unwin, Helen Coupland, Thomas A Mellan, Harrison Zhu, Tresnia Berah, Jeffrey W Eaton, Pablo NP Guzman, et al. Estimating the number of infections and the impact of non-pharmaceutical interventions on covid-19 in european countries: technical description update. arXiv preprint arXiv:2004.11342, 2020.
[8] Steven John Phipps, Rupert Quentin Grafton, and Tom Kompas. Estimating the true (population) infection rate for covid-19: A backcasting approach with monte carlo methods. medRxiv, 2020.
[9] Sung-mok Jung, Andrei R Akhmetzhanov, Katsuma Hayashi, Natalie M Linton, Yichi Yang, Baoyin Yuan, Tetsuro Kobayashi, Ryo Kinoshita, and Hiroshi Nishiura. Real-time estimation of the risk of death from novel coronavirus (covid-19) infection: inference using exported cases. Journal of clinical medicine, 9(2):523, 2020.
[10] Robert Verity, Lucy C Okell, Ilaria Dorigatti, Peter Winskill, Charles Whittaker, Natsuko Imai, Gina Cuomo-Dannenburg, Hayley Thompson, Patrick GT Walker, Han Fu, et al. Estimates of the severity of coronavirus disease 2019: a model-based analysis. The Lancet infectious diseases, 2020.
[11] Masahiro Sonoo, Takamichi Kanbayashi, Takayoshi Shimohata, Masahito Kobayashi, and Hideyuki Hayashi. Estimation of the true infection rate and infection fatality rate of covid-19 in the whole population of each country. medRxiv, 2020.
[12] Chirag Modi, Vanessa Boehm, Simone Ferraro, George Stein, and Uros Seljak. How deadly is covid-19? a rigorous analysis of excess mortality and age-dependent fatality rates in italy. medRxiv, 2020.
[13] Eran Bendavid, Bianca Mulaney, Neeraj Sood, Soleil Shah, Emilia Ling, Rebecca Bromley-Dulfano, Cara Lai, Zoe Weissberg, Rodrigo Saavedra, James Tedrow, et al. Covid-19 antibody seroprevalence in santa clara county, california. MedRxiv, 2020.
[14] Stephen T Bennett and Mark Steyvers. Estimating covid-19 antibody seroprevalence in santa clara county, california. a re-analysis of bendavid et al. medRxiv, 2020.
[15] David A Leon, Vladimir M Shkolnikov, Liam Smeeth, Per Magnus, Markéta Pechholdová, and Christopher I Jarvis. Covid-19: a need for real-time monitoring of weekly excess deaths. The Lancet, 395(10234):e81, 2020.
[16] The Economist. Tracking covid-19 excess deaths across countries, 2020.
[17] Office for National Statistics (UK). Deaths registered weekly in england and wales, provisional. Download data, 2020.
[18] Xi He, Eric HY Lau, Peng Wu, Xilong Deng, Jian Wang, Xinxin Hao, Yiu Chung Lau, Jessica Y Wong, Yujuan Guan, Xinghua Tan, et al. Temporal dynamics in viral shedding and transmissibility of covid-19. Nature medicine, 26(5):672–675, 2020.
[19] Natalie M Linton, Tetsuro Kobayashi, Yichi Yang, Katsuma Hayashi, Andrei R Akhmetzhanov, Sung-mok Jung, Baoyin Yuan, Ryo Kinoshita, and Hiroshi Nishiura. Incubation period and other epidemiological characteristics of 2019 novel coronavirus infections with right truncation: a statistical analysis of publicly available case data. Journal of clinical medicine, 9(2):538, 2020.
[20] Qifang Bi, Yongsheng Wu, Shujiang Mei, Chenfei Ye, Xuan Zou, Zhen Zhang, Xiaojian Liu, Lan Wei, Shaun A Truelove, Tong Zhang, et al. Epidemiology and transmission of covid-19 in 391 cases and 1286 of their close contacts in shenzhen, china: a retrospective cohort study. The Lancet Infectious Diseases, 2020.
[21] Neil M Ferguson, Derek AT Cummings, Simon Cauchemez, Christophe Fraser, Steven Riley, Aronrag Meeyai, Sopon Iamsirithaworn, and Donald S Burke. Strategies for containing an emerging influenza pandemic in southeast asia. Nature, 437(7056):209–214, 2005.
[22] Joël Mossong, Niel Hens, Mark Jit, Philippe Beutels, Kari Auranen, Rafael Mikolajczyk, Marco Massari, Stefania Salmaso, Gianpaolo Scalia Tomba, Jacco Wallinga, et al. Social contacts and mixing patterns relevant to the spread of infectious diseases. PLoS medicine, 5(3), 2008.
[23] Ministerio de Sanidad (Spain). Estudio ene-covid19: Primera ronda estudio nacional de sero-epidemiología de la infección por sars-cov-2 en españa. See report, 2020.
[24] Silvia Stringhini, Ania Wisniak, Giovanni Piumatti, Andrew S Azman, Stephen A Lauer, Helene Baysson, David De Ridder, Dusan Petrovic, Stephanie Schrempft, Kailing Marcus, et al. Repeated seroprevalence of anti-sars-cov-2 igg antibodies in a population-based sample from geneva, switzerland. medRxiv, 2020.
[25] Henrik Salje, Cécile Tran Kiem, Noémie Lefrancq, Noémie Courtejoie, Paolo Bosetti, Juliette Paireau, Alessio Andronico, Nathanaël Hozé, Jehanne Richet, Claire-Lise Dubost, Yann Le Strat, Justin Lessler, Daniel Levy-Bruhl, Arnaud Fontanet, Lulla Opatowski, Pierre-Yves Boelle, and Simon Cauchemez. Estimating the burden of sars-cov-2 in france. Science, 2020.
[26] M Gabriela M Gomes, Ricardo Aguas, Rodrigo M Corder, Jessica G King, Kate E Langwig, Caetano Souto-Maior, Jorge Carneiro, Marcelo U Ferreira, and Carlos Penha-Goncalves. Individual variation in susceptibility or exposure to sars-cov-2 lowers the herd immunity threshold. medRxiv, 2020.
[27] Emma Yasinski. Researchers applaud spanish covid-19 serological survey. The Scientist, 2020.
[28] Nandini Sethuraman, Sundararaj Stanleyraj Jeremiah, and Akihide Ryo. Interpreting Diagnostic Tests for SARS-CoV-2. JAMA, 05 2020.
[29] Jason Rosado, Charlotte Cockram, Sarah Helene Merkling, Caroline Demeret, Annalisa Meola, Solen Kerneis, Benjamin Terrier, Samira Fafi-Kremer, Jerome de Seze, Marija Backovic, et al. Serological signatures of sars-cov-2 infection: Implications for antibody-based diagnostics. medRxiv, 2020.
[30] Jeffrey D Whitman, Joseph Hiatt, Cody T Mowery, Brian R Shy, Ruby Yu, Tori N Yamamoto, Ujjwal Rathore, Gregory M Goldgof, Caroline Whitty, Jonathan M Woo, et al. Test performance evaluation of sars-cov-2 serological assays. MedRxiv, 2020.
[31] Christopher Sempos and Lu Tian. Adjusting coronavirus prevalence estimates for laboratory test kit error. medRxiv, 2020.
[32] Shelley Riphagen, Xabier Gomez, Carmen Gonzalez-Martinez, Nick Wilkinson, and Paraskevi Theocharis. Hyperinflammatory shock in children during covid-19 pandemic. The Lancet, 395(10237):1607–1608, 2020.
[33] Veena G Jones, Marcos Mills, Dominique Suarez, Catherine A Hogan, Debra Yeh, J Bradley Segal, Elizabeth L Nguyen, Gabrielle R Barsh, Shiraz Maskatia, and Roshni Mathew. Covid-19 and kawasaki disease: novel virus and novel case. Hospital Pediatrics, 10(6):537–540, 2020.
[34] Ashraf S Harahsheh, Nagib Dahdah, Jane W Newburger, Michael A Portman, Maryam Piram, Robert Tulloh, Brian W McCrindle, Sarah D de Ferranti, Rolando Cimaz, Dongngan T Truong, et al. Missed or delayed diagnosis of kawasaki disease during the 2019 novel coronavirus disease (covid-19) pandemic. The Journal of Pediatrics, 2020.
[36] https://github.com/datadista/datasets/tree/master/

[Figure S1 plot: accumulated cases $C(t)$ and deaths $D(t)$ versus date (ticks include 03/15, 03/29, 04/15, 04/29, 05/15); inset: $C(t)$ compared with $D(t)$ shifted by 5 days and multiplied by 9.]

Figure S1: Simple scaling relation linking the evolution of the accumulated number of deaths and cases with time. We show the evolution with time of the accumulated total number of official COVID-19 cases and deaths in Spain. In the inset, the deaths' curve is shifted 5 days backwards in time and multiplied by 9; it then follows the cases' curve very closely once the latter has surpassed approximately 100 cases.
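The rescaling described in the caption of Figure S1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the series below are synthetic (they are not the Spanish records), the helper name `shifted_scaled_deaths` is hypothetical, and the relation is taken to be $C(t) \approx 9\,D(t+5)$, which is our reading of "5 days backwards in time and multiplied by 9".

```python
import math


def shifted_scaled_deaths(deaths, shift_days=5, factor=9.0):
    """Return factor * D(t + shift_days) for t = 0 .. len(deaths) - shift_days - 1.

    Dropping the first `shift_days` entries moves the deaths' curve
    backwards in time by that many days.
    """
    return [factor * d for d in deaths[shift_days:]]


# Synthetic, illustrative series (assumption, NOT the official data):
# deaths lag cases by 5 days and are 9 times smaller.
days = range(30)
cases = [100.0 * math.exp(0.2 * t) for t in days]
deaths = [100.0 * math.exp(0.2 * (t - 5)) / 9.0 for t in days]

rescaled = shifted_scaled_deaths(deaths)

# zip() stops at the shorter list, so only overlapping days are compared.
max_rel_err = max(abs(r - c) / c for r, c in zip(rescaled, cases))
print(f"max relative mismatch: {max_rel_err:.2e}")
```

With real records the mismatch would not vanish; a small, time-independent mismatch is what the inset of Figure S1 reports for Spain.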

[Figure S2 plots: two panels, A and B. Horizontal axis: age group (by decade, up to 90+). Vertical axis: normalized daily deaths (A) and the same quantity divided by the value of the 60-69 age group (B).]

Figure S2: Daily normalized number of deaths registered in French hospitals as a function of age and time. (A) We show the daily number of deaths for age group $\alpha$ (normalized by the population density of that group), $\Delta \hat{D}_\alpha(t)$, for different dates. The darker the color, the more recent the measurement. (B) We show the collapse of the data when we normalize them by the values of the 60-69 age group. Data from distinct dates collapse less well onto a single curve than the accumulated number of deaths in Fig. 2 because the daily counts are smaller and their fluctuations are therefore much larger; yet we do not observe any systematic change of the attack risk $r_\alpha$ with time.
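The collapse shown in panel B of Figure S2 amounts to dividing each age group's value by that of the 60-69 group on the same date. A minimal sketch follows, assuming the per-group values are already normalized by population; the numbers and the function name `collapse` are hypothetical and chosen only to make the idea concrete.

```python
def collapse(normalized_deaths, reference="60-69"):
    """Divide every age group's value by the reference group's value."""
    ref = normalized_deaths[reference]
    if ref == 0:
        raise ValueError("reference group has no deaths; ratio undefined")
    return {group: value / ref for group, value in normalized_deaths.items()}


# Hypothetical normalized daily deaths on two dates (assumption, not real data).
# The second date has twice as many deaths in every group.
day_1 = {"50-59": 0.5, "60-69": 2.0, "70-79": 6.0}
day_2 = {"50-59": 1.0, "60-69": 4.0, "70-79": 12.0}

ratios_1 = collapse(day_1)
ratios_2 = collapse(day_2)

# If the age profile does not change with time, both dates give the same ratios.
print(ratios_1 == ratios_2)
```

In this toy case the two dates collapse exactly because the age profile is the same up to a global factor; with real daily counts, fluctuations blur the collapse, as noted in the caption.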

[Figure S3 plot: predicted prevalence in NYC [%] for each age group (in years).]

Figure S3: Predictions for the sero-prevalence in New York City. We show our predictions for the sero-prevalence in New York City using the death age profile published at different dates and the IFR of Table 2 (without under-counting corrections). Our predictions are significantly higher than the results of the sero-epidemiological survey announced by the New York State Governor on the 2nd of May 2020.
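As a rough illustration of how a prevalence prediction follows from a death count and an IFR, the sketch below inverts the definition IFR = deaths / infections. It assumes this simple inversion is enough (no delay between infection and death, no under-counting correction); the function name and all numbers are hypothetical and are not taken from Table 2 or from the New York City records.

```python
def predicted_prevalence(deaths, ifr, population):
    """Fraction of the population infected, implied by deaths = IFR * infected."""
    if ifr <= 0 or population <= 0:
        raise ValueError("ifr and population must be positive")
    return deaths / (ifr * population)


# Hypothetical age group (assumption): 500 deaths, IFR of 1%, 200,000 people.
prevalence = predicted_prevalence(deaths=500, ifr=0.01, population=200_000)
print(f"predicted prevalence: {100 * prevalence:.1f}%")
```

Because the prediction scales as 1/IFR, an IFR that is too low for a given population inflates the predicted prevalence, which is one way a prediction can end up above a survey result.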
