Using excess deaths and testing statistics to improve estimates of COVID-19 mortalities

Abstract

Factors such as non-uniform definitions of mortality, uncertainty in disease prevalence, and biased sampling complicate the quantification of fatality during an epidemic. Regardless of the employed fatality measure, the infected population and the number of infection-caused deaths need to be consistently estimated for comparing mortality across regions. We combine historical and current mortality data, a statistical testing model, and an SIR epidemic model, to improve estimation of mortality. We find that the average excess death across the entire US is 13% higher than the number of reported COVID-19 deaths. In some areas, such as New York City, the number of weekly deaths is about eight times higher than in previous years. Other countries such as Peru, Ecuador, Mexico, and Spain exhibit excess deaths significantly higher than their reported COVID-19 deaths. Conversely, we find negligible or negative excess deaths for part and all of 2020 for Denmark, Germany, and Norway.

Lucas Böttcher (*), Maria R. D'Orsogna (2, 1, †), and Tom Chou (1, 3, ‡)

Dept. of Computational Medicine, UCLA, Los Angeles, CA 90095-1766
Dept. of Mathematics, California State University at Northridge, Los Angeles, CA 91330-8313
Dept. of Mathematics, UCLA, Los Angeles, CA 90095-1555
(Dated: January 12, 2021)

Introduction

The novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), first identified in Wuhan, China in December 2019, quickly spread across the globe, leading to the declaration of a pandemic on March 11, 2020 [1]. The emerging disease was termed COVID-19. As of this writing in January 2021, more than 86 million people have been infected, and more than 1.8 million deaths from COVID-19 have been confirmed in more than 218 countries [2]. About 61 million people have recovered globally.

Properly estimating the severity of any infectious disease is crucial for identifying near-future scenarios and designing intervention strategies. This is especially true for SARS-CoV-2 given the relative ease with which it spreads, due to long incubation periods, asymptomatic carriers, and stealth transmissions [3]. Most measures of severity are derived from the number of deaths, the number of confirmed and unconfirmed infections, and the number of secondary cases generated by a single primary infection, to name a few. Measuring these quantities, determining how they evolve in a population, and how they are to be compared across groups and over time, is challenging due to many confounding variables and uncertainties.

For example, quantifying COVID-19 deaths across jurisdictions must take into account the existence of different protocols in assigning cause of death, cataloging comorbidities [4], and lag time reporting [5]. Inconsistencies also arise in the way deaths are recorded, especially when COVID-19 is not the direct cause of death but rather a co-factor leading to complications such as pneumonia and other respiratory ailments [6]. In Italy, the clinician's best judgment is called upon to classify the cause of death of an untested person who manifests COVID-19 symptoms. In some cases, such persons are given postmortem tests and, if results are positive, added to the statistics.
Criteria vary from region to region [7]. In Germany, postmortem testing is not routinely employed, possibly explaining the large difference in mortality between the two countries. In the US, current guidelines state that if typical symptoms are observed, the patient's death can be registered as due to COVID-19 even without a positive test [8]. Certain jurisdictions will list dates on which deaths actually occurred, others list dates on which they were reported, leading to potential lag-times. Other countries tally COVID-19 related deaths only if they occur in hospital settings, while others also include those that occur in private and/or nursing homes.

In addition to the difficulty in obtaining accurate and uniform fatality counts, estimating the prevalence of the disease is also a challenging task. Large-scale testing of a population where a fraction of individuals is infected relies on unbiased sampling, reliable tests, and accurate recording of results. One of the main sources of systematic bias arises from the tested subpopulation: due to shortages in testing resources, or in response to public health guidelines, COVID-19 tests have more often been conducted on symptomatic persons, the elderly, front-line workers and/or those returning from hot-spots. Such non-random testing overestimates the infected fraction of the population.

Different types of tests also probe different infected subpopulations. Tests based on reverse-transcription polymerase chain reaction (RT-PCR), whereby viral genetic material is detected primarily in the upper respiratory tract and amplified, probe individuals who are actively infected. Serological tests (such as enzyme-linked immunosorbent assay, ELISA) detect antiviral antibodies and thus measure individuals who have been infected, including those who have recovered.

Finally, different types of tests exhibit significantly different "Type I" (false positive) and "Type II" (false negative) error rates.
The accuracy of RT-PCR tests depends on viral load, which may be too low to be detected in individuals at the early stages of the infection, and may also depend on which sampling site in the body is chosen. Within serological testing, the kinetics of antibody response are still largely unknown and it is not possible to determine if and for how long a person may be immune from reinfection. Instrumentation errors and sample contamination may also result in a considerable number of false positives and/or false negatives. These errors confound the inference of the infected fraction. Specifically, at low prevalence, Type I false positive errors can significantly bias the estimation of the infection fatality ratio (IFR).

Other quantities that are useful in tracking the dynamics of a pandemic include the number of recovered individuals, tested or untested. These quantities may not be easily inferred from data and need to be estimated from fitting mathematical models such as SIR-type ODEs [9], age-structured PDEs [10], or network/contact models [11–13].

Administration of tests and estimation of all quantities above can vary widely across jurisdictions, making it difficult to properly compare numbers across them. In this paper, we incorporate excess death data, testing statistics, and mathematical modeling to self-consistently compute and compare mortality across different jurisdictions. In particular, we will use excess mortality statistics [14–16] to infer the number of COVID-19-induced deaths across different regions. We then present a statistical testing model to estimate jurisdiction-specific infected fractions and mortalities, their uncertainty, and their dependence on testing bias and errors. Our statistical analyses and source codes are available at [17].

Methods

Mortality measures

Many different fatality rate measures have been defined to quantify epidemic outbreaks [18]. One of the most common is the case fatality ratio (CFR), defined as the ratio between the number of confirmed "infection-caused" deaths $D_c$ in a specified time window and the number of infections $N_c$ confirmed within the same time window, CFR $= D_c/N_c$ [19]. Depending on how deaths $D_c$ are counted and how infected individuals $N_c$ are defined, the operational CFR may vary. It may even exceed one, unless all deaths are tested and included in $N_c$.

Another frequently used measure is the infection fatality ratio (IFR), defined as the true number of "infection-caused" deaths $D = D_c + D_u$ divided by the actual number of cumulative infections to date, $N_c + N_u$. Here, $D_u$ is the number of unreported infection-caused deaths within a specified period, and $N_u$ denotes the untested or unreported infections during the same period. Thus, IFR $= D/(N_c + N_u)$.

One major issue of both CFR and IFR is that they do not account for the time delay between infection and resolution. Both measures may be quite inaccurate early in an outbreak when the number of cases grows faster than the number of deaths and recoveries [10]. An alternative measure that avoids case-resolution delays is the confirmed resolved mortality $M = D_c/(D_c + R_c)$ [10], where $R_c$ is the cumulative number of confirmed recovered cases evaluated in the same specified time window over which $D_c$ is counted. One may also define the true resolved mortality via $\mathcal{M} = D/(D + R)$, the proportion of the actual number of deaths relative to the total number of deaths and recovered individuals during a specified time period. If we decompose $R = R_c + R_u$, where $R_c$ are the confirmed and $R_u$ the unreported recovered cases, then $\mathcal{M} = (D_c + D_u)/(D_c + D_u + R_c + R_u)$. The total confirmed population is defined as $N_c = D_c + R_c + I_c$, where $I_c$ is the number of living confirmed infecteds.
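The four definitions above can be collected in a few lines of code. The following is a minimal sketch, not the authors' implementation: the function name `mortality_measures`, the dictionary keys, and the counts in the example are illustrative assumptions.

```python
def mortality_measures(D_c, N_c, R_c, D_u=0.0, N_u=0.0, R_u=0.0):
    """Return CFR, IFR, confirmed resolved mortality M, and true resolved mortality.

    D_c, N_c, R_c: confirmed deaths, infections, and recoveries.
    D_u, N_u, R_u: unconfirmed counterparts (must be estimated separately).
    """
    D = D_c + D_u  # total infection-caused deaths
    return {
        "CFR": D_c / N_c,                  # confirmed deaths / confirmed infections
        "IFR": D / (N_c + N_u),            # all deaths / all infections
        "M_confirmed": D_c / (D_c + R_c),  # confirmed resolved mortality M
        "M_true": D / (D + R_c + R_u),     # true resolved mortality
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
example = mortality_measures(D_c=50, N_c=1000, R_c=450, D_u=10, N_u=2000, R_u=1500)
# CFR = 50/1000, IFR = 60/3000, M = 50/500, true resolved mortality = 60/2010
```

In this made-up example the unresolved cases ($I_c = N_c - D_c - R_c = 500$) make the CFR smaller than the confirmed resolved mortality.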
Applying these definitions to any specified time period (typically from the "start" of an epidemic to the date with the most recent case numbers), we observe that CFR $\leq M$ and IFR $\leq \mathcal{M}$. After the epidemic has long passed, when the number of currently infected individuals $I$ approaches zero, the two fatality ratios and mortality measures converge if the component quantities are defined and measured consistently: $\lim_{t\to\infty} \mathrm{CFR}(t) = \lim_{t\to\infty} M(t)$ and $\lim_{t\to\infty} \mathrm{IFR}(t) = \lim_{t\to\infty} \mathcal{M}(t)$ [10].

The mathematical definitions of the four basic mortality measures $Z = \mathrm{CFR}, \mathrm{IFR}, M, \mathcal{M}$ defined above are given in Table I and fall into two categories, confirmed and total. Confirmed measures (CFR and $M$) rely only on positive test counts, while total measures (IFR and $\mathcal{M}$) rely on projections to estimate the number of infected persons in the total population $N$. Of the measures listed in Table I, the fatality ratio CFR and confirmed resolved mortality $M$ do not require estimates of unreported infections, recoveries, and deaths and can be directly derived from the available confirmed counts $D_c$, $N_c$, and $R_c$ [20]. Estimation of IFR and the true resolved mortality $\mathcal{M}$ requires additional knowledge of the unconfirmed quantities $D_u$, $N_u$, and $R_u$. We describe the possible ways to estimate these quantities, along with the associated sources of bias and uncertainty, below.

Excess deaths data

An unbiased way to estimate $D = D_c + D_u$, the cumulative number of deaths, is to compare total deaths within a time window in the current year to those in the same time window of previous years, before the pandemic. If the epidemic is widespread and has appreciable fatality, one may reasonably expect that the excess deaths can be attributed to the pandemic [21–25]. Within each affected region, these "excess" deaths $D_e$ relative to "historical" deaths are independent of testing limitations and do not suffer from highly variable definitions of virus-induced death. Thus, within the context of the COVID-19 pandemic, $D_e$ is a more inclusive measure of virus-induced deaths than $D_c$ and can be used to estimate the total number of deaths, $D_e \simeq D_c + D_u$. Moreover, using data from multiple past years, one can also estimate the uncertainty in $D_e$.

In practice, deaths are typically tallied daily, weekly [21, 28], or sometimes aggregated monthly [27, 29], with historical records dating back $J$ years so that for every period $i$ there are a total of $J + 1$ death values. We denote by $d^{(j)}(i)$ the total number of deaths recorded in period $i$ from the $j$th previous year, where $0 \leq j \leq J$ and where $j = 0$ indicates the current year. In this notation, $D = D_c + D_u = \sum_i d^{(0)}(i)$, where the summation tallies deaths over several periods of interest within the pandemic. Note that we can decompose $d^{(0)}(i) = d_c^{(0)}(i) + d_u^{(0)}(i)$ to include the contribution from the confirmed and unconfirmed deaths during each period $i$, respectively.

TABLE I: Definitions of mortality measures.

Measure Z               Confirmed                   Total
Fatality ratio          CFR $= D_c/N_c$             IFR $= (D_c + D_u)/(N_c + N_u)$
Resolved mortality      $M = D_c/(D_c + R_c)$       $\mathcal{M} = (D_c + D_u)/(D_c + D_u + R_c + R_u)$
Excess death indices    $\bar{D}_e$ per 100,000; relative: $r = \sum_i \left[ d^{(0)}(i) - \frac{1}{J}\sum_{j=1}^{J} d^{(j)}(i) \right] \Big/ \left[ \frac{1}{J}\sum_{j=1}^{J}\sum_i d^{(j)}(i) \right]$

Quantities with subscript "c" and "u" denote confirmed (i.e., positively tested) and unconfirmed populations. For instance, $D_c$, $R_c$, and $N_c$ denote the total number of confirmed dead, recovered, and infected individuals, respectively. $d^{(j)}(i)$ is the number of individuals who have died in the $i$th time window (e.g., day, week) of the $j$th previous year. The mean number of excess deaths between the periods $k_s$ and $k$ this year, $\bar{D}_e$, is thus $\sum_{i=k_s}^{k} \left[ d^{(0)}(i) - \frac{1}{J}\sum_{j=1}^{J} d^{(j)}(i) \right]$. Where the total number of infection-caused deaths $D_c + D_u$ appears, it can be estimated using the excess deaths $\bar{D}_e$ as detailed in the main text. We have also included raw death numbers per 100,000 and the mean excess deaths $r$ relative to the mean number of deaths over the same period of time from past years (see Eq. (1)).
To quantify the total cumulative excess deaths we derive excess deaths $d_e^{(j)}(i) = d^{(0)}(i) - d^{(j)}(i)$ per week relative to the $j$th previous year. Since $d^{(0)}(i)$ is the total number of deaths in week $i$ of the current year, by definition $d_e^{(0)}(i) \equiv 0$. The excess deaths during week $i$, $\bar{d}_e(i)$, averaged over $J$ past years, and the associated unbiased variance $\sigma_e^2(i)$ are given by

$\bar{d}_e(i) = \frac{1}{J}\sum_{j=1}^{J} d_e^{(j)}(i), \qquad \sigma_e^2(i) = \frac{1}{J-1}\sum_{j=1}^{J} \left[ d_e^{(j)}(i) - \bar{d}_e(i) \right]^2.$  (1)

The corresponding quantities accumulated over $k$ weeks define the mean and variance of the cumulative excess deaths, $\bar{D}_e(k)$ and $\Sigma_e^2(k)$:

$\bar{D}_e(k) = \frac{1}{J}\sum_{j=1}^{J}\sum_{i=1}^{k} d_e^{(j)}(i), \qquad \Sigma_e^2(k) = \frac{1}{J-1}\sum_{j=1}^{J} \left[ \sum_{i=1}^{k} d_e^{(j)}(i) - \bar{D}_e(k) \right]^2,$  (2)

where deaths are accumulated from the first to the $k$th week of the pandemic. The variances in Eqs. (1) and (2) arise from the variability in the baseline number of deaths from the same time period in $J$ previous years.

We gathered excess death statistics from over 23 countries and all US states. Some of the data derive from open-source online repositories as listed by official statistical bureaus and health ministries [21–25, 29]; other data are elaborated and tabulated in Ref. [27]. In some countries excess death statistics are available only for a limited number of states or jurisdictions (e.g., Brazil). The US death statistics that we use in this study are based on weekly death data between 2015–2019 [29]. For all other countries, the data collection periods are summarized in Ref. [27]. Fig. A1(a-b) shows historical death data for NYC and Germany, while Fig. A1(c-d) plots the confirmed and excess deaths and their confidence levels computed from Eqs. (1) and (2). We assumed that the cumulative summation is performed from the start of 2020 to the current week $k = K$ so that $\bar{D}_e(K) \equiv \bar{D}_e$ indicates excess deaths at the time of writing.
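As a concrete illustration of Eq. (2), the sketch below computes the mean and unbiased variance of the cumulative excess deaths from weekly counts. It is a minimal example under stated assumptions, not the authors' code: the function name `cumulative_excess` and the weekly numbers are hypothetical.

```python
def cumulative_excess(current, history):
    """Mean and unbiased variance of cumulative excess deaths, as in Eq. (2).

    current: weekly deaths d^(0)(i) for weeks i = 1..k of the current year.
    history: list of J lists, each holding d^(j)(i) for the same k weeks
             of the j-th previous year.
    """
    J = len(history)
    if J < 2:
        raise ValueError("need at least two previous years for a variance")
    if any(len(past) != len(current) for past in history):
        raise ValueError("every year must cover the same weeks")
    # Cumulative excess relative to each previous year j: sum_i [d^(0)(i) - d^(j)(i)]
    totals = [sum(c - h for c, h in zip(current, past)) for past in history]
    mean = sum(totals) / J
    variance = sum((t - mean) ** 2 for t in totals) / (J - 1)
    return mean, variance


# Hypothetical weekly deaths (three weeks, three previous years).
mean_excess, var_excess = cumulative_excess(
    current=[120, 150, 200],
    history=[[100, 100, 100], [110, 100, 100], [90, 100, 100]],
)
# Per-year cumulative excesses are 170, 160, 180: mean 170, variance 100.
```

The spread across previous years is the only source of variance here, which mirrors the statement that the uncertainty arises from variability in the historical baseline.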
Significant numbers of excess deaths are clearly evident for NYC, while Germany thus far has not experienced significant excess deaths.

To evaluate CFR and $M$, data on only $D_c$, $N_c$, and $R_c$ are required, which are tabulated by many jurisdictions. To estimate the numerators of IFR and $\mathcal{M}$, we approximate $D_c + D_u \approx \bar{D}_e$ using Eq. (2). For the denominators, estimates of the unconfirmed infected $N_u$ and unconfirmed recovered populations $R_u$ are required. In the next two sections we propose methods to estimate $N_u$ using a statistical testing model and $R_u$ using a compartmental population model.

FIG. 1: Examples of seasonal mortality and excess deaths. The evolution of weekly deaths in (a) New York City (six years) and (b) Germany (five years) derived from data in Refs. [26, 27]. Grey solid lines and shaded regions represent the historical numbers of deaths and corresponding confidence intervals defined in Eq. (1). Blue solid lines indicate weekly deaths, and weekly deaths that lie outside the confidence intervals are indicated by solid red lines. The red shaded regions represent statistically significant mean cumulative excess deaths $\bar{D}_e$. The reported weekly confirmed deaths $d_c^{(0)}(i)$ (dashed black curves), reported cumulative confirmed deaths $D_c(k)$ (dashed dark red curves), weekly excess deaths $\bar{d}_e(i)$ (solid grey curves), and cumulative excess deaths $\bar{D}_e(k)$ (solid red curves) are plotted in units of per 100,000 in (c) and (d) for NYC and Germany, respectively. The excess deaths and the associated 95% confidence intervals given by the error bars are constructed from historical death data in (a-b) and defined in Eqs. (1) and (2). In NYC there is clearly a significant number of excess deaths that can be safely attributed to COVID-19, while to date in Germany there have been no significant excess deaths. Excess death data from other jurisdictions are shown in the Supplementary Information and typically show excess deaths greater than reported confirmed deaths (with Germany an exception as shown in (d)).

Statistical testing model with bias and testing errors

The total number of confirmed and unconfirmed infected individuals $N_c + N_u$ appears in the denominator of the IFR. To better estimate the infected population we present a statistical model for testing in the presence of bias in administration and testing errors. Although the $N_c + N_u$ used to estimate the IFR includes those who have died, depending on the type of test, it may or may not include those who have recovered. If $S, I, R, D$ are the numbers of susceptible, currently infected, recovered, and deceased individuals, the total population is $N = S + I + R + D$ and the infected fraction can be defined as $f = (N_c + N_u)/N = (I + R + D)/N$ for tests that include recovered and deceased individuals (e.g., antibody tests), or $f = (N_c + N_u)/N = (I + D)/N$ for tests that only count currently infected individuals (e.g., RT-PCR tests). If we assume that the total population $N$ can be inferred from census estimates, the problem of identifying the number of unconfirmed infected persons $N_u$ is mapped onto the problem of identifying the true fraction $f$ of the population that has been infected.

Typically, $f$ is determined by testing a representative sample and measuring the proportion of infected persons within the sample. Besides the statistics of sampling, two main sources of systematic errors arise: the non-random selection of individuals to be tested and errors intrinsic to the tests themselves. Biased sampling arises when testing policies focus on symptomatic or at-risk individuals, leading to over-representation of infected individuals.

Figure 2 shows a schematic of a hypothetical initial total population of $N = 54$ individuals in a specified jurisdiction. Without loss of generality we assume there are no unconfirmed deaths, $D_u = 0$, and that all confirmed deaths are equivalent to excess deaths, so that $\bar{D}_e = D_c = 5$ in the jurisdiction represented by Fig. 2. Apart from the number of deceased, we also show the number of infected and uninfected subpopulations and label them as true positives, false positives, and false negatives. The true number of infected individuals is $N_c + N_u = 16$, which yields the true $f = 16/54 \approx 0.296$ and the true IFR $= 5/16 \approx 0.31$. The fraction $8/15 \approx 0.533 > f = 0.296$ measured in the biased (blue) sample is biased since it includes a higher proportion of infected persons, both alive and deceased, than that of the entire jurisdiction. Using this biased measured infected fraction of $8/15$ yields IFR $= 5/(0.533 \cdot 54) \approx 0.174$, well below the true IFR. The less biased (green) sample yields an infected fraction $4/14 \approx 0.286$ and an apparent IFR $\approx 0.324$, which are much closer to the true fraction $f$ and IFR. In both samples discussed above we neglected testing errors such as false positives indicated in Fig. 2. Tests that are unable to distinguish false positives as negatives would yield a larger $N_c$, resulting in an apparent infected fraction $9/15$ and an even smaller apparent IFR $\approx 0.154$. Similarly, counting the false positive in the less biased sample gives $5/15 = 0.333$ and IFR = 0.259.

FIG. 2: Biased and unbiased testing of a population. A hypothetical scenario of testing a population (total $N = 54$ individuals) within a jurisdiction (solid black boundary). Filled red circles represent the true number of infected individuals who tested positive and the black-filled red circles indicate individuals who have died from the infection. Open red circles denote uninfected individuals who were tested positive (false positives) while filled red circles with dark gray borders are infected individuals who were tested negative (false negatives). In the jurisdiction of interest 5 have died of the infection while 16 are truly infected. The true fraction $f$ of infected in the entire population is thus $f = 16/54$ and the true IFR $= 5/16$. However, under testing (green and blue samples), a false positive is shown to arise. If the apparent positive fraction $\tilde{f}_b$ is derived from a biased sample (blue), the estimated apparent IFR can be quite different from the true one. For a less biased (more random) testing sample (green sample), a more accurate estimate of the total number of infected individuals is $N_c + N_u = \tilde{f}_b N = (5/15) \times 54 \approx 18$, and $\tilde{f}_b N = (4/14) \times 54 \approx 15$ when the false positive is excluded, which allows us to more accurately infer the IFR. Note that the CFR is defined according to the tested quantities $D_c/N_c$.

Given that test administration can be biased, we propose a parametric form for the apparent or measured infected fraction

$f_b = f_b(f, b) \equiv \frac{f e^{b}}{f(e^{b} - 1) + 1},$  (3)

to connect the apparent (biased sampling) infected fraction $f_b$ with the true underlying infection fraction $f$. The bias parameter $-\infty < b < \infty$ describes how an infected or uninfected individual might be preferentially selected for testing, with $b < 0$ (and $f_b < f$) indicating under-testing of infected individuals, and $b > 0$ (and $f_b > f$) representing over-testing of infecteds. A truly random, unbiased sampling arises only when $b = 0$, where $f_b = f$. Given $Q$ (possibly biased) tests to date, testing errors, and ground-truth infected fraction $f$, we derive in the SI the likelihood of observing a positive fraction $\tilde{f}_b = \tilde{Q}^{+}/Q$ (where $\tilde{Q}^{+}$ is the number of recorded positive tests):

$P(\tilde{f}_b \mid \theta) \approx \frac{1}{\sqrt{2\pi}\,\sigma_T} \exp\left[ -\frac{(\tilde{f}_b - \mu)^2}{2\sigma_T^2} \right],$  (4)

in which

$\mu \equiv f_b(f, b)(1 - \mathrm{FNR}) + (1 - f_b(f, b))\,\mathrm{FPR}, \qquad \sigma_T^2 \equiv \mu(1 - \mu)/Q.$  (5)

Here, $\mu$ is the expected value of the measured and biased fraction $\tilde{f}_b$ and $\sigma_T^2$ is its variance. Note that the parameters $\theta = \{Q, f, b, \mathrm{FPR}, \mathrm{FNR}\}$ may be time-dependent and change from sample to sample. Along with the likelihood function $P(\tilde{f}_b \mid f, \theta)$, one can also propose a prior distribution $P(\theta \mid \alpha)$ with hyperparameters $\alpha$, and apply Bayesian methods to infer $\theta$ (see SI).

To evaluate the IFR, we must now estimate $f$ given $\tilde{f}_b = \tilde{Q}^{+}/Q$ and possible values for FPR, FNR, and/or $b$, or the hyperparameters $\alpha$ defining their uncertainty. The simplest maximum likelihood estimate of $f$ can be found by maximizing $P(\tilde{f}_b \mid \theta)$ with respect to $f$ given a measured value $\tilde{f}_b$ and all other parameter values $\theta$ specified:

$\hat{f} \approx \frac{\tilde{f}_b - \mathrm{FPR}}{e^{b}(1 - \mathrm{FNR} - \tilde{f}_b) + \tilde{f}_b - \mathrm{FPR}}.$  (6)

Note that although FNRs are typically larger than FPRs, small values of $f$ and $\tilde{f}_b$ imply that $\hat{f}$ and $\mu$ are more sensitive to the FPR, as indicated by Eqs. (5) and (6). If time series data for $\tilde{f}_b = \tilde{Q}^{+}/Q$ are available, one can evaluate the corrected testing fractions in Eq. (6) for each time interval. Assuming that serological tests can identify infected individuals long after symptom onset, the latest value of $\hat{f}$ would suffice to estimate corresponding mortality metrics such as the IFR. For RT-PCR testing, one generally needs to track how $\tilde{f}_b$ evolves in time. A rough estimate would be to use the mean of $\tilde{f}_b$ over the whole pandemic period to provide a lower bound of the estimated prevalence $\hat{f}$.

The measured $\tilde{f}_b$ yields only the apparent IFR $= \bar{D}_e/(\tilde{f}_b N)$, but Eq. (6) can then be used to evaluate the corrected IFR $\approx \bar{D}_e/(\hat{f} N)$, which will be a better estimate of the true IFR. For example, under moderate bias and a small infected fraction, the apparent IFR satisfies $\bar{D}_e/(\tilde{f}_b N) \sim \bar{D}_e/((\hat{f} e^{b} + \mathrm{FPR}) N)$.

Another commonly used representation of the IFR is IFR $= p(D_c + D_u)/N_c = p\bar{D}_e/N_c$. This expression is equivalent to our IFR $= \bar{D}_e/(f N)$ if $p = N_c/(N_c + N_u) \approx \tilde{Q}^{+}/(f N)$ is defined as the fraction of infected individuals that are confirmed [30, 31].
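The bias model of Eq. (3), the expected positive fraction of Eq. (5), and the corrected estimate of Eq. (6) are simple enough to check numerically. The sketch below is a hedged illustration rather than the authors' published code; all function names are assumptions introduced here, and the parameter values are arbitrary.

```python
import math


def biased_fraction(f, b):
    """Apparent infected fraction f_b(f, b) under sampling bias b, Eq. (3)."""
    e = math.exp(b)
    return f * e / (f * (e - 1.0) + 1.0)


def expected_positive_fraction(f, b, fpr, fnr):
    """Expected measured positive fraction mu, Eq. (5)."""
    fb = biased_fraction(f, b)
    return fb * (1.0 - fnr) + (1.0 - fb) * fpr


def estimate_true_fraction(f_tilde, b, fpr, fnr):
    """Estimate of the true infected fraction from a measured one, Eq. (6)."""
    e = math.exp(b)
    return (f_tilde - fpr) / (e * (1.0 - fnr - f_tilde) + f_tilde - fpr)


def corrected_ifr(excess_deaths, f_tilde, N, b, fpr, fnr):
    """Corrected IFR = excess deaths / (f_hat * N)."""
    return excess_deaths / (estimate_true_fraction(f_tilde, b, fpr, fnr) * N)


# Round trip with arbitrary parameters: Eq. (6) inverts Eqs. (3) and (5).
mu = expected_positive_fraction(f=0.1, b=0.5, fpr=0.05, fnr=0.2)
f_hat = estimate_true_fraction(mu, b=0.5, fpr=0.05, fnr=0.2)

# Fig. 2 numbers: with no bias correction and no testing errors, the biased
# sample fraction 8/15 gives the apparent IFR 5 / ((8/15) * 54).
apparent_ifr = corrected_ifr(5, 8 / 15, 54, b=0.0, fpr=0.0, fnr=0.0)
```

Setting `b`, `fpr`, and `fnr` to zero reduces the corrected IFR to the apparent one, which is a convenient sanity check before exploring nonzero bias or error rates.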
In this alternative representation, the $p$ factor implicitly contains the effects of biased testing. Our approach allows the true infected fraction $f$ to be directly estimated from $\tilde{Q}^{+}$ and $N$.

While the estimate $\hat{f}$ depends strongly on $b$ and FPR, and weakly on FNR, the uncertainty in $f$ will depend on the uncertainty in the values of $b$, FPR, and FNR. A Bayesian framework is presented in the SI, but under a Gaussian approximation for all distributions, the uncertainty in the testing parameters can be propagated to the squared coefficient of variation $\sigma_f^2/\hat{f}^2$ of the estimated infected fraction $\hat{f}$, as explicitly computed in the SI. Moreover, the uncertainties in the mortality indices $Z$, decomposed into the uncertainties of their individual components, are listed in Table II.

Using compartmental models to estimate resolved mortalities

Since the number of unreported recovered individuals $R_u$ required to calculate $\mathcal{M}$ is not directly related to excess deaths nor to positive-tested populations, we use an SIR-type compartmental model to relate $R_u$ to other inferable quantities [9]. Both unconfirmed recovered individuals and unconfirmed deaths are related to unconfirmed infected individuals, who recover at rate $\gamma_u$ and die at rate $\mu_u$. The equations for the cumulative numbers of unconfirmed recovered individuals and unconfirmed deaths,

$\frac{\mathrm{d}R_u(t)}{\mathrm{d}t} = \gamma_u(t) I_u(t), \qquad \frac{\mathrm{d}D_u(t)}{\mathrm{d}t} = \mu_u(t) I_u(t),$  (7)

can be directly integrated to find $R_u(t) = \int_0^t \gamma_u(t') I_u(t')\,\mathrm{d}t'$ and $D_u(t) = \int_0^t \mu_u(t') I_u(t')\,\mathrm{d}t'$. The rates $\gamma_u$ and $\mu_u$ may differ from those averaged over the entire population since testing may be biased towards subpopulations with different values of $\gamma_u$ and $\mu_u$. If one assumes $\gamma_u$ and $\mu_u$ are approximately constant over the period of interest, we find $R_u/D_u \approx \gamma_u/\mu_u \equiv \gamma$. We now use $D_u = \bar{D}_e - D_c$, where both $\bar{D}_e$ and $D_c$ are given by data, to estimate $R_u \approx \gamma(\bar{D}_e - D_c)$ and write $\mathcal{M}$ as

$\mathcal{M} = \frac{\bar{D}_e}{\bar{D}_e + R_c + \gamma(\bar{D}_e - D_c)}.$  (8)

Thus, a simple SIR model transforms the problem of determining the number of unreported death and recovered cases in $\mathcal{M}$ to the problem of identifying the recovery and death rates in the untested population. Alternatively, we can make use of the fact that both the IFR and resolved mortality $\mathcal{M}$ should have comparable values and match $\mathcal{M}$ to the IFR estimates of Refs. [31–33] by choosing $\gamma \equiv \gamma_u/\mu_u$ accordingly. In some cases the data may indicate $D_c > \bar{D}_e$. Since by definition infection-caused excess deaths must be greater than the confirmed deaths, we set $\bar{D}_e - D_c = 0$ whenever data happen to indicate $\bar{D}_e$ to be less than $D_c$.

Results

Here, we present much of the available worldwide fatality data, construct the excess death statistics, and compute mortalities and compare them across jurisdictions. We show that standard mortality measures significantly underestimate the death toll of COVID-19 for most regions (see Figs. A1 and A2). We also use the data to estimate uncertainties in the mortality measures and relate them to uncertainties of the underlying components and model parameters.

Excess and conﬁrmed deaths

We find that in New York City, for example, the number of confirmed COVID-19 deaths between March 10, 2020 and December 10, 2020 is 19,694 [34] and thus significantly lower than the 27,938 (95% CI 26,516–29,360) reported excess mortality cases [21]. From March 25, 2020 until December 10, 2020, Spain counts 65,673 (99% confidence interval [CI] 37,061–91,816) excess deaths [22], a number that is substantially larger than the officially reported 47,019 COVID-19 deaths [35]. The large difference between excess deaths and reported COVID-19 deaths in Spain and New York City is also observed in Lombardia, one of the most affected regions in Italy. From February 23, 2020 until April 4, 2020, Lombardia reported 8,656 COVID-19 deaths [35] but 13,003 (95% CI 12,335–13,673) excess deaths [25]. Starting April 5, 2020, mortality data in Lombardia stopped being reported in a weekly format. In England/Wales, the number of excess deaths from the onset of the COVID-19 outbreak on March 1, 2020 until November 27, 2020 is 70,563 (95% CI 52,250–88,877) whereas the number of reported COVID-19 deaths in the same time interval is 66,197 [26]. In Switzerland, the number of excess deaths from March 1, 2020 until November 29, 2020 is 5,664 (95% CI 4,281–7,047) [24], slightly larger than the corresponding 4,932 reported COVID-19 deaths [35].

FIG. 3: Excess deaths versus confirmed deaths across different countries/states. The number of excess deaths in 2020 versus confirmed deaths across different countries (a) and US states (b). The black solid lines in both panels have slope 1. In (a) the blue solid line is a guide-line with slope 3; in (b) the blue solid line is a least-squares fit of the data with slope 1.132 (95% CI 1.096–1.168; blue shaded region). All data were updated on December 10, 2020 [20, 27, 29, 36].

To illustrate the significant differences between excess deaths and reported COVID-19 deaths in various jurisdictions, we plot the excess deaths against confirmed deaths for various countries and US states as of December 10, 2020 in Fig. 3. We observe in Fig. 3(a) that the number of excess deaths in countries like Mexico, Russia, Spain, Peru, and Ecuador is significantly larger than the corresponding number of confirmed COVID-19 deaths. In particular, in Russia, Ecuador, and Spain the number of excess deaths is about three times larger than the number of reported COVID-19 deaths. As described in the Methods section, for certain countries (e.g., Brazil) excess death data are not available for all states [27]. For the majority of US states the number of excess deaths is also larger than the number of reported COVID-19 deaths, as shown in Fig. 3(b). We performed a least-squares fit to calculate the proportionality factor $m$ arising in $\bar{D}_e = m D_c$ and found $m \approx 1.132$ (95% CI 1.096–1.168). That is, across all US states, the number of excess deaths is about 13% larger than the number of confirmed COVID-19 deaths.
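A proportionality factor of this kind can be obtained from a least-squares fit constrained to pass through the origin. The sketch below assumes an unweighted fit of the form $\bar{D}_e = m D_c$; the exact fitting procedure used for Fig. 3(b) is not specified beyond "least-squares", and the function name and the data are hypothetical.

```python
def slope_through_origin(x, y):
    """Least-squares slope m for the model y = m * x (no intercept).

    Minimizing sum_i (y_i - m * x_i)^2 gives m = sum(x_i * y_i) / sum(x_i^2).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    sxx = sum(xi * xi for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least one nonzero value")
    return sum(xi * yi for xi, yi in zip(x, y)) / sxx


# Hypothetical confirmed and excess deaths for three jurisdictions.
confirmed = [1000, 2000, 4000]
excess = [1100, 2300, 4500]
m = slope_through_origin(confirmed, excess)
# m = 23.7e6 / 21e6, i.e. excess deaths about 13% above confirmed deaths
# for these made-up numbers.
```

Because the fit is unweighted, jurisdictions with large death counts dominate the slope; a weighted or per-capita fit would be a different modeling choice.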

Estimation of mortality measures and their uncertainties

We now use excess death data and the statistical and modeling procedures to estimate the mortality measures $Z = \mathrm{IFR}, \mathrm{CFR}, M, \mathcal{M}$ across different jurisdictions, including all US states and more than two dozen countries. Accurate estimates of the confirmed ($N_c$) and dead ($D_c$) infected are needed to evaluate the CFR. Values for the parameters $Q$, FPR, FNR, and $b$ are needed to estimate $N_c + N_u = fN$ in the denominator of the IFR, while $\bar{D}_e$ is needed to estimate the number of infection-caused deaths $D_c + D_u$ that appears in the numerator of the IFR and $\mathcal{M}$. Finally, since we evaluate the resolved mortality $\mathcal{M}$ through Eq. (8), estimates of $\bar{D}_e$, $D_c$, $R_c$, $\gamma$, and FPR, FNR (to correct for testing inaccuracies in $D_c$ and $R_c$) are necessary. Whenever uncertainties are available or inferable from data, we also include them in our analyses.

Estimates of excess deaths and infected populations themselves suffer from uncertainty encoded in the variances $\Sigma_e^2$ and $\sigma_f^2$. These uncertainties depend on uncertainties arising from finite sampling sizes, uncertainty in the bias $b$, and uncertainty in test sensitivity and specificity, which are denoted $\sigma_b$, $\sigma_{\rm I}$, and $\sigma_{\rm II}$, respectively. We use $\Sigma^2$ to denote population variances and $\sigma^2$ to denote parameter variances; covariances with respect to any two variables $X, Y$ are denoted as $\Sigma_{X,Y}$. Variances in the confirmed populations are denoted $\Sigma_{N_c}^2$, $\Sigma_{R_c}^2$, and $\Sigma_{D_c}^2$ and also depend on uncertainties in the testing parameters $\sigma_{\rm I}$ and $\sigma_{\rm II}$. The most general approach would be to define a probability distribution or likelihood for observing some value of the mortality index in $[Z, Z + \mathrm{d}Z]$. As outlined in the SI, these probabilities can depend on the mean and variances of the components of the mortalities, which in turn may depend on hyperparameters that determine these means and variances. Here, we simply assume uncertainties that are propagated to the mortality indices through variances in the model parameters and hyperparameters [37]. The squared coefficients of variation of the mortalities are found by linearizing them about the mean values of the underlying components and are listed in Table II.

We provide an online dashboard that shows the real-time evolution of CFR and $M$ at https://submit.epidemicdatathon.com/

FIG. 4: Different mortality measures across different regions. (a) The apparent (dashed lines) and corrected (solid lines) IFR in the US, as of November 1, 2020, estimated using excess mortality data. We set $\tilde{f}_b = 0.093, 0.15$ (black, red), FPR = 0.05, FNR = 0.2, and $N = 330$ million. For the corrected IFR, we use $\hat{f}$ as defined in Eq. (6). Unbiased testing corresponds to setting $b = 0$. For $b > 0$ (positive testing bias), the apparent IFR is smaller than the corrected IFR; if $b$ is sufficiently small (negative testing bias), the corrected IFR may be smaller than the apparent IFR. (b) The coefficient of variation of $D_e$ (dashed line) and IFR (solid lines) with $\sigma_{\rm I} = 0.02$, $\sigma_{\rm II} = 0.05$, and $\sigma_b = 0.$[…]

To illustrate the influence of different biases $b$ on the IFR, we use $\hat{f}$ from Eq. (6) in the corrected IFR $\approx \bar{D}_e/(\hat{f}N)$. We model RT-PCR-certified COVID-19 deaths [38] by setting the FPR = 0.05 [39] and the FNR = 0.2. The fraction of positive tests $\tilde{f}_b = \tilde{Q}^+/Q$ can be directly obtained from corresponding empirical data. As of November 1, 2020, the average of $\tilde{f}_b$ over all tests and across all US states is about 9.3% [42]. The corresponding number of excess deaths is $\bar{D}_e = 294{,}700$ [27] and the US population is about $N \approx 330$ million [43]. To study the influence of variations in $\tilde{f}_b$, in addition to $\tilde{f}_b = 0.093$, we also use $\tilde{f}_b = 0.15$ in our analysis. In Fig. 4 we show the apparent and corrected IFRs for two values of $\tilde{f}_b$ [Fig. 4(a)] and the coefficient of variation $\mathrm{CV}_{\rm IFR}$ [Fig. 4(b)] as a function of the bias $b$ and as made explicit in Table I. For unbiased testing [$b = 0$ in Fig. 4(a)], the corrected IFR in the US is 1.9% assuming $\tilde{f}_b = 0.093$ and […] assuming $\tilde{f}_b = 0.15$. If $b > 0$, there is a testing bias towards the infected population; hence, the apparent IFR $= \bar{D}_e/(\tilde{f}_b N)$ is smaller than the corrected IFR, as can be seen by comparing the solid (corrected IFR) and the dashed (apparent IFR) lines in Fig. 4(a). For testing biased towards the uninfected population ($b < 0$), the corrected IFR may be smaller than the apparent IFR. To illustrate how uncertainties in FPR, FNR, and $b$ affect uncertainty in IFR, we evaluate $\mathrm{CV}_{\rm IFR}$ as given in Table II.

The first term in the uncertainty $\sigma_f^2/\hat{f}^2$ given in Eq. (A6) is proportional to $1/Q$ and can be assumed to be negligibly small, given the large number $Q$ of tests administered. The other terms in Eq. (A6) are evaluated by assuming $\sigma_b = 0.$[…], $\sigma_{\rm I} = 0.02$, and $\sigma_{\rm II} = 0.05$ and by keeping FPR = 0.05 and FNR = 0.2. Finally, we infer $\Sigma_e$ from empirical data, neglect correlations between $D_e$ and $N$, and assume that the variation in $N$ is negligible so that $\Sigma_{e,N} = \Sigma_N \approx 0$. Fig. 4(b) plots $\mathrm{CV}_{\rm IFR}$ and $\mathrm{CV}_{D_e}$ in the US as a function of the underlying bias $b$. The coefficient of variation $\mathrm{CV}_{D_e}$ is about 1%, much smaller than $\mathrm{CV}_{\rm IFR}$, and independent of $b$. For the values of $b$ shown in Fig. 4(b), $\mathrm{CV}_{\rm IFR}$ is between 47–64% for $\tilde{f}_b = 0.093$ and between 20–27% for $\tilde{f}_b = 0.15$.

Next, we compare the mortality measures $Z = \mathrm{IFR}, \mathrm{CFR}, M, \mathcal{M}$ and the relative excess deaths $r$ listed in Tab. I across numerous jurisdictions. To determine the CFR, we use the COVID-19 data of Refs. [20, 36]. For the apparent IFR, we use the representation IFR $= p\bar{D}_e/N_c$ discussed above. Although $p$ may depend on the stage of the pandemic, typical estimates range from 4% [44] to 10% [31]. We set $p = 0.$[…]. The corrected IFR is $\bar{D}_e/(fN)$; however, estimating it requires evaluating the bias $b$. In Fig. 5(a), we show the values of the relative excess deaths $r$, the CFR, the apparent IFR, the confirmed resolved mortality $M$, and the true resolved mortality $\mathcal{M}$ for different (unlabeled) regions. In all cases we set $p = 0.$[…] and $\gamma = 100$. As illustrated in Fig. 5(b), some mortality measures suggest that COVID-induced fatalities are lower in certain countries compared to others, whereas other measures indicate the opposite. For example, the total resolved mortality $\mathcal{M}$ for Brazil is larger than for Russia and Mexico, most likely due to the relatively low number of reported excess deaths, as can be seen from Fig. 3(a). On the other hand, Brazil's values of CFR, IFR, and $M$ are substantially smaller than those of Mexico [see Fig. 5(b)].

The distributions of all measures $Z$ and relative excess deaths $r$ across jurisdictions are shown in Fig. 5(c–g) and encode the global uncertainty of these indices. We also calculate the corresponding mean values across jurisdictions, and use the empirical cumulative distribution functions to determine confidence intervals. The mean values across all jurisdictions are $r = 0.08$ (95% CI 0.0025–0.7800), CFR = 0.020 (95% CI 0.0000–0.0565), IFR $\approx 0.0024$ (95% CI 0.000–0.015), $M = 0.038$ (95% CI 0.0000–0.236), and $\mathcal{M} = 0.027$ (95% CI 0.000–0.193). For calculating $M$ and $\mathcal{M}$, we excluded countries with incomplete recovery data. The distributions plotted in Fig. 5(c–g) can be used to inform our analyses of uncertainty or heterogeneity as summarized in Tab. II. For example, the overall variance $\Sigma_Z^2$ can be determined by fitting the corresponding empirical $Z$ distribution shown in Fig. 5(c–g). Table II displays how the related $\mathrm{CV}_Z^2$ can be decomposed into separate terms, each arising from the variances associated to the components in the definition of $Z$. For concreteness, from Fig. 5(e) we obtain $\mathrm{CV}^2_{\rm IFR} = \Sigma^2_{\rm IFR}/\mathrm{IFR}^2 \approx$ […].16, which allows us to place an upper bound on $\sigma_b^2$ using Eq. (A6), the results of Tab. II, and

$$\sigma_b^2 < \frac{(\tilde{f}_b - \mathrm{FPR})^2}{\hat{f}^2 (1 - \hat{f})^2}\, \mathrm{CV}^2_{\rm IFR} \qquad (9)$$

with $\mathrm{CV}^2_{\rm IFR} \approx$ […].16 as above, or on $\sigma_{\rm I}^2$ using $(1 - \hat{f})^2 \sigma_{\rm I}^2 < (\tilde{f}_b - \mathrm{FPR})^2\, \mathrm{CV}^2_{\rm IFR}$.

| Mortality $Z$ | Uncertainties | $\mathrm{CV}^2 = \Sigma_Z^2/Z^2$ |
|---|---|---|
| $\mathrm{CFR} = \dfrac{D_c}{N_c}$ | $\Sigma_{D_c}^2$, $\Sigma_{N_c}^2$, $\Sigma_{D_c,N_c}$ | $\dfrac{\Sigma_{D_c}^2}{D_c^2} + \dfrac{\Sigma_{N_c}^2}{N_c^2} - \dfrac{2\Sigma_{D_c,N_c}}{D_c N_c}$ |
| $\mathrm{IFR} = \dfrac{D_c + D_u}{N_c + N_u} \approx \dfrac{\bar{D}_e}{fN}$ | $\Sigma_e^2$, $\Sigma_N^2$, $\Sigma_{e,N}$, $\sigma_f^2$ | $\dfrac{\sigma_f^2}{\hat{f}^2} + \dfrac{\Sigma_e^2}{\bar{D}_e^2} + \dfrac{\Sigma_N^2}{N^2} - \dfrac{2\Sigma_{e,N}}{\bar{D}_e N}$ |
| $M = \dfrac{D_c}{D_c + R_c}$ | $\Sigma_{D_c}^2$, $\Sigma_{R_c}^2$, $\Sigma_{D_c,R_c}$ | $M^2 \left(\dfrac{R_c}{D_c}\right)^2 \left[\dfrac{\Sigma_{D_c}^2}{D_c^2} + \dfrac{\Sigma_{R_c}^2}{R_c^2} - \dfrac{2\Sigma_{R_c,D_c}}{R_c D_c}\right]$ |
| $\mathcal{M} = \dfrac{\bar{D}_e}{\bar{D}_e + R_c + R_u}$ | $\Sigma_e^2$, $\Sigma_{R_c}^2$, $\Sigma_{R_u}^2$, $\Sigma_{R_c,R_u}$ | $(1 - \mathcal{M})^2 \dfrac{\Sigma_e^2}{\bar{D}_e^2} + \dfrac{\Sigma_{R_c}^2}{\Gamma^2} + \dfrac{\Sigma_{R_u}^2}{\Gamma^2} - \dfrac{2\Sigma_{R_c,R_u}}{\Gamma^2}$ |
| $\mathcal{M} = \dfrac{\bar{D}_e}{\bar{D}_e + R_c + \gamma(\bar{D}_e - D_c)}$ | $\Sigma_{R_c}^2$, $\Sigma_e^2$, $\Sigma_{R_c,\gamma}$, $\sigma_\gamma^2$ | $(1 - \mathcal{M})^2 \dfrac{\Sigma_e^2}{\bar{D}_e^2} + \dfrac{\Sigma_{R_c}^2}{\Gamma^2} + \dfrac{(\bar{D}_e - D_c)^2 \sigma_\gamma^2}{\Gamma^2} - \dfrac{2(\bar{D}_e - D_c)\Sigma_{R_c,\gamma}}{\Gamma^2}$ |

TABLE II: Uncertainty propagation for different mortality measures. Table of squared coefficients of variation $\mathrm{CV}^2 = \Sigma_Z^2/Z^2$ for the different mortality indices $Z$ derived using standard error propagation expansions [37]. We use $\Sigma_N^2$, $\Sigma_{N_c}^2$, $\Sigma_{R_c}^2$, and $\Sigma_{D_c}^2$ to denote the uncertainties in the total population, confirmed cases, recoveries, and deaths, respectively. The variance of the number of excess deaths is $\Sigma_e^2$, which features in the IFR and $\mathcal{M}$. The uncertainty in the infected fraction $\sigma_f^2$ that contributes to the uncertainty in IFR depends on uncertainties in testing bias and testing errors as shown in Eq. (A6). The term $\Sigma_{D_c,N_c}$ represents the covariance between $D_c$ and $N_c$, and similarly for all other covariances $\Sigma_{e,N}$, $\Sigma_{D_c,R_c}$, $\Sigma_{R_c,R_u}$, $\Sigma_{R_c,\gamma}$. Since variations in $D_e$ arise from fluctuations in past-year baselines and not from current intrinsic uncertainty, we can neglect correlations between variations in $D_e$ and uncertainty in $R_c$, $R_u$. In the last two rows, representing $\mathcal{M}$ expressed in two different ways, $\Gamma \equiv \bar{D}_e + R_c + R_u$ and $\bar{D}_e + R_c + \gamma(\bar{D}_e - D_c)$, respectively. Moreover, when using the SIR model to replace $D_u$ and $R_u$ with $\bar{D}_e - D_c \geq 0$, there is no uncertainty associated with $D_u$ and $R_u$ in a deterministic model. Thus, covariances cannot be defined except through the uncertainty in the parameter $\gamma = \gamma_u/\mu_u$.

Finally, to provide more insight into the correlations between different mortality measures, we plot $M$ against CFR and $\mathcal{M}$ against IFR in Fig. 6. For most regions, we observe similar values of $M$ and CFR in Fig. 6(a). Although we expect $M \to \mathrm{CFR}$ and $\mathcal{M} \to \mathrm{IFR}$ towards the end of an epidemic, in some regions such as the UK, Sweden, Netherlands, and Serbia, $M \gg \mathrm{CFR}$ due to unreported or incomplete reporting of recovered cases. About 50% of the regions that we show in Fig. 6(b) have an IFR that is approximately equal to $\mathcal{M}$. Again, for regions such as Sweden and the Netherlands, $\mathcal{M}$ is substantially larger than IFR because of incomplete reporting of recovered cases.

Discussion

Relevance

In the first few weeks of the initial COVID-19 outbreak in March and April 2020 in the US, the reported death numbers captured only about two thirds of the total excess deaths [15]. This mismatch may have arisen from reporting delays, attribution of COVID-19 related deaths to other respiratory illnesses, and secondary pandemic mortality resulting from delays in necessary treatment and reduced access to health care [15]. We also observe that the number of excess deaths in the Fall months of 2020 has been significantly higher than the corresponding reported COVID-19 deaths in many US states and countries. The weekly numbers of deaths in regions with a high COVID-19 prevalence were up to 8 times higher than in previous years. Among the countries that were analyzed in this study, the five countries with the largest numbers of excess deaths since the beginning of the COVID-19 outbreak (all numbers per 100,000) are Peru (256), Ecuador (199), Mexico (151), Spain (136), and Belgium (120). The five countries with the lowest numbers of excess deaths since the beginning of the COVID-19 outbreak are Denmark (2), Norway (6), Germany (8), Austria (31), and Switzerland (33) [27]. If one includes the months before the outbreak, the numbers of excess deaths per 100,000 in 2020 in Germany, Denmark, and Norway are -3209, -707, and -34, respectively. In the early stages of the COVID-19 pandemic, testing capabilities were often insufficient to resolve rapidly increasing case and death numbers. This is still the case in some parts of the world, in particular in many developing countries [45]. Standard mortality measures such as the IFR and CFR thus suffer from a time-lag problem.

Note that Switzerland experienced a rapid growth in excess deaths in recent weeks. More recent estimates of the number of excess deaths per 100,000 suggest a value of 64 [26], which is similar to the corresponding excess death value observed in Sweden.

FIG. 5: Mortality characteristics in different countries and states. (a) The values of relative excess deaths $r$, the CFR, the IFR $= p\bar{D}_e/N_c$ with $p = N_c/(N_c + N_u) = 0.$[…], the confirmed resolved mortality $M$, and the true resolved mortality $\mathcal{M}$ (using $\gamma = 100$) are plotted for various jurisdictions. (b) Different mortality measures provide ambiguous characterizations of disease severity. (c–g) The probability density functions (PDFs) of the mortality measures shown in (a) and (b). Note that there are only very incomplete recovery data available for certain countries (e.g., US and UK). For countries without recovery data, we could not determine $M$ and $\mathcal{M}$. The numbers of jurisdictions that we used in (a) and (c–g) are 77, 246, 73, 191, and 21 for the respective mortality measures (from left to right). All data were updated December 10, 2020 [20, 27, 29, 36].
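The mortality measures compared in Fig. 5 can be sketched in a few lines. This is an illustration only, using the definitions given in the text and Table II: CFR $= D_c/N_c$, apparent IFR $= p\bar{D}_e/N_c$, confirmed resolved mortality $D_c/(D_c + R_c)$, and true resolved mortality $\bar{D}_e/[\bar{D}_e + R_c + \gamma(\bar{D}_e - D_c)]$, with $\gamma(\bar{D}_e - D_c)$ set to zero when $\bar{D}_e < D_c$. The class and function names, the example counts, and the value $p = 0.1$ (taken from the upper end of the 4–10% range quoted in the text) are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    confirmed_cases: float      # N_c
    confirmed_deaths: float     # D_c
    confirmed_recovered: float  # R_c
    excess_deaths: float        # mean cumulative excess deaths


def cfr(c: Counts) -> float:
    """Case fatality ratio D_c / N_c."""
    return c.confirmed_deaths / c.confirmed_cases


def apparent_ifr(c: Counts, p: float) -> float:
    """Apparent IFR p * excess / N_c, with p = N_c / (N_c + N_u)."""
    return p * c.excess_deaths / c.confirmed_cases


def confirmed_resolved_mortality(c: Counts) -> float:
    """D_c / (D_c + R_c); equals 1 when no recoveries are reported."""
    return c.confirmed_deaths / (c.confirmed_deaths + c.confirmed_recovered)


def true_resolved_mortality(c: Counts, gamma: float) -> float:
    """excess / (excess + R_c + gamma * (excess - D_c)), clamped at zero."""
    unconfirmed_recovered = gamma * max(c.excess_deaths - c.confirmed_deaths, 0.0)
    return c.excess_deaths / (
        c.excess_deaths + c.confirmed_recovered + unconfirmed_recovered
    )


# Hypothetical jurisdiction
example = Counts(
    confirmed_cases=100_000.0,
    confirmed_deaths=2_000.0,
    confirmed_recovered=60_000.0,
    excess_deaths=2_500.0,
)
print("CFR:", cfr(example))
print("apparent IFR (p = 0.1, assumed):", apparent_ifr(example, p=0.1))
print("confirmed resolved mortality:", confirmed_resolved_mortality(example))
print("true resolved mortality (gamma = 100):", true_resolved_mortality(example, gamma=100.0))
```

With $\gamma = 100$, even a modest gap between excess and confirmed deaths adds a large inferred recovered population to the denominator, which is why the true resolved mortality can fall well below the confirmed one.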

Strengths and limitations

The proposed use of excess deaths in standard mortality measures may provide more accurate estimates of infection-caused deaths, while errors in the estimates of the fraction of infected individuals in a population from testing can be corrected by estimating the testing bias and testing specificity and sensitivity. One could sharpen estimates of the true COVID-19 deaths by systematically analyzing the statistics of deaths from all reported causes using a standard protocol such as ICD-10 [46]. For example, the mean number of traffic deaths per month in Spain between 2011 and 2016 is about 174 persons [47], so any pandemic-related changes to traffic volumes would have little impact considering the much larger number of COVID-19 deaths.

Different mortality measures are sensitive to different sources of uncertainty. Under the assumption that all excess deaths are caused by a given infectious disease (e.g., COVID-19), the underlying error in the determined number of excess deaths can be estimated using historical death statistics from the same jurisdiction. Uncertainties in mortality measures can also be decomposed into the uncertainties of their component quantities, including the positive-tested fraction $f$, which depends on uncertainties in the testing parameters.

FIG. 6: Different mortality measures across different regions. We show the values of $M$ and CFR (a) and $\mathcal{M}$ (using $\gamma = 100$) and IFR $= p\bar{D}_e/N_c$ with $p = N_c/(N_c + N_u) = 0.$[…] (b). For jurisdictions that report no recovered cases, $R_c = 0$ and $M = 1$ [light red disks in (a)]. In jurisdictions for which the data indicate $\bar{D}_e < D_c$, we set $\gamma(\bar{D}_e - D_c) = 0$ in the denominator of $\mathcal{M}$, which prevents it from becoming negative as long as $\bar{D}_e \geq 0$. All data were updated on December 10, 2020 [20, 27, 29, 36].

As for all epidemic forecasting and surveillance, our methodology depends on the quality of excess death and COVID-19 case data and knowledge of testing parameters. For many countries, the lack of binding international reporting guidelines, testing limitations, and possible data tampering [48] complicates the application of our framework. A striking example of variability is the large discrepancy between excess deaths $D_e$ and confirmed deaths $D_c$ across many jurisdictions, which renders mortalities that rely on $D_c$ suspect. More research is necessary to disentangle the excess deaths that are directly caused by SARS-CoV-2 infections from those that result from postponed medical treatment [15], increased suicide rates [49], and other indirect factors contributing to an increase in excess mortality. Even if the numbers of excess deaths were accurately reported and known to be caused by a given disease, inferring the corresponding number of unreported cases (e.g., asymptomatic infections), which appears in the definition of the IFR and $\mathcal{M}$ (see Tab. I), is challenging and only possible if additional models and assumptions are introduced.

Another complication may arise if the number of excess deaths is not significantly larger than the historical mean. Then, excess-death-based mortality estimates suffer from large uncertainty/variability and may be meaningless. While we have considered only the average or last values of $\tilde{f}_b$, our framework can be straightforwardly extended and dynamically applied across successive time windows, using, e.g., Bayesian or Kalman filtering approaches.

Finally, we have not resolved the excess deaths or mortalities with respect to age or other attributes such as sex, co-morbidities, occupation, etc. We expect that age-structured excess deaths better resolve a jurisdiction's overall mortality. By expanding our testing and modeling approaches on stratified data, one can also straightforwardly infer stratified mortality measures $Z$, providing additional informative indices for comparison.

Conclusions

Based on the data presented in Figs. 5 and 6, we conclude that the mortality measures $r$, CFR, IFR, $M$, and $\mathcal{M}$ may provide different characterizations of disease severity in certain jurisdictions due to testing limitations and bias, differences in reporting guidelines, reporting delays, etc. The propagation of uncertainty and coefficients of variation that we summarize in Tab. II can help quantify and compare errors arising in different mortality measures, thus informing our understanding of the actual death toll of COVID-19. Depending on the stage of an outbreak and the currently available disease monitoring data, certain mortality measures are preferable to others. If the number of recovered individuals is being monitored, the resolved mortalities $M$ and $\mathcal{M}$ should be preferred over CFR and IFR, since the latter suffer from errors associated with the time-lag between infection and resolution [10]. For estimating IFR and $\mathcal{M}$, we propose using excess death data and an epidemic model. In situations in which case numbers cannot be estimated accurately, the relative excess deaths $r$ provide a complementary measure to monitor disease severity. Our analyses of different mortality measures reveal that

• The CFR and $M$ are defined directly from confirmed deaths $D_c$ and suffer from variability in its reporting. Moreover, the CFR does not consider resolved cases and is expected to evolve during an epidemic. Although $M$ includes resolved cases, its additionally required confirmed recovered cases $R_c$ add to its variability across jurisdictions. Testing errors affect both $D_c$ and $R_c$, but if the FNR and FPR are known, they can be controlled using Eq. (A3) given in the SI.

• The IFR requires knowledge of the true cumulative number of disease-caused deaths as well as the true number of infected individuals (recovered or not) in a population. We show how these can be estimated from excess deaths and testing, respectively. Thus, the IFR will be sensitive to the inferred excess deaths and to the testing (particularly to the bias in the testing). Across all countries analyzed in this study, we found a mean IFR of about 0.24% (95% CI 0.0–1.5%), which is similar to the previously reported values between 0.1 and 1.5% [31–33].

• In order to estimate the resolved true mortality $\mathcal{M}$, an additional relationship is required to estimate the unconfirmed recovered population $R_u$. In this paper, we propose a simple SIR-type model in order to relate $R_u$ to measured excess and confirmed deaths through the ratio of the recovery rate to the death rate. The variability in reporting $D_c$ across different jurisdictions generates uncertainty in $\mathcal{M}$ and reduces its reliability when compared across jurisdictions.

• The mortality measures that can most reliably be compared across jurisdictions should not depend on reported data which are subject to different protocols, errors, and manipulation/intentional omission. Thus, the per capita excess deaths and relative excess deaths $r$ (see last column of Table I) are the measures that provide the most consistent comparisons of disease mortality across jurisdictions (provided total deaths are accurately tabulated). However, they are the least informative in terms of disease severity and individual risk, for which $M$ and $\mathcal{M}$ are better.

• Uncertainty in all mortalities $Z$ can be decomposed into the uncertainties in component quantities such as the excess death or testing bias. We can use global data to estimate the means and variances in $Z$, allowing us to put bounds on the variances of the component quantities and/or parameters.

Parts of our framework can be readily integrated into or combined with mortality surveillance platforms such as the European Mortality Monitor (EURO MOMO) project [28] and the Mortality Surveillance System of the National Center for Health Statistics [21] to assess disease burden in terms of different mortality measures and their associated uncertainty.
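As an illustration of the uncertainty decomposition summarized in Tab. II, the squared coefficient of variation of the CFR follows from first-order error propagation for a ratio: $\mathrm{CV}^2_{\rm CFR} = \Sigma_{D_c}^2/D_c^2 + \Sigma_{N_c}^2/N_c^2 - 2\Sigma_{D_c,N_c}/(D_c N_c)$. The sketch below evaluates this expression; the function name and all numerical inputs are hypothetical and chosen only to show how a positive covariance between deaths and cases lowers the relative uncertainty.

```python
import math


def cfr_squared_cv(d_c, n_c, var_d, var_n, cov_dn=0.0):
    """First-order squared coefficient of variation of CFR = D_c / N_c.

    var_d and var_n are the variances of D_c and N_c; cov_dn is their covariance.
    """
    return var_d / d_c**2 + var_n / n_c**2 - 2.0 * cov_dn / (d_c * n_c)


# Hypothetical counts and uncertainties
d_c, n_c = 2_000.0, 100_000.0
var_d, var_n = 100.0**2, 2_000.0**2

cv2_independent = cfr_squared_cv(d_c, n_c, var_d, var_n)
cv2_correlated = cfr_squared_cv(d_c, n_c, var_d, var_n, cov_dn=1.0e5)

print("CV (no covariance):", math.sqrt(cv2_independent))
print("CV (positive covariance):", math.sqrt(cv2_correlated))
```

The same pattern extends to the other rows of the table by swapping in the corresponding components, provided the linearization about the mean values is a reasonable approximation.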
Data availability

All datasets used in this study are available from Refs. [21–25]. The source codes used in our analyses are publicly available at [17].

Acknowledgements

LB acknowledges financial support from the Swiss National Fund (P2EZP2 191888). The authors also acknowledge financial support from the Army Research Office (W911NF-18-1-0345), the NIH (R01HL146552), and the National Science Foundation (DMS-1814364, DMS-1814090).

[1] World Health Organization. WHO Director-General's opening remarks at the media briefing on COVID-19 - 11 March 2020. . who . int/dg/speeches/detail/who-director-general-s-opening-remarks-at-the-media-briefing-on-covid-19---11-march-2020 , 2020. Accessed: 2020-04-18.
[2] COVID-19 statistics. . worldometers . info/coronavirus/ , 2020. Accessed: 2021-01-05.
[3] Xi He, Eric HY Lau, Peng Wu, Xilong Deng, Jian Wang, Xinxin Hao, Yiu Chung Lau, Jessica Y Wong, Yujuan Guan, Xinghua Tan, et al. Temporal dynamics in viral shedding and transmissibility of COVID-19. Nature Medicine, 26(5):672–675, 2020.
[4] Conditions contributing to deaths involving coronavirus disease 2019 (COVID-19), by age group and state, United States. https://data.cdc.gov/NCHS/Conditions-contributing-to-deaths-involving-corona/hk9y-quqm […] Journal of the Royal Society of Medicine, 113(9):329–334, 2020.
[7] Graziano Onder, Giovanni Rezza, and Silvio Brusaferro. Case-fatality rate and characteristics of patients dying in relation to COVID-19 in Italy.

JAMA , 2020.[8] CDC. Understanding the Numbers: Provisional Death Counts and COVID-19. . cdc . gov/nchs/data/nvss/coronavirus/Understanding-COVID-19-Provisional-Death-Counts . pdf , 2020. Accessed: 2020-10-06.[9] Matt J Keeling and Pejman Rohani. Modeling infectious diseases in humans and animals . Princeton University Press,2011.[10] Lucas B¨ottcher, Mingtao Xia, and Tom Chou. Why case fatality ratios can be misleading: individual-and population-basedmortality estimates and factors inﬂuencing them.

Physical Biology , 17:065003, 2020.[11] Lucas B¨ottcher, Jan Nagler, and Hans J Herrmann. Critical behaviors in contagion dynamics.

Physical Review Letters ,118(8):088301, 2017.[12] Lucas B¨ottcher and Nino Antulov-Fantulin. Unifying continuous, discrete, and hybrid susceptible-infected-recovered pro- cesses on networks. Physical Review Research , 2(3):033121, 2020.[13] Romualdo Pastor-Satorras, Claudio Castellano, Piet Van Mieghem, and Alessandro Vespignani. Epidemic processes incomplex networks.

Reviews of Modern Physics , 87(3):925, 2015.[14] Jeremy Samuel Faust, Zhenqiu Lin, and Carlos Del Rio. Comparison of estimated excess deaths in new york city duringthe COVID-19 and 1918 inﬂuenza pandemics.

JAMA Network Open , 3(8):e2017527–e2017527, 2020.[15] Steven H Woolf, Derek A Chapman, Roy T Sabo, Daniel M Weinberger, and Latoya Hill. Excess deaths from COVID-19and other causes, March-April 2020.

Jama , 324(5):510–513, 2020.[16] Vasilis Kontis, James E Bennett, Theo Rashid, Robbie M Parks, Jonathan Pearson-Stuttard, Michel Guillot, Perviz Asaria,Bin Zhou, Marco Battaglini, Gianni Corsetti, et al. Magnitude, demographics and dynamics of the eﬀect of the ﬁrst waveof the COVID-19 pandemic on all-cause mortality in 21 industrialized countries.

Nature Medicine , pages 1–10, 2020.[17] GitHub repository. https://github . com/lubo93/disease-testing , 2020.[18] World Health Organization. Estimating mortality from COVID-19. Department of Communications, Global InfectiousHazard Preparedness, WHO Global , 2020.[19] Zhe Xu, Lei Shi, Yijin Wang, Jiyuan Zhang, Lei Huang, Chao Zhang, Shuhong Liu, Peng Zhao, Hongxia Liu, Li Zhu, et al.Pathological ﬁndings of COVID-19 associated with acute respiratory distress syndrome.

Lancet Resp. Med. , 2020.[20] Ensheng Dong, Hongru Du, and Lauren Gardner. An interactive web-based dashboard to track COVID-19 in real time.

The Lancet Infectious Diseases , 2020.[21] CDC. Pneumonia and Inﬂuenza Mortality Surveillance from the National Center for Health Statistics Mortality SurveillanceSystem. https://gis . cdc . gov/grasp/fluview/mortality . html , 2020. Accessed: 2020-12-10.[22] MoMo Spain. MoMo Spain. https://momo . isciii . es/public/momo/dashboard/momo dashboard . html , 2020. Ac-cessed: 2020-12-10.[23] Oﬃce for National Statistics. Deaths registered weekly in England and Wales, provisional. . ons . gov . uk/peoplepopulationandcommunity/birthsdeathsandmarriages/deaths/datasets/weeklyprovisionalfiguresondeathsregisteredinenglandandwales , 2020. Accessed: 2020-12-10.[24] Bundesamt f¨ur Statistik. Sterblichkeit, Todesursachen. . bfs . admin . ch/bfs/de/home/statistiken/gesundheit/gesundheitszustand/sterblichkeit-todesursachen . html , 2020. Accessed: 2020-07-16.[25] Istituto Nazionale di Statistica. Dati di mortalit`a: cosa produce l’Istat. . istat . it/it/archivio/240401 , 2020.Accessed: 2020-07-08.[26] Epidemic Datathon. Summary of historical and current mortality data. . epidemicdatathon . com/data , 2020.Accessed: 2020-12-10.[27] The Economist’s tracker for COVID-19 excess deaths. https://github . com/TheEconomist/covid-19-excess-deaths-tracker , 2020. Accessed: 2020-12-10.[28] EURO MOMO. Mortality monitoring in Europe. . euromomo . eu/index . html , 2020. Accessed: 2020-12-10.[29] Excess Deaths Associated with COVID-19. . cdc . gov/nchs/nvss/vsrr/covid19/excess deaths . htm , 2020.Accessed: 2020-12-10.[30] Ruiyun Li, Sen Pei, Bin Chen, Yimeng Song, Tao Zhang, Wan Yang, and Jeﬀrey Shaman. Substantial undocumentedinfection facilitates the rapid dissemination of novel coronavirus (SARS-CoV2). Science , 2020. ISSN 0036-8075.[31] Carson C Chow, Joshua C Chang, Richard C Gerkin, and Shashaank Vattikuti. Global prediction of unreported sars-cov2infection from observed covid-19 cases. 
medRxiv , 2020.[32] Henrik Salje, C´ecile Tran Kiem, No´emie Lefrancq, No´emie Courtejoie, Paolo Bosetti, Juliette Paireau, Alessio Andronico,Nathana¨el Hoz´e, Jehanne Richet, Claire-Lise Dubost, et al. Estimating the burden of SARS-CoV-2 in France.

Science ,2020.[33] John Ioannidis. The infection fatality rate of COVID-19 inferred from seroprevalence data. medRxiv , 2020.[34] NYC Health. COVID-19: Data. . nyc . gov/site/doh/covid/covid-19-data . page , 2020. Accessed: 2020-12-10.[35] Yi Liu. COVID-19 Coronavirus Map repository. https://github . com/stevenliuyi/covid19 , 2020. Accessed: 2020-04-20.[36] New York Times. Coronavirus (Covid-19) Data in the United States. https://github . com/nytimes/covid-19-data , 2020.Accessed: 2020-12-10.[37] Eun Sul Lee and Ronald N Forthofer. Analyzing complex survey data . SAGE, 2006.[38] CDC. Guidance for Certifying Deaths Due to Coronavirus Disease 2019 (COVID–19). . cdc . gov/nchs/data/nvss/vsrg/vsrg03-508 . pdf , 2020. Accessed: 2020-11-20.[39] Jessica Watson, Penny F Whiting, and John E Brush. Interpreting a COVID-19 test result. BMJ , 369, 2020.[40] Yicheng Fang, Huangqi Zhang, Jicheng Xie, Minjie Lin, Lingjun Ying, Peipei Pang, and Wenbin Ji. Sensitivity of chestCT for COVID-19: comparison to RT-PCR.

Radiology , page 200432, 2020.[41] Wenling Wang, Yanli Xu, Ruqin Gao, Roujian Lu, Kai Han, Guizhen Wu, and Wenjie Tan. Detection of SARS-CoV-2 indiﬀerent types of clinical specimens.

JAMA , 323(18):1843–1844, 2020.[42] COVIDView. . cdc . gov/coronavirus/2019-ncov/covid-data/covidview/index . html , 2020. Accessed: 2020-12-10.[43] Census Bureau Estimates U.S. Population Reached 330 Million Today. https://data . cdc . gov/NCHS/Conditions-contributing-to-deaths-involving-corona/hk9y-quqm , 2020. Accessed: 2020-09-26.[44] Ali Horta c csu, Jiarui Liu, and Timothy Schwieg. Estimating the fraction of unreported infections in epidemics with aknown epicenter: an application to COVID-19. Technical report, National Bureau of Economic Research, 2020.[45] Pascale Ondoa, Yenew Kebede, Marguerite Massinga Loembe, Jinal N Bhiman, Sofonias Kiﬂe Tessema, AbdourahmaneSow, John Nkengasong, et al. COVID-19 testing in Africa: lessons learnt. The Lancet Microbe , 1(3):e103–e104, 2020. [46] CDC. International Classiﬁcation of Diseases,Tenth Revision (ICD-10). . cdc . gov/nchs/icd/icd10 . htm , 2020.Accessed: 2020-04-19.[47] EUROSTAT. Death due to transport accidents, by sex. https://ec . europa . eu/eurostat/databrowser/view/tps00165/default/table?lang=en QJM: An International Journal of Medicine medRxiv , 2020.[52] Ria Lassauniere, Anders Frische, Zitta B Harboe, Alex CY Nielsen, Anders Fomsgaard, Karen A Krogfelt, and Charlotte SJorgensen. Evaluation of nine commercial SARS-CoV-2 immunoassays. medRxiv , 2020.[53] Jeﬀrey D Whitman, Joseph Hiatt, Cody T Mowery, Brian R Shy, Ruby Yu, Tori N Yamamoto, Ujjwal Rathore, Gregory MGoldgof, Caroline Whitty, Jonathan M Woo, et al. Test performance evaluation of SARS-CoV-2 serological assays. medRxiv ,2020.[54] Mayara Lisboa Bastos, Gamuchirai Tavaziva, Syed Kunal Abidi, Jonathon R Campbell, Louis-Patrick Haraoui, James CJohnston, Zhiyi Lan, Stephanie Law, Emily MacLean, Anete Trajman, et al. Diagnostic accuracy of serological tests forCOVID-19: systematic review and meta-analysis.

BMJ, 370, 2020.
[55] Ingrid Arevalo-Rodriguez, Diana Buitrago-Garcia, Daniel Simancas-Racines, Paula Zambrano-Achig, Rosa del Campo, Agustin Ciapponi, Omar Sued, Laura Martinez-Garcia, Anne Rutjes, Nicola Low, et al. False-negative results of initial RT-PCR assays for COVID-19: a systematic review. medRxiv, 2020.

Supplementary Information

Examples of excess death data

FIG. A1: Mortality evolution in different countries. The evolution of weekly deaths in New York City, Spain, England/Wales, and Switzerland for different age classes (where available). Grey solid lines and shaded regions represent the historical mean numbers of deaths and corresponding confidence intervals. Blue solid lines indicate weekly deaths, and weekly deaths that lie outside the confidence intervals are indicated by solid red lines. For England/Wales and Switzerland, weekly means and 95% confidence intervals are based on data from 2015–2019. In the case of Spain, we show the reported COVID-19 deaths across all age classes [35] in the inset and use the 99% confidence intervals that are directly provided in the corresponding data [26]. The red shaded regions represent the mean cumulative excess deaths $D_e$. The data are derived from Refs. [21–25].

We tally weekly deaths according to Eq. (1) for each week $i$ starting from the first week of 2020, and cumulative excess deaths as in Eq. (2), adding all weekly contributions from the first week of 2020 onwards. Note that some governmental agencies tabulate weekly deaths starting on the Sunday closest to January 1, 2020 (December 29, 2019, such as the United States), while others instead use January 1, 2020 as the first day of the week (such as Germany). A detailed list of how each country bins weekly deaths is included in Ref. [27]. The final week $k$ up to which the cumulative count is taken depends on data availability, since some countries have larger reporting delays than others. In the majority of cases $k$ is beyond the fourth week of November 2020. Quantities are calculated from data that include deaths from typically $J = 5$ previous years [27]. In Fig. A2 we plot the weekly confirmed deaths $d_c^{(0)}(i)$, the cumulative deaths $D_c(k) = \sum_{i=1}^{k} d_c^{(0)}(i)$, and the mean weekly and cumulative excess deaths $\bar{d}_e(i)$ for 2020 as available from data. We also show $\bar{D}_e(k)$ per 100,000 persons from the start of 2020 using Eqs. (1) and (2). The corresponding error bars in Fig. A2 indicate 95% confidence intervals defined by $\bar{d}_e(i) \pm 1.96\,\sigma_e(i)$ and $\bar{D}_e(k) \pm 1.96\,\Sigma_e(k)$ in Eqs. (1) and (2), respectively. For Spain, we used the 99% confidence intervals that are directly provided in the corresponding data [26] to approximate the 95% confidence intervals. Excess death statistics evolve differently across different countries and regions. For example, in France excess deaths were negative until the end of March 2020, quickly increasing in April 2020. In Ecuador and Peru, the number of excess deaths is more than 2.5 times larger than the corresponding number of confirmed COVID-19 deaths.

FIG. A2: Weekly and cumulative death rates in different countries and regions. We compare the evolution of confirmed weekly deaths $d_c^{(0)}(i)$ (dashed black curves) and cumulative deaths $D_c(k)$ (dashed dark red curves) with weekly excess deaths $\bar{d}_e(i)$ (solid grey curves) and cumulative excess deaths $\bar{D}_e(k)$ (solid red curves). The deaths are plotted in units of per 100,000 in different countries and regions. The data are derived from Ref. [27] and the error bars for the excess deaths are derived from Eqs. (1) and (2). For Spain, we used the 99% confidence intervals that are directly provided in the corresponding data [26] to approximate the 95% confidence intervals. Typically, we find $\bar{D}_e(k) > D_c(k)$.

Statistical testing model

Given biases in sampling and testing errors, it is important to use a statistical testing model that takes them into account when estimating the fraction $f$ of a population $N$ that is infected. Testing bias arises, for example, if symptomatic individuals are more likely to seek testing. Thus, the probability $f_b$ that an individual who chooses to be tested is positive may differ from $f$, the probability that a randomly selected individual is positive, as defined in Eq. (3). If all tests are error-free, the probability that $Q^+$ positive results arise from the $Q \geq Q^+$ administered tests is given by

$$P_{\rm true}(Q^+ \mid Q, f_b) = \binom{Q}{Q^+} f_b^{Q^+} (1 - f_b)^{Q - Q^+}. \tag{A1}$$

Eq. (A1) is derived under the assumption that once individuals are tested, they are "replaced" in the population and can be tested again. The analogous distribution $P_{\rm true}(Q^+ \mid Q, f_b)$ for testing "without replacement" can be straightforwardly derived and yields results quantitatively close to Eq. (A1) provided $Q/N \lesssim$ […] $\approx$ […] $-\ 0.07$ and FNR $\approx$ […] $\approx$ […] $\approx 0.54$ at initial testing [55], underlining the need for retesting. Reported percentages of false positives in RT-PCR tests are about FPR $\approx 0.05$ [39]. A large meta-analysis of serological tests estimates FPR $\approx 0.02$ and FNR $\approx$ […] $-\ 0.16$ [54]. These testing errors can lead to inaccurate estimates of disease prevalence; uncertainty in FPR and FNR will thus lead to uncertainty in the estimate of prevalence.

As illustrated in Fig. 2, errors in testing may cause the recorded number $\tilde{Q}^+$ of positive tests to differ from the $Q^+$ that would be obtained under perfect testing. The probability that $\tilde{Q}^+$ positive tests are returned due to testing errors can be described in terms of $Q^+$, FPR, and FNR, and the corresponding probability distribution $P_{\rm err}(\tilde{Q}^+ \mid Q^+, {\rm FPR}, {\rm FNR})$ is given by

$$P_{\rm err}(\tilde{Q}^+ \mid Q^+, {\rm FPR}, {\rm FNR}) = \sum_{p^+ = 0}^{\tilde{Q}^+} \binom{Q^+}{p^+} (1 - {\rm FNR})^{p^+} ({\rm FNR})^{Q^+ - p^+} \binom{Q^-}{q^+} ({\rm FPR})^{q^+} (1 - {\rm FPR})^{Q^- - q^+}, \tag{A2}$$

where $q^+ \equiv \tilde{Q}^+ - p^+$ and $Q^- = Q - Q^+$ denotes the number of tests administered to uninfected individuals. By convolving $P_{\rm err}(\tilde{Q}^+ \mid Q^+, {\rm FPR}, {\rm FNR})$ with $P_{\rm true}(Q^+ \mid Q, f_b)$ we derive the overall likelihood distribution for the measured number $\tilde{Q}^+$ of true and false positives given a set of specified parameters $\theta = \{Q, f, b, {\rm FPR}, {\rm FNR}\}$ describing the population and testing:

$$P(\tilde{Q}^+ \mid Q, f, b, {\rm FPR}, {\rm FNR}) = \sum_{Q^+ = 0}^{Q} P_{\rm err}(\tilde{Q}^+ \mid Q^+, {\rm FPR}, {\rm FNR})\, P_{\rm true}(Q^+ \mid Q, f_b(f, b)). \tag{A3}$$

When $Q^+$, $\tilde{Q}^+$, and $Q \gg 1$, we can approximate $P_{\rm true}$, $P_{\rm err}$, and $P$ by normal distributions and rewrite $P$ as a function of the observed positive fraction $\tilde{f}_b \equiv \tilde{Q}^+/Q$ (Eqs. (4) and (5)).

Using Bayes' rule, we can then formally define the likelihood of $\theta$ given a measured $\tilde{f}_b$,

$$P(\theta \mid \tilde{f}_b, \alpha) = \frac{P(\tilde{f}_b \mid \theta)\, P(\theta \mid \alpha)}{\sum_{\theta} P(\tilde{f}_b \mid \theta)\, P(\theta \mid \alpha)}, \tag{A4}$$

where $\alpha = \{\bar{\theta}, \sigma_\theta\}$ are hyperparameters defining the prior $P(\theta \mid \alpha)$, such as their means $\bar{\theta} = \{\bar{D}_e, {\rm FPR}, {\rm FNR}, \bar{b}, \bar{N}\}$ and standard deviations $\sigma_\theta = \{\Sigma_e, \sigma_{\rm I}, \sigma_{\rm II}, b, \sigma_b, \sigma_N\}$. Formally, the probability of measuring a value of a mortality measure $Z = {\rm CFR}, {\rm IFR}, M, \mathcal{M}$, or $r$ can be computed from

$$P(Z \mid \alpha) = \int P(Z \mid \theta)\, P(\theta \mid \alpha)\, {\rm d}\theta, \tag{A5}$$

where $P(Z \mid \theta)$ defines the statistical model of the mortality measure given the components and parameters $\theta$, and the hyperparameters $\alpha$ define the distribution over $\theta$. For example, if $Z$ is the value of the IFR, then $\theta = \{D_e, f, N\}$ and $\alpha = \{(\bar{D}_e, \Sigma_e), (\bar{b}, {\rm FPR}, {\rm FNR}, \sigma_b, \sigma_{\rm I}, \sigma_{\rm II}), (\bar{N}, \Sigma_N)\}$ collects the means and standard deviations of excess deaths, testing parameters, and the total population, respectively.

A simpler way to incorporate uncertainty in the infected fraction $f$ is to assume a Gaussian approximation for all distributions and propagate the uncertainty in testing parameters. The squared coefficient of variation ${\rm CV}_f^2 = \sigma_f^2/\hat{f}^2$ is then decomposed into the parameter variances according to

$$\frac{\sigma_f^2}{\hat{f}^2} \approx \frac{\left(1 - (1 - e^{b})\hat{f}\right)^2}{X^2 Q}\, \tilde{f}_b (1 - \tilde{f}_b) + \frac{(1 - \hat{f})^2}{X^2}\, \sigma_{\rm I}^2 + \frac{e^{2b} \hat{f}^2}{X^2}\, \sigma_{\rm II}^2 + \frac{\hat{f}^2 (1 - \hat{f})^2}{X^2}\, \sigma_b^2, \tag{A6}$$

where $X \equiv \tilde{f}_b - {\rm FPR}$. The values of $b$, FPR, and FNR above are mean or maximum likelihood estimates of the bias and testing errors, and $\sigma_b$, $\sigma_{\rm I}$, and $\sigma_{\rm II}$ are their associated uncertainties. The means and variances $\{\bar{b}, {\rm FPR}, {\rm FNR}, \sigma_b, \sigma_{\rm I}, \sigma_{\rm II}\}$ represent hyperparameters associated with testing (see SI). Our result for $\sigma_f$ in Eq. (A6) assumes $\{b, {\rm FPR}, {\rm FNR}\}$ are uncorrelated. Since $Q \gg 1$, we expect the term proportional to $\tilde{f}_b (1 - \tilde{f}_b)/Q$ to be negligible.
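The discrete likelihood of Eqs. (A1)–(A3) can be evaluated directly when the number of tests is small. The following is a minimal sketch, not the authors' implementation: the function names are hypothetical, $f_b$ is passed in as a given number (the map from $f$ and $b$ in Eq. (3) is not reproduced here), and the count of truly negative tests is assumed to be $Q^- = Q - Q^+$. It uses only the Python standard library.

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials with success probability p."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def p_true(q_pos, q_total, f_b):
    """Eq. (A1): q_pos true positives among q_total error-free tests (with replacement)."""
    return binom_pmf(q_pos, q_total, f_b)


def p_err(q_obs, q_pos, q_total, fpr, fnr):
    """Eq. (A2): probability of recording q_obs positives given q_pos truly positive tests."""
    q_neg = q_total - q_pos  # assumption: Q^- = Q - Q^+
    total = 0.0
    for p_plus in range(q_obs + 1):
        q_plus = q_obs - p_plus  # false positives needed to reach q_obs
        detected = binom_pmf(p_plus, q_pos, 1.0 - fnr)   # true positives detected
        spurious = binom_pmf(q_plus, q_neg, fpr)         # negatives flagged positive
        total += detected * spurious
    return total


def p_observed(q_obs, q_total, f_b, fpr, fnr):
    """Eq. (A3): sum over the unobserved number q_pos of truly positive tests."""
    return sum(
        p_err(q_obs, q_pos, q_total, fpr, fnr) * p_true(q_pos, q_total, f_b)
        for q_pos in range(q_total + 1)
    )


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    Q, f_b, FPR, FNR = 40, 0.10, 0.05, 0.20
    likelihood = [p_observed(k, Q, f_b, FPR, FNR) for k in range(Q + 1)]
    print("normalization:", sum(likelihood))
    print("most likely recorded positives:", likelihood.index(max(likelihood)))
```

Because each test is independently recorded as positive with probability $f_b(1 - {\rm FNR}) + (1 - f_b)\,{\rm FPR}$, the double sum should reproduce a binomial distribution with that success probability, which gives a quick numerical check of the sketch. The direct sum costs of order $Q^2$ operations per value of $\tilde{Q}^+$, so for $Q \gg 1$ the normal approximation mentioned in the text is the practical route.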
Uncertainties in other quantities will ultimately contribute to uncertainty in the mortalities $Z$, as listed in Table II.

Modeling of resolved mortality

In Fig. A3, we show the evolution of $M$ for Spain and Lombardia, using different effective recovery rates of unreported cases $\gamma$. We compute $M$ according to Eq. (8) and use excess mortality data of Fig. A1 to determine $\bar{D}_e$. The corresponding data for confirmed recovered and deceased individuals, $R_c$ and $D_c$, are taken from Ref. [26]. Current estimates of the IFR are 0.[…] $-$ […].5% [31–33]. To obtain a value of $M$ in a similar range, we vary $\gamma$ from 1 $-$ […] $M \approx$ […] $-$ 1% is consistent with $\gamma = 100$ $-$ […]

FIG. A3:

Evolution of resolved mortality.

We show the evolution of $M(t)$ for different values of effective recovery rates of unreported cases γγ