Estimation of the actual disease occurrence based on official case numbers during a COVID outbreak in Germany 2020
Ralph Brinks, Annika Hoyer
(1) Department of Statistics, Ludwig-Maximilians-University Munich, Ludwigstr. 33, 80539 Munich, Germany
(2) Policlinics, Department and Hiller Research Unit for Rheumatology, University Hospital Duesseldorf, Moorenstr. 5, 40225 Duesseldorf, Germany
(3) Institute for Biometry and Epidemiology, German Diabetes Center, Auf'm Hennekamp 65, 40225 Duesseldorf, Germany
Summary
Since the beginning of March 2020, the cumulative numbers of cases of infection with the novel coronavirus SARS-CoV-2 in Germany have been reported on a daily basis. The reports originate from national laws, according to which positive test findings must be submitted to the Federal Health Authorities, the Robert Koch Institute, via the local health authorities. Since an enormous number of unreported cases can be expected, the question of how widespread the disease has been in the population cannot be answered based on these administrative reports. Using mathematical modeling, however, estimates can be made. These estimates indicate that the small numbers of diagnostic tests carried out at the beginning of the outbreak overlooked considerable parts of the infection. In order to cover the initial phase of future waves of the disease, widespread and comprehensive tests are recommended.
Introduction
During the COVID outbreak in Germany in spring 2020, the cumulative number of cases of infected people was reported daily starting on March 4. The case numbers were based on the positive results, reported according to the Infection Protection Act ("Infektionsschutzgesetz", IfSG), of the diagnostic tests for the novel coronavirus SARS-CoV-2 carried out throughout Germany. The number of tests carried out depended on regional characteristics, availability of test kits and laboratory capacities. The frequency of testing increased during the epidemic [RKI 2020]. Together with the fact that several people with the associated COVID disease have only asymptomatic or mild courses and presumably have not been tested at any time, the variable frequency of the tests leads to the hypothesis that the actual disease occurrence was at least partially underestimated.

Based on the administrative case numbers, in this work we provide a lower bound for the time course of the numbers of actually infected people. For this, we use the effective reproduction number ($R_{\mathrm{eff}}$). In contrast to the actual number of cases, $R_{\mathrm{eff}}$ can be estimated even if the cases are reported only incompletely [an der Heiden 2020]. Our resulting lower bound for the numbers of infected people allows us to further estimate an upper bound for the proportion of cases that were recorded in the administrative reports according to the IfSG. This proportion is referred to as the case detection ratio (CDR) [Borgdorf 2004].

Methods
We start from the frequently used epidemiological model that Kermack and McKendrick described in their seminal work from 1927 [Kermack 1927]. Since the model consists of the three disease-relevant states 'Susceptible', 'Infectious' and 'Removed', i.e., recovered or dead, it is often called the SIR model [Vynnycky 2010].
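For orientation, the SIR model can be written in a common textbook form as sketched below. The transmission rate $\beta$ and the population size $N$ are symbols introduced here purely for illustration and do not appear in the text; $r$ is the rate of leaving the infectious state. This is a sketch under these assumptions and not necessarily the notation or the argument of the supplement.

```latex
\begin{align*}
\frac{\mathrm{d}S}{\mathrm{d}t} &= -\beta\,\frac{S\,I}{N}, \\
\frac{\mathrm{d}I}{\mathrm{d}t} &= \beta\,\frac{S\,I}{N} - r\,I, \\
\frac{\mathrm{d}R}{\mathrm{d}t} &= r\,I, \\
R_{\mathrm{eff}}(t) &:= \frac{\beta\,S(t)}{r\,N}
\quad\Longrightarrow\quad
\frac{\mathrm{d}I}{\mathrm{d}t} = r\,\bigl(R_{\mathrm{eff}}(t) - 1\bigr)\,I .
\end{align*}
```

With this definition, the infection term $\beta S I / N$ equals $r\,R_{\mathrm{eff}}\,I$, which is how the effective reproduction number enters the growth of $I$.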
The following ordinary differential equation applies to the number $I$ of infectious persons at time $t$:

$$\frac{\mathrm{d}I}{\mathrm{d}t} = r\,\bigl(R_{\mathrm{eff}}(t) - 1\bigr)\,I, \tag{1}$$

where $R_{\mathrm{eff}}$ denotes the effective reproduction number. The effective reproduction number is the average number of secondary infectious cases that one primary case infects. The quantity $r$ in Equation (1) describes the rate at which people leave the stage of being infectious. A mathematical rationale for Equation (1) is given in the supplement to this work. Given the initial condition $I(0) = I_0$ at the starting time $t = 0$ of the official reports on March 4, the number of infectious persons at time $t$ can be derived from Equation (1). It holds:

$$I(t) = I_0 \exp\left( r \int_0^t \bigl(R_{\mathrm{eff}}(\tau) - 1\bigr)\,\mathrm{d}\tau \right). \tag{2}$$

As shown in the appendix to this work, the SIR model for the number $F_t$ of the newly occurring infectious cases (incident cases) in the period from $t$ to $t + \delta$ leads to:

$$F_t = S(t) - S(t + \delta) \approx \delta\,R_{\mathrm{eff}}(t)\,r\,I(t), \tag{3}$$

where $S(t)$ denotes the number of susceptible persons at time $t$. Since the administrative case numbers were reported on a daily basis, we henceforth assume $\delta = 1$ day.

Equations (1) to (3) apply to all infectious diseases that can appropriately be described using the SIR disease model [Vynnycky 2010, Chowell 2009]. These equations are independent of screening tests or any diagnostic procedures. Tests merely make the presence of the pathogens that underlie the disease visible.

In the case of the novel coronavirus SARS-CoV-2, new infectious cases are reported daily via administrative channels. If the date of onset of the disease has not been communicated for a reported case, the case is assigned to a specific day using a statistical method [an der Heiden 2020]. The number of reported incident cases is then slightly smoothed using a spline.

Let $F_t^{(b)}$ denote the number of reported incident cases assigned to day $t$. We can then consider the proportion of the reported cases in relation to the actual new (incident) cases $F_t$ according to Equation (3). This leads to the case detection ratio (CDR) [Borgdorf 2004]:

$$CDR_t = \frac{F_t^{(b)}}{F_t}. \tag{4}$$

In the case of a complete detection of all newly occurring infectious cases on one day $t$, $CDR_t$ would equal 100%. Since only a few tests were carried out in Germany in the early days of the outbreak of the SARS-CoV-2 epidemic, it can be assumed that the CDR was low in early March.

Based on the reported reproduction number $R_{\mathrm{eff}}$ and $r = 0.1$ per day [an der Heiden 2020], we use Equations (2) and (3) to calculate a lower bound for the number $I_0$ of infectious cases at time $t = 0$. To this end, we use that the CDR is at most 100% at any time. This means that the observed new cases $F_t^{(b)}$ cannot exceed the number of new cases $F_t$. For all times $t$ we have:

$$F_t^{(b)} = CDR_t \cdot F_t \le F_t \approx R_{\mathrm{eff}}(t)\,r\,I_0 \exp\left( r \int_0^t \bigl(R_{\mathrm{eff}}(\tau) - 1\bigr)\,\mathrm{d}\tau \right). \tag{5}$$

The only unknown quantity on the right side of (5) is the number $I_0$ of persons infected at the time $t = 0$. As soon as we have a lower bound for $I_0$, and thus a lower bound for the number $F_t$ of the actual new cases via (5), we can determine an upper bound for the CDR using Equation (4). As an illustration, we compare our estimate for the time course of the CDR with the number of positive tests performed.
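The bounding argument above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the integral in Equation (2) is approximated by a left Riemann sum with a step of one day (an assumption of this sketch), and the numbers in the demo are synthetic rather than the reported case data. The idea is that inequality (5) must hold on every day, so the smallest admissible initial number of infectious persons is the largest daily ratio of reported cases to modelled cases per unit of that initial number.

```python
import numpy as np


def incidence_per_unit_i0(r_eff, r=0.1, delta=1.0):
    """Return F_t / I_0 for each day t, following Equations (2) and (3)."""
    r_eff = np.asarray(r_eff, dtype=float)
    # Integral of (R_eff - 1) from 0 to t, approximated by a left Riemann sum
    # (an assumption of this sketch; another quadrature rule could be used).
    integral = np.concatenate(([0.0], np.cumsum((r_eff[:-1] - 1.0) * delta)))
    return delta * r_eff * r * np.exp(r * integral)


def lower_bound_i0(reported, r_eff, r=0.1, delta=1.0):
    """Smallest I_0 for which reported cases never exceed modelled cases."""
    reported = np.asarray(reported, dtype=float)
    # Requires R_eff > 0 on every day, otherwise the ratio is undefined.
    return float(np.max(reported / incidence_per_unit_i0(r_eff, r, delta)))


def cdr_upper_bound(reported, r_eff, r=0.1, delta=1.0):
    """Upper bound of CDR_t (as a fraction of 1) following Equation (4)."""
    reported = np.asarray(reported, dtype=float)
    i0 = lower_bound_i0(reported, r_eff, r, delta)
    modelled = i0 * incidence_per_unit_i0(r_eff, r, delta)
    return reported / modelled


if __name__ == "__main__":
    # Synthetic numbers for illustration only, not the official case data.
    r_eff_demo = [3.0, 3.0, 2.0, 1.0]
    reported_demo = [10, 20, 50, 40]
    print(lower_bound_i0(reported_demo, r_eff_demo))
    print(cdr_upper_bound(reported_demo, r_eff_demo))
```

By construction, the resulting CDR bound reaches 100% on at least one day, namely the day that determines the lower bound, and stays below 100% on all other days.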
Results
Figure 1 shows the corrected new cases $F_t^{(b)}$ in the administrative reports of the Robert Koch Institute for the reporting period from March 4 to April 8 [an der Heiden 2020]. In addition, the smoothed curve is drawn as a black line.

The blue dashed curve in Figure 2 is a lower bound of the new cases $F_t$ based on the estimate (5). A lower bound of 7990 people is obtained for $I_0$.

If we now calculate the ratio $F_t^{(b)} / F_t$ to determine the upper bound of the CDR, we obtain the curve depicted in Figure 3. This shows in particular that at the beginning of the epidemic in Germany the CDR was quite low, at most 30%.

In Figure 4, the quotient $F_t^{(b)} / F_t$ is compared to the number of positive results (number of tests performed multiplied by the positive rate), shown as a red curve [RKI 2020]. Qualitatively, there is a similar course over time, which we see as support for our claim that the testing behavior changed over the reporting period.

Figure 1: Administrative numbers $F_t^{(b)}$ of reported incident cases of SARS-CoV-2 infections in Germany (dots). The smoothed values are shown as a solid line.
Figure 2: Lower bound of the actual (true) numbers of incident cases $F_t$ (blue dashed line) in comparison with the reported numbers $F_t^{(b)}$ (black solid line).
Figure 3: Estimated upper bound of the case detection ratio (CDR in %) in Germany.
Figure 4: Estimated upper bound of the CDR compared to the number of positive tests (red line).

Discussion
It is important to run screening and diagnostic tests to track the course of an epidemic outbreak and to assess possible measures to control the epidemic. In the extreme case that no tests are carried out, the epidemic goes widely unnoticed.

In this work we use the effective reproduction number $R_{\mathrm{eff}}$, estimated over the generation time, to estimate the actual occurrence of infections with the pathogen SARS-CoV-2 in the German population as a whole. A comparison of the reported cases ($F_t^{(b)}$) with the expected cases ($F_t$) gives an upper limit of the case detection ratio (CDR). As was shown on the basis of the daily reported case numbers, the initial phase of the SARS-CoV-2 outbreak in Germany remained largely undetected until around March 15.

At the start of reporting on March 4, a cumulative number of 262 cases was reported. Our estimate shows that there were already around 8,000 infected people in Germany at this point in time. The phase of inadequate testing also includes the peak of infection around March 9, at which $R_{\mathrm{eff}}$ reached the tentative maximum of 3.3. In order to reliably enable future surveillance of SARS-CoV-2, we need close-knit case recording with sufficient test capacities.

References
[an der Heiden 2020] an der Heiden M, Hamouda O: Schätzung der aktuellen Entwicklung der SARS-CoV-2-Epidemie in Deutschland – Nowcasting, Epid Bull
[Borgdorf 2004] Emer Infect Dis 10(9), 1523, 2004
[Chowell 2009] Chowell G, Hyman JM, Bettencourt LM, Castillo-Chavez C (Eds.): Mathematical and statistical estimation approaches in epidemiology, Springer, London 2010
[Kermack 1927] Kermack WO, McKendrick AG: A contribution to the mathematical theory of epidemics, Proc. R. Soc. Lond. A
[RKI 2020] Epid Bull
[Vynnycky 2010] Oxford University Press, Oxford 2010
Supplementary information