A study on the possible merits of using symptomatic cases to trace the development of the COVID-19 pandemic
Gianluca Bonifazi, Luca Lista, Dario Menasce, Mauro Mezzetto, Daniele Pedrini, Roberto Spighi, Antonio Zoccoli
EPJ manuscript No. (will be inserted by the editor)
Affiliations: Università Politecnica delle Marche; INFN Sezione di Bologna; Università degli Studi di Napoli Federico II; INFN Sezione di Napoli; INFN Sezione di Milano Bicocca; INFN Sezione di Padova; Alma Mater Studiorum Università di Bologna
Abstract.
In a recent work [1] we introduced a novel method to compute R_t and applied it to describe the development of the COVID-19 outbreak in Italy. That study is based on the number of daily positive swabs as reported by the Italian Dipartimento di Protezione Civile. Recently, the Italian Istituto Superiore di Sanità made available the data on symptomatic cases. In these data the reporting date is the date of symptom onset instead of the date on which the positive swab was reported. In this paper we discuss the merits and drawbacks of these data, quantitatively comparing the quality of the pandemic indicators computed with the two samples.

Worldwide data on the development of the COVID-19 outbreak are always reported as the daily number of positive swabs. This quantity suffers from several problems, since it can be biased by different strategies and response times of swab data taking in different regions and in different periods. It is affected by strong weekend effects in the recording of the data, due to the reduced capacity for processing swabs on Saturdays and Sundays. Furthermore, the reporting of a positive swab introduces an additional delay with respect to the dates of contagion and of symptom onset (if any).

Potentially, the reporting of symptomatic cases together with the date of symptom onset could attenuate most of these issues. In principle, symptomatic cases should suffer less from different strategies of swab data taking, since they are the most urgent cases to be treated. Moreover, the date of symptom onset should be less influenced by weekend effects and should not be affected by the additional delays introduced by the processing and reporting of a molecular swab. On the other hand, the symptomatic cases are only a subset of all cases, and the smaller sample size is an issue for relatively small populations, such as Italian regions or provinces.
Furthermore, a bias could be introduced if the true fraction of asymptomatic cases changes during the pandemic because of a change in the age distribution of infected people.

Since December 6th, the Italian Istituto Superiore di Sanità (ISS) has published the data on symptomatic cases by date of symptom onset. The published data contain the history of all the symptomatic cases at the national level, while for regions and provinces only the daily data are reported. It should be noted that molecular swabs started on February 24, 2020, and that the symptomatic cases reported before this date refer to those positive swabs. For this reason the symptomatic cases reported from January 28 to February 24 form an incomplete sample, and we do not consider them in the following.

Data on positive swabs, instead, have been published since the beginning of the outbreak by the Italian Dipartimento di Protezione Civile (DPC) [3]. It is therefore possible to compare the information that can be extracted from the full sample of positive swabs with the information from the sub-sample of symptomatic cases. In the following, we work out several indicators to compare the merits and the differences of the two samples.

a Corresponding author: Mauro Mezzetto, e-mail: [email protected]
Fig. 1 shows the daily data of the symptomatic case and positive swab samples. We fit the data with the sum of three Gompertz functions g(t; a, b, c):

$$ g(t; a, b, c) = a \, e^{-b t} \exp\left(-c \, e^{-b t}\right) \qquad (1) $$

This accounts for the first phase of the outbreak in March-April, the increase in August, and the third phase in October-December, as shown in Fig. 1.

Fig. 1.
Distribution of the symptomatic cases (light blue) and of the positive swabs (orange), as reported by ISS and DPC respectively. Poisson errors are drawn but are the same size as the bullets. The continuous lines are the fits to these data described in the text. The inset displays some parameters of the fitted curves. Days are counted from January 28th, 2020.

The first and second peaks are at days 49.5 and 279.8 for the symptomatic cases. For the positive swab sample they are at days 61.0 and 288.7. In both cases days are counted from January 28th, 2020. The fit errors on the peak positions, considering only the diagonal terms of the covariance matrix, are of the order of 0.1 days. We conclude that the positive swab sample is delayed by about 9 days. This can be considered the average delay between the appearance of symptoms and the reporting of a positive swab. This number reflects the fact that asymptomatic cases are mostly detected by tracing the symptomatic cases, and are therefore delayed, rather than by screening the population. The delay at the first peak (11.5 days) was larger by 2.6 days than the delay at the second peak (8.9 days). This could be explained by better swab-processing procedures developed during the outbreak.

We can quantify the level of fluctuations in the two samples by computing the residuals with respect to the fitted curves. We define the residual as the difference between the fitted value and the data point, divided by the Poisson error of the data point. If the Poisson error were the only source of uncertainty, the distribution of residuals would have a standard deviation of 1. We limit the analysis to the first peak, where the two samples have similar sizes. There the standard deviations of the residual distributions are 8.2 for the symptomatic cases and 9.3 for the positive swabs.
These high values can partly be attributed to an imperfect parameterization of the data. They can also come from an underestimation of the quoted errors, which do not include systematic contributions. We can nevertheless conclude that the sample of symptomatic cases does not significantly reduce the dispersion of the data around the central values with respect to the positive swab sample.

Eur. Phys. J. Plus
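The residual definition above can be illustrated with a short sketch. This is not the code used for this analysis. It assumes that the Poisson error of a daily count n is sqrt(n) and skips days with zero counts. The function names `pulls` and `pull_std` and the toy numbers are hypothetical.

```python
import math
import statistics


def pulls(data, fitted):
    """Residuals (fit - data) / sqrt(data), following the definition in the text.

    Days with zero counts are skipped, since their Poisson error sqrt(n) vanishes.
    """
    out = []
    for n, f in zip(data, fitted):
        if n > 0:
            out.append((f - n) / math.sqrt(n))
    return out


def pull_std(data, fitted):
    """Population standard deviation of the residuals (about 1 for pure Poisson noise)."""
    return statistics.pstdev(pulls(data, fitted))


# Hypothetical toy example: daily counts scattered around a flat fit of 100 cases/day.
data = [90, 110, 100, 120, 80]
fitted = [100.0] * len(data)
print(round(pull_std(data, fitted), 3))
```

If only Poisson fluctuations were present, the printed value would be close to 1. The values quoted above, 8.2 and 9.3, are far from that.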
Another quantity that could be affected by additional fluctuations in the positive swab sample is the width of the peaks. Suppose, for instance, that the delay between symptom onset and the reporting of a positive swab followed a broad distribution. This would broaden the fitted peaks. We compute the FWHM of the peaks fitted with the three Gompertz curves. For the symptomatic and positive swab samples respectively, we find 35 and 41 days at the first peak and 41 and 45 days at the second peak. We take these values as an indication that the dispersion of swab reporting times contributes significantly to the distribution of positive cases. If we attribute the increase of the FWHM at the second peak entirely to this effect, the reporting times should have a standard deviation of about 8 days.
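For an isolated term of Eq. (1), the peak position and the FWHM have a closed form. Setting u = c·e^{-bt}, g is proportional to u·e^{-u}. This peaks at u = 1, that is at t_peak = ln(c)/b. The half-maximum points satisfy u·e^{-u} = 1/(2e). The two roots of this equation, u1 ≈ 0.232 and u2 ≈ 2.678, give FWHM = ln(u2/u1)/b ≈ 2.45/b.

The text does not state how the 8-day figure was derived. The sketch below relies on our own assumptions: the peaks are roughly Gaussian, with sigma = FWHM/(2·sqrt(2 ln 2)), and the reporting-delay spread adds in quadrature. Under these assumptions the quoted second-peak widths reproduce a value close to 8 days. All helper names are hypothetical.

```python
import math


def _solve_half_max(lo, hi):
    """Bisection for u * exp(-u) = 1 / (2e) on an interval bracketing one root."""
    target = 1.0 / (2.0 * math.e)
    f = lambda u: u * math.exp(-u) - target
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# Roots of u * exp(-u) = 1/(2e) on either side of the maximum at u = 1.
U1 = _solve_half_max(1e-9, 1.0)
U2 = _solve_half_max(1.0, 50.0)


def gompertz(t, a, b, c):
    """One term of Eq. (1): a * exp(-b t) * exp(-c * exp(-b t))."""
    return a * math.exp(-b * t) * math.exp(-c * math.exp(-b * t))


def gompertz_peak(b, c):
    """Time of the maximum of one term of Eq. (1)."""
    return math.log(c) / b


def gompertz_fwhm(b):
    """Full width at half maximum of one term of Eq. (1); independent of a and c."""
    return math.log(U2 / U1) / b


def delay_sigma(fwhm_sym, fwhm_pos):
    """Spread of the reporting delay, ASSUMING Gaussian peaks whose widths add in quadrature."""
    to_sigma = 1.0 / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    s_sym, s_pos = fwhm_sym * to_sigma, fwhm_pos * to_sigma
    return math.sqrt(s_pos**2 - s_sym**2)


print(round(delay_sigma(41.0, 45.0), 1))  # second-peak widths quoted in the text
```

Running it prints 7.9, close to the 8 days quoted above. Under the same assumption, the first-peak widths (35 and 41 days) would give about 9.1 days, so this reading is only indicative.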
A by-product of these comparisons is the evolution of the fraction of asymptomatic cases in the positive swab sample during the outbreak. Following the above discussion, we shift the positive swab distribution back by 9 days. We can then compute, day by day, the difference between the two samples and, from it, the fraction of asymptomatic cases in the positive swab sample. The result is displayed in Fig. 2.
Fig. 2.
Fraction of asymptomatic cases in the positive swab sample. Dates are aligned with those of the symptomatic cases, as discussed in the text.
We observe that, in the first phase of the pandemic, the fraction of asymptomatic cases dropped almost to zero at the peak. It then grew to about 0.6 at the end of the first peak and remained stable until the second peak. By then the total number of swabs was probably insufficient to guarantee proper tracing.
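The computation of the asymptomatic fraction can be sketched as follows. This is a minimal illustration, not the analysis code. It assumes that both daily series are plain lists indexed on the same calendar days, and that the fraction is (P - S)/P once the positive swab series P is shifted back by 9 days against the symptomatic series S. The function name and the toy series are hypothetical. Days without positive swabs are skipped, and statistical fluctuations can make individual values negative.

```python
def asymptomatic_fraction(symptomatic, positive, delay=9):
    """Day-by-day fraction of asymptomatic cases in the positive swab sample.

    Day t of the symptomatic series is compared with day t + delay of the
    positive swab series, i.e. the swab series is shifted back by `delay` days.
    Returns a list of (day index, fraction); days with no positive swabs are skipped.
    """
    result = []
    for t in range(len(symptomatic)):
        if t + delay >= len(positive):
            break
        p = positive[t + delay]
        if p > 0:
            result.append((t, (p - symptomatic[t]) / p))
    return result


# Hypothetical toy series, 12 days long.
sym = [10, 12, 15, 20, 25, 30, 28, 26, 24, 22, 20, 18]
pos = [0] * 9 + [25, 30, 40]
print(asymptomatic_fraction(sym, pos))
```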
We demonstrated in [1] that the growth rate λ = 1/t_d is as good an indicator as R_t for describing the behaviour of the outbreak. Here t_d is the doubling time of an exponential fit to the data in the last n days. We show the evolution of λ for the symptomatic and positive swab samples. λ is computed with an exponential fit to the last 14 days, and we display its moving average over 14 days. The results are shown in Fig. 3. The two curves behave almost identically, apart from the characteristic delay of the positive swab curve.

Fig. 3.
Growth rate λ of the exponential fit to reported cases in the last 14 days for the symptomatic sample (blue) and the positive swab sample (red).

Fig. 4.
R_t computed for the symptomatic sample (blue) and the positive swab sample (red).

We display the same result in terms of the more familiar R_t in Fig. 4, computed with the algorithm published in [1]. Again, the two curves are very similar. In particular, there is no evidence of an underestimation of R_t by the curve.

For the sake of completeness, we repeat the same comparison with four of the algorithms most commonly used in the literature to evaluate R_t:
- Wallinga and Teunis [4], computed with the public package EpiEstim [6];
- Cori et al. [5], also computed with EpiEstim [6];
- Bettencourt-Ribeiro [7], computed following [8];
- the Robert Koch Institute (RKI) method [9].

The plots are shown in Fig. 5 and display the same behaviour as Fig. 4.

Fig. 5.
R_t computed for the symptomatic sample (blue) and the positive swab sample (orange) with four different algorithms (see the text). From upper left, clockwise: Wallinga-Teunis, Cori et al., Bettencourt-Ribeiro and RKI.

The symptomatic curve leads by about 9 days, but this does not mean that these data can be used to identify the trends of the outbreak in advance. The date of symptom onset is in fact reported only after the positive swab. ISS suggests a minimum delay of 14 days to collect all the dates of symptom onset. This is consistent with our estimate of a 9-day delay with a standard deviation of 8 days. As a result, the symptomatic sample is an even slower real-time estimator than the positive swab sample. It needs 14 days for data collection while gaining only 9 days of anticipation.
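The growth-rate indicator can be illustrated with a short sketch. This is not the algorithm of [1] and is not the code behind Fig. 3. It makes the following assumptions:
- the exponential fit is an unweighted least-squares straight line through the logarithm of the last 14 daily counts;
- the doubling time is t_d = ln 2/k for a fitted slope k;
- λ = 1/t_d, as defined in the text;
- the smoothing is a trailing 14-day moving average.

The function names are hypothetical, and counts must be positive.

```python
import math


def growth_rate(counts, window=14):
    """lambda = 1/t_d from an exponential fit to the last `window` daily counts.

    Assumption: the exponential fit is a least-squares line through log(counts),
    so n(t) ~ exp(k t), t_d = ln 2 / k and lambda = k / ln 2 (per day).
    """
    y = [math.log(n) for n in counts[-window:]]
    x = list(range(len(y)))
    x_mean, y_mean = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx  # daily exponential rate k
    return slope / math.log(2.0)  # 1 / doubling time


def growth_rate_series(counts, window=14, smooth=14):
    """Rolling lambda, followed by a trailing moving average over `smooth` values."""
    lam = [growth_rate(counts[: i + 1], window) for i in range(window - 1, len(counts))]
    return [sum(lam[i - smooth + 1 : i + 1]) / smooth for i in range(smooth - 1, len(lam))]


# Toy check: cases doubling every 7 days should give lambda = 1/7 per day.
cases = [100 * 2 ** (t / 7) for t in range(40)]
print(round(growth_rate(cases), 4))
```

A negative λ corresponds to a decreasing curve, with |1/λ| playing the role of a halving time. A likelihood fit with Poisson errors would be a natural alternative to the log-linear fit used here.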
We have compared the information that can be extracted about the development of the COVID-19 outbreak in Italy from two series of daily new cases: those reported for infected people with symptoms, and the total sample of positive swabs. The symptomatic sample is a valuable control sample because in some respects it suffers from fewer systematic effects than the positive swab sample. With the symptomatic sample we observe a modest reduction of the dispersion of the data and a better definition of the peaks in the distribution of daily positive cases. The differences between the two curves can be explained by a delay between the appearance of symptoms and the date of the reported positive swab. This delay is about 9 days, with a standard deviation of about 8 days. With this correction, the two samples are comparable and the extracted R_t values are almost identical. Real-time evaluations of R_t are faster and more robust with the positive swab sample. We conclude that the positive swab sample can be safely used to monitor the development of the COVID-19 outbreak.

We publish daily estimates of R_t in real time in [10], together with much other information about the development of the Italian outbreak. Daily values for the major countries of the world are also reported.

The present work has been done in the context of the INFN CovidStat project, which produces an analysis of the public Italian COVID-19 data. The results of the analysis are published and updated daily on the website covid19.infn.it/. The project has been supported in various ways by a number of people from different INFN Units. In particular, we wish to thank, in alphabetical order: Stefano Antonelli (CNAF), Fabio Bredo (Padova Unit), Luca Carbone (Milano-Bicocca Unit), Francesca Cuicchio (Communication Office), Mauro Dinardo (Milano-Bicocca Unit), Paolo Dini (Milano-Bicocca Unit), Rosario Esposito (Naples Unit), Stefano Longo (CNAF), and Stefano Zani (CNAF). We also wish to thank Prof. Domenico Ursino (Università Politecnica delle Marche) for his supportive contribution.
References
1. G. Bonifazi et al., A simplified estimate of the Effective Reproduction Number Rt using its relation with the doubling time and application to Italian COVID-19 data
3. Dati COVID-19 Italia, https://github.com/pcm-dpc/COVID-19
4. J. Wallinga and P. Teunis, Different Epidemic Curves for Severe Acute Respiratory Syndrome Reveal Similar Impacts of Control Measures, American Journal of Epidemiology, Volume 160, Issue 6, 15 September 2004, Pages 509–516, https://doi.org/10.1093/aje/kwh255
5. A. Cori, N. M. Ferguson, C. Fraser and S. Cauchemez, A New Framework and Software to Estimate Time-Varying Reproduction Numbers During Epidemics, American Journal of Epidemiology, Volume 178, Issue 9, 1 November 2013, Pages 1505–1512, https://doi.org/10.1093/aje/kwt133
6. EpiEstim: Estimate Time Varying Reproduction Numbers from Epidemic Curves, https://cran.r-project.org/web/packages/EpiEstim/index.html
7. L. M. A. Bettencourt and R. M. Ribeiro, Real Time Bayesian Estimation of the Epidemic Potential of Emerging Infectious Diseases, PLoS ONE, Volume 3, Issue 5, e2185, 2008, https://doi.org/10.1371/journal.pone.0002185
8. K. Systrom,