COVID-19 and the difficulty of inferring epidemiological parameters from clinical data
CCOVID-19 and the difficulty of inferring epidemiological parameters from clinical data
Simon N. Wood , Ernst C. Wit , Matteo Fasiolo and Peter J. Green . Accepted Lancet Infectious Diseases . Knowing the infection fatality ratio (IFR) is crucial for epidemic management: for immediate planning; forbalancing the life years saved against those lost to the consequences of management; and for evaluating the ethicsof paying substantially more to save a life year from the epidemic than from other diseases. Impressively, Verityet al. (2020) rapidly assembled case data and used statistical modelling to infer the IFR for COVID-19. We haveattempted an in-depth statistical review of their paper, eschewing statistical nit-picking, but attempting to identifythe extent to which the (necessarily compromised) data are more informative about the IFR than the modellingassumptions. First the data. • Individual level data for outside China appear problematic, because different countries have differing levelsof ascertainment and different disease-severity thresholds even for classification as a case. Their use inIFR estimation would require country-specific model ascertainment parameters, about which we have noinformation. Consequently these data provide no useful information on IFR. • Repatriation flight data provide the sole information on Wuhan prevalence (excepting the lower boundof confirmed cases). 689 foreign nationals eligible for repatriation are doubtfully representative of thesusceptible population of Wuhan. Hence it is hard to see how to usefully incorporate the 6 positive casesfrom this sample. • Case-mortality data from China provide an upper bound for IFR, and, with extra assumptions, on the agedependence of IFR. Since prevalence is unknown, they contain no information for estimating the absoluteIFR magnitude. • Because of extensive testing, the Diamond Princess (used only for validation by Verity et al.) supplies dataon both infections and symptomatic cases, with fewer ascertainment problems. These data appear directlyinformative about IFR. Against this, the co-morbidity load on the DP is unlikely to fully represent anypopulation of serious interest (perhaps fewer very severe but more milder co-morbidities).Secondly, the modelling assumptions: we see two primary problems.1. Verity et al. correct the Chinese case data by assuming that ascertainment differences across age groupsdetermine case rate differences. Outside Wuhan they replace observed case data by the cases that wouldhave occurred if each age group had the same per-capita observed case rate as the 50-59 group. Theyassume complete ascertainment for the 50-59s. These are very strong modelling assumptions that willgreatly affect the results: but the published uncertainty bounds reflect no uncertainty about them. In Wuhan,the complete ascertainment assumption is relaxed by introducing a parameter, but one for which the dataappear uninformative, so the results will be driven by the assumed uncertainty.2. Generically, Bayesian models describe uncertainty both in the data and in prior beliefs about the studiedsystem. Only when data are informative about the targets of modelling can we be sure that prior beliefs playa small role in what the model tells us about the world. In this case the data are especially uninformative:we suspect results are mostly the consequence of what our prior beliefs were.Taken together these problems indicate that Verity et al.s IFRs should be treated very cautiously when planning.While awaiting actual measurements, we would base IFRs on the DP data, with the Chinese case-fatality datainforming the dependence of IFR on age: in supplementary material we provide a crude Bayesian model withits IFR estimates by age. Corresponding population IFR estimates and 95% credible intervals are China: 0.43%(.23,.65), UK: 0.55% (.30,.82); India 0.20% (.11,.30). The strong assumptions required, by this approach too,emphasize the need for improved data. We should replace complex models of inadequate clinical data, withsimpler models of epidemiological prevalence data from appropriately designed random sampling using antibodyor PCR tests.
References
Verity, R., L. C. Okell, I. Dorigatti, P. Winskill, C. Whittaker, N. Imai, G. Cuomo-Dannenburg, H. Thompson, P. G.Walker, H. Fu, et al. (2020). Estimates of the severity of coronavirus disease 2019: a model-based analysis.
TheLancet Infectious Diseases . [email protected] University of Bristol, UK. Universit`a della Svizzera italiana, Switzerland. a r X i v : . [ q - b i o . Q M ] M a y upplementary material for COVID-19 and the difficulty of inferring epidemiologicalparameters from clinical data
Simon N. Wood , Ernst C. Wit , Matteo Fasiolo and Peter J. Green . We attempt to construct a model for the Diamond Princess (henceforth DP) data and aggregated datafrom China, with the intention that the DP data informs the absolute magnitude of the IFR while theChina data contributes to the estimation of relative IFR by age class. For the Diamond Princess we lumpthe 80-89 and 90+ age groups into an 80+ group to match the China data, noting that there are no deathsin the 90+ group. We obtained the age of death of the 12 cases from the Diamond Princess Wikipediapage, checking the news reports on which the information was based. One case has no age reportedexcept that he was an adult. Given that there was no mention of a young victim we have assumed that hewas 50 or over.We adopt the assumptions of Verity et al. (2020) of a constant attack rate with age, and that thereis perfect ascertainment in one age class, but assume that this is the 80+ age group for the DP. Theassumption seems more tenable for the DP population than for China, given that 4003 PCR tests wereadministered to the 3711 people on board, with the symptomatic and elderly tending to be tested first.However given that the tests were not administered weekly to all people not yet tested positive, fromthe start of the outbreak, and that the tests are not 100% reliable, the assumption is still unlikely to beperfect, which may bias results upwards. Unlike Verity et al. (2020) we do not correct the case data,but adopt a simple model for under-ascertainment by age, allowing some, but by no means all, of theuncertainty associated with this assumption to be reflected in the intervals reported below. We thenmodel a proportion of the potentially detectable cases as being symptomatic, making a second strongassumption that this rate is constant across age classes. This assumption is made because the data onlytell us that there were 314 symptomatic cases among 706 positive tests but not their ages, so we haveno information to further distinguish age specific under-ascertainment and age specific rates of beingasymptomatic. We then adopt a simple model for the probability of death with age (quadratic on thelogit scale).For the China data we necessarily use a different attack rate to the DP, but the same model as theDP to go from infected to symptomatic cases (on the basis that this reflects an intrinsic characteristic ofthe infectious disease). However we assume that only a proportion of symptomatic cases are detected (atleast relative to whatever threshold counted as symptomatic on the DP). Furthermore we are forced toadopt a modified ascertainment model for China, and correct for the difference between this and the DPascertainment model, within the sub-model for China. We assume the same death rates for symptomaticcases in China, but apply the Verity et al. (2020) correction for not-yet-occurred deaths, based on theirfitted Gamma model, treating this correction as fixed.
In detail, starting with the Diamond Princess, let α be the infection probability, constant for all ageclasses, p ci the probability of an infection to be detectable in age class i , p si the probability that a detectable [email protected] University of Bristol, UK. Universit`a della Svizzera italiana, Switzerland. p di the probability that a symptomatic case dies. p ci p si p di is the IFR for ageclass i . Let a i denote the lower age boundary of class i . The models are (i) for the detectability probability p ci = γ + (1 − γ ) e − ( a i − / exp( γ ) . Note the assumption that all cases in the oldest age class are ascertained; (ii) a constant symptomaticprobability model, p si = φ, and (iii) for the probability that a symptomatic case dies,logit ( p di ) = β + β ( a i −
40) + β ( a i − . For a case to be recorded on the DP, the person needed to be attacked by the virus, gotten ill and detectedat the right moment. In principle, this means that the number of cases in age class i is distributed as abinom ( p ci α, n i ) , where p ci α is the probability of gotten ill and detected, and n i is the number of peoplein age class i on the DP. However, as only 619 out of the 706 cases have their age recorded, we split thecases into C i ∼ binom ( p ci α, n i / and C + i ∼ binom ( p ci α, n i (1 − / where C i are the observed cases of known age and C + i are the additional cases, assumed to follow thesame age distribution, but not actually recorded by age. Binomial parameters are rounded appropriately.Letting S i denote the symptomatics among the cases in age group i , we have S i ∼ binom ( p si , C i + C + i ) . The deaths among the symptomatics of known age are distributed as D i ∼ binom ( p si , S i h i ) where h i is the probability of being of known age on death (this is treated as fixed at 1 for ages less than50, and 11/12 for 50+ given the one victim on the DP for which no age was recorded, except that hewas an adult). For the deaths of unknown age, D na , (there is one of these) among the symptomatics ofunknown age (an artificial category) S na = (cid:80) i (1 − h i ) S i , we have D na ∼ binom ( p na , S na ) where the probability of death is p na = (cid:80) i (1 − h i ) S i p di /S na . Finally the total number of symptomaticsis modelled as S t ∼ N ( (cid:80) i S i , ) , allowing some limited uncertainty in the symptomatic/asymptomaticclassification.The actual available data on the DP are S t , D na and { C i , n i , D i } i =0 . Moving on to the Chinese data, the assumption is that the patterns with age with respect to detection ( p ci ),to being symptomatic ( p si ) and to death ( p di ) are similar, but the attack rate ˜ α for China is different. Let N i be the population size in age class i and ˜ S i the symptomatics. Then ˜ S i ∼ binom ( ˜ αp ci p si N i ) . P deaths D en s i t y . . . . . Figure 1: Posterior predictive distribution of the number of deaths on the Diamond Princess. The verticalred line is the actual number of deaths.Unlike on the DP, only a fraction δ i of the symptomatics are tested to become cases, ˜ C i ∼ binom ( δ i , ˜ S i ) and the (observed) deaths are then distributed as ˜ D i ∼ binom ( p di p yi /δ i , ˜ C i ) where p yi is the average probability of a case in class i having died yet, given they will die — this wastreated as a fixed correction and is computed from the Verity et al. (2020) estimated Gamma modelof time from onset to death, and the known onset times for the cases. The scaling by δ i ensures that p di maintains the same meaning between DP and China. We model δ i as δ i = δp cci /p ci where p cci is anattempt to capture the shape of the actual China detectability with age and is defined as p cci = exp {− ( a i − /e γ c } . We define the following priors using precision and not variance when defining normal densities: α ∼ U ( . , . , γ ∼ U ( . , . , γ ∼ N (7 . , . ,φ ∼ U ( . , . , β ∼ N ( − . , . , β ∼ N (0 , . , β ∼ N (0 , . , ˜ α ∼ U (10 − , . , δ ∼ U ( . , . , γ c ∼ N (7 . , . . This structure uses the information from the DP to assess the symptomatic rate and hidden caserate and the scale of the death probabilities, while the China data refines the information on how deathrates change with age. It is possible to formulate a model in which the China data appear to contributeto inference about absolute levels of mortality, but this model is completely driven by the prior put onproportion of cases observed (about which the China data are completely uninformative).The model was implemented in JAGS 4.3.0. Mixing is slow, but × steps, retaining every 2500thsample, gives an effective sample size of about 660 for δ , the slowest moving parameter. We discardedthe first 2000 retained samples as burn in, although diagnostic plots show no sign that this is necessary.4roup median IFR 95% IntervalOverall China 0.43 (0.23,0.65)Overall UK 0.55 (0.30,0.82)Overall India 0.20 (0.11,0.30)0-9 0.0007 (0.0002,0.002)10-19 0.003 (0.001,0.006)20-29 0.01 (0.005,0.02)30-39 0.03 (0.02,0.05)40-49 0.1 (0.05,0.15)50-59 0.32 (0.17, 0.50)60-69 1.0 (0.55,1.53)70-79 2.3 (1.2,3.4)80+ 3.7 (2.0,5.7)Table 1: Posterior median and credible intervals of Infection Fatality Ratio for various groups. Webelieve the credible intervals to be optimistically narrow.Posterior predictive distribution plots are shown in Figure 2. We note the problems with young Chinesedetected cases, although even the most extreme mismatch only corresponds to a factor of 2 IFR change, ifreflecting incorrect numbers of actual cases. In older groups the model cases are a little high on average,but not by enough to suggest much change in IFR. These mismatches might be reduced by better modelsfor the ascertainment proportion by age. Figure 1 shows the posterior predictive distribution for totalDiamond Princess deaths with the actual deaths as a thick red bar.The median and credible intervals for the IFR as percentages in various groups are in Table 1. Theyshow different estimates of this crucial quantity compared to Verity et al. (2020), again emphasising theurgent need for statistically principled sampling data to directly measure prevalence, instead of having torely on complex models of problematic data with strong built in assumptions. Acknowledgements: we thank Jonathan Rougier and Guy Nason for helpful discussion of onset-to-death interval estimation and the individual level data.
References
Verity, R., L. C. Okell, I. Dorigatti, P. Winskill, C. Whittaker, N. Imai, G. Cuomo-Dannenburg,H. Thompson, P. G. Walker, H. Fu, et al. (2020). Estimates of the severity of coronavirus disease2019: a model-based analysis.
The Lancet Infectious Diseases .5 −9 DP cases D en s i t y . . DP cases D en s i t y . . . . DP cases D en s i t y
10 20 30 40 50 . . . DP cases D en s i t y
10 30 50 . . DP cases D en s i t y
10 20 30 40 50 60 . . . DP cases D en s i t y
30 50 70 . . DP cases D en s i t y
140 180 220 . . . DP cases D en s i t y
200 250 300 . . DP cases D en s i t y
30 50 70 . . . DP deaths D en s i t y −1.0 −0.6 −0.2 . . . DP deaths D en s i t y DP deaths D en s i t y DP deaths D en s i t y DP deaths D en s i t y DP deaths D en s i t y . . . . DP deaths D en s i t y . . . DP deaths D en s i t y . . DP deaths D en s i t y . . . China cases D en s i t y
700 800 900 . . China cases D en s i t y
800 1200 1600 . . China cases D en s i t y . . China cases D en s i t y + − China cases D en s i t y . . China cases D en s i t y . . . China cases D en s i t y . . China cases D en s i t y . . China cases D en s i t y . . China deaths D en s i t y . . China deaths D en s i t y . . . China deaths D en s i t y . . China deaths D en s i t y . . . China deaths D en s i t y
20 40 60 80 . . China deaths D en s i t y
80 120 160 . . . China deaths D en s i t y
250 300 350 400 . . China deaths D en s i t y
250 300 350 400 . . China deaths D en s i t y
150 200 250 300 . . . Figure 2: The posterior predictive distributions for the cases (left 3 by 3 block) and deaths (right 3 by3 block) for the DP (top 3 rows) and China (bottom 3 rows) as histograms, with the observed values asvertical red lines. Clearly there are problems with the younger Chinese cases. c[i] <- gamma[1] + (1-gamma[1])*exp(-(age[i]-70)ˆ2/exp(gamma[2]))}pc[9] <- 1for (i in 1:9) { oad.module("glm") or (i in 1:9) {ifr <- um$ps[i,,1]*um$pd[i,,1]*um$pc[i,,1]ifr <- ifr[-(1:2000)]x <- hist(log10(ifr),main=(i-1)*10,xlab="log10(risk)")ci[,i] <- quantile(ifr,c(.025,.5,.975))mode.p[i] <- 10ˆx$mid[x$counts==max(x$counts)]mean.p[i] <- mean(ifr)}ci*100mode.p*100or (i in 1:9) {ifr <- um$ps[i,,1]*um$pd[i,,1]*um$pc[i,,1]ifr <- ifr[-(1:2000)]x <- hist(log10(ifr),main=(i-1)*10,xlab="log10(risk)")ci[,i] <- quantile(ifr,c(.025,.5,.975))mode.p[i] <- 10ˆx$mid[x$counts==max(x$counts)]mean.p[i] <- mean(ifr)}ci*100mode.p*100