Low incidence rate of COVID-19 undermines confidence in estimation of the vaccine efficacy

Abstract

Knowing the true effect size of clinical interventions in randomised clinical trials is key to informing the public health policies. Vaccine efficacy is defined in terms of the relative risk or the ratio of two disease risks. However, only approximate methods are available for estimating the variance of the relative risk. In this article, we show using a probabilistic model that uncertainty in the efficacy rate could be underestimated when the disease risk is low. Factoring in the baseline rate of the disease, we estimate broader confidence intervals for the efficacy rates of the vaccines recently developed for COVID-19. We propose new confidence intervals for the relative risk. We further show that sample sizes required for phase 3 efficacy trials are routinely underestimated and propose a new method for sample size calculation where the efficacy is of interest. We also discuss the deleterious effects of classification bias which is particularly relevant at low disease prevalence.

arXiv preprint [stat.ME]

Yasin Memari
MRC Cancer Unit, University of Cambridge, Cambridge CB2 0XZ, UK
February 3, 2021


Introduction

Vaccines are seen as the best control measure for the coronavirus pandemic. In this context, understanding the true efficacy of the vaccines and clinical interventions is crucial. Randomised clinical trials are conducted to systematically study the safety and the efficacy of an intervention in a subset of the population before it is widely used in the general population. In placebo-controlled vaccine trials, participants are randomised into vaccinated and unvaccinated groups where cases of the disease or infection are allowed to accrue over time. In planning a clinical trial, advance sample size calculation determines the size of the trial population needed to detect a minimal clinically relevant difference between the two groups if such a difference exists. The indicator for effectiveness of a vaccine is usually the reduction of cases in the vaccinated group relative to the control group. However, it is sometimes naively assumed that the trial participants who do not experience the event provide no information. Consequently, the event rate or the incidence rate of the disease receives inadequate attention. For rare diseases, it is often [...]

Methods

Vaccine efficacy is defined as the proportionate reduction in the risk of disease or infection in a vaccinated group compared to an unvaccinated group. It is defined as $(1-\mathrm{RR}) \times 100$, where RR is the risk ratio, $\mathrm{RR} = \pi_v/\pi_c$, and $\pi_v$ and $\pi_c$ are the incidence of the disease among those exposed in the vaccinated and control groups. Throughout this paper we use the terms incidence rate, disease risk, prevalence and event rate interchangeably.

It is important to remember that the variables $\pi_v$ and $\pi_c$ are scaled binomials as they represent sample proportions. Assuming equal person-time exposure in the two groups, the efficacy is often summarised in terms of the numbers of cases in the vaccinated and unvaccinated groups, $t_v$ and $t_c$ respectively:

$$\alpha = 1 - \frac{\pi_v}{\pi_c} \simeq 1 - \frac{t_v}{t_c}. \tag{1}$$

It appears from the literature that only approximate methods are available for the variance of the ratio of two binomial parameters [2, 3]. The consensus method that is commonly used to assign confidence intervals to the risk ratio is credited to Katz et al [3]. The method is based on asymptotic normality of the logarithm of the ratio of two binomial variables. Assuming independence of the incidence rates, it follows that $\mathrm{var}(\log(\pi_v/\pi_c)) = \mathrm{var}(\log \pi_v) + \mathrm{var}(\log \pi_c)$. Using a Taylor series, the variances are approximated as $\mathrm{var}(\log \pi) \approx \mathrm{var}(\pi)/\pi^2$, where the Wald method is often used to set $\mathrm{var}(\pi)$. Then two-sided 95% confidence intervals on the efficacy (e.g. see [4–7]) can be written as

$$95\%\,\mathrm{CL}: \quad 1 - \exp\left( \ln(\mathrm{RR}) \pm 1.96 \sqrt{\frac{1-\pi_v}{t_v} + \frac{1-\pi_c}{t_c}} \right). \tag{2}$$

Hereafter we refer to equation 2 as the pooled Wald approximation. We will show that the method underestimates the variance, especially when the incidence rate is low.

Equation 2 sets out the large sample asymptotic variance of the risk ratio. However, the Wald method used to define $\mathrm{var}(\pi)$ is known to be unreliable when $\pi$ is small.
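As a small numerical illustration of equation 2, the following Python sketch computes the efficacy point estimate and the pooled Wald interval from case counts and group sizes. The function name `pooled_wald_ci` and its arguments are illustrative choices, not part of the original analysis; the counts used are those reported for the Pfizer-BioNTech trial in Table 1.

```python
import math

def pooled_wald_ci(t_v, n_v, t_c, n_c, z=1.96):
    """Efficacy estimate and pooled Wald interval on the efficacy (equation 2).

    t_v, t_c: cases in the vaccinated and control groups.
    n_v, n_c: group sizes. The function name is illustrative only.
    """
    pi_v = t_v / n_v
    pi_c = t_c / n_c
    rr = pi_v / pi_c
    # Delta-method standard error of ln(RR) with Wald variances for each proportion
    se_log_rr = math.sqrt((1 - pi_v) / t_v + (1 - pi_c) / t_c)
    rr_low = math.exp(math.log(rr) - z * se_log_rr)
    rr_high = math.exp(math.log(rr) + z * se_log_rr)
    # Efficacy is 1 - RR, so the upper limit on RR gives the lower limit on efficacy
    return 1 - rr, 1 - rr_high, 1 - rr_low

ve, ve_low, ve_high = pooled_wald_ci(t_v=8, n_v=18198, t_c=162, n_c=18325)
print(f"efficacy {ve:.3f}, pooled Wald 95% interval [{ve_low:.3f}, {ve_high:.3f}]")
```

Because $1-\pi$ is close to 1 when cases are rare, the square-root term is almost exactly $\sqrt{1/t_v + 1/t_c}$, so the interval barely changes if the group sizes are altered while the case counts are held fixed.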
One may use alternative binomial proportion confidence intervals; however, log normality of the ratio might not hold and the variance of (the logarithm of) the ratio may be irreducible. Hightower et al [5] raised questions about the credibility of the confidence limits when the efficacy is high and the disease risk is low. Also, O'Neill [8] noted that, when $t \ll n$, the variance of $\ln(\mathrm{RR})$ in equation 2 remains fairly stable and quickly converges to $1/t_v + 1/t_c$.

Ratio distributions are known to have heavy tails and often no finite variance. If one were to model the likelihood function for the efficacy defined in equation 1 in terms of independent incidence rates, the choice of the prior probabilities for $\pi_v$ and $\pi_c$ would be critical. One can readily verify that the variance of the ratio of two binomial distributions increases as the binomial probabilities decrease. Uninformative priors could simply cancel out in the division, and the dependence of the posterior on the prevalence would not become obvious. Analytical solutions using independent incidence rates may also be hard to obtain.

For an analytical solution, we model the efficacy in terms of conditional probabilities of the disease risks. Independence of the probabilities of the incidence rates is neither necessary nor ideal when calculating the efficacy, as equation 1 imposes a constraint on the two variables. Under a binomial model with overall prevalence $\pi = t/n$ in both groups and total population size $n$, the overall number of cases $t = t_c + t_v$ follows $t \sim \mathrm{Bin}(n, \pi)$. Then, from equation 1 and assuming $t_c \sim \mathrm{Bin}(t, 1/(2-\alpha))$, we expect $t_c \sim \mathrm{Bin}(n, \pi/(2-\alpha))$. Were we to use Poisson distributions for $t$ and $t_c$, $t_c$ conditional on $t$ would still follow a binomial distribution. Modelling the efficacy in terms of conditional probabilities has previously been suggested [4].
This notation enables us to explicitly parametrise the likelihood function in terms of the prevalence, irrespective of the priors for $\pi_v$ and $\pi_c$.

For a general solution accounting for classification bias we assume an imperfect diagnostic procedure with sensitivity Se and specificity Sp. Then the fraction of individuals who test positive for the disease is the sum of the true positive rate and the false positive rate:

$$T = \mathrm{Se} \times \pi + (1 - \mathrm{Sp}) \times (1 - \pi) = c_1 + c_2 \pi, \tag{3}$$

where $c_1 = 1 - \mathrm{Sp}$ is the false positive rate and $c_2 = \mathrm{Se} + \mathrm{Sp} - 1$. The posterior distribution of $\alpha$ given that $t_c$ is binomial follows as

$$p(\alpha \mid t_c, \pi, c_1, c_2) = \frac{p(t_c \mid \alpha, \pi, c_1, c_2)\, p(\pi)\, p(\alpha)}{g(\alpha)} \propto \frac{1}{g(\alpha)} \binom{n}{t_c} \left( \frac{c_1 + c_2 \pi}{2 - \alpha} \right)^{t_c} \left( 1 - \frac{c_1 + c_2 \pi}{2 - \alpha} \right)^{n - t_c} f(\pi), \tag{4}$$

where $f(\pi)$ is the prior on $\pi$ and we have assumed a uniform prior on the efficacy, $\alpha \sim \mathrm{unif}\{0, 1\}$. For a complete solution, the marginal likelihood $g(\alpha)$ can be written in terms of the incomplete beta function (see e.g. [9]):

$$g(\alpha) = f(\pi) \binom{n}{t_c} (c_1 + c_2 \pi) \left\{ B(c_1 + c_2 \pi;\, t_c - 1,\, n - t_c + 1) - B\big((c_1 + c_2 \pi)/2;\, t_c - 1,\, n - t_c + 1\big) \right\}.$$

As we do not intend to impose a prior on the prevalence, $f(\pi)$ in equation 4 cancels out and our analysis is, in essence, likelihood based. One needs to remember that the posterior in equation 4, as it was derived from the second equality in equation 1, is valid only when the individuals are equally divided between the two groups.

The mode of the posterior of $\alpha$ is obtained by setting the derivative of the log likelihood to zero, i.e. $\partial \ell / \partial \alpha = \partial \ln p(\alpha \mid t_c, \pi) / \partial \alpha = 0$. This leads to

$$\alpha_{\mathrm{mode}} = 2 - \frac{n (c_1 + c_2 \pi)}{t_c}, \tag{5}$$

which corresponds to the maximum likelihood estimator (MLE).
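The relation between the test-positive fraction $T = c_1 + c_2\pi$ and the posterior mode $\alpha_{\mathrm{mode}} = 2 - n(c_1 + c_2\pi)/t_c$ can be checked with a few lines of Python. This is a minimal sketch: the function names are illustrative, and the counts are those of the AZ-Oxford trial in Table 1 with the prevalence set to $\pi = t/n$.

```python
def observed_rate(pi, se=1.0, sp=1.0):
    """Fraction testing positive, T = c1 + c2 * pi (equation 3)."""
    c1 = 1.0 - sp        # false positive rate
    c2 = se + sp - 1.0
    return c1 + c2 * pi

def alpha_mode(t_c, n, pi, se=1.0, sp=1.0):
    """Posterior mode of the efficacy (equation 5)."""
    return 2.0 - n * observed_rate(pi, se, sp) / t_c

# AZ-Oxford counts from Table 1: 30 vaccinated cases, 101 control cases
t_v, t_c = 30, 101
n = 5807 + 5829
pi = (t_v + t_c) / n

perfect = alpha_mode(t_c, n, pi)              # equals 1 - t_v / t_c
less_sensitive = alpha_mode(t_c, n, pi, se=0.95)
print(f"mode with a perfect test: {perfect:.3f}")
print(f"mode with Se = 0.95:      {less_sensitive:.3f}")
```

With a perfect test the mode reduces to $1 - t_v/t_c$; lowering the assumed sensitivity while keeping $\pi$ fixed moves the mode to a higher value of $\alpha$.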
The Cramér–Rao bound expresses a lower bound on the variance of any unbiased estimator of $\alpha$ in terms of the inverse of the Fisher information,

$$\mathrm{Var}(\alpha_{\mathrm{mode}}) \geq \frac{1}{I(\alpha)}, \tag{6}$$

where the Fisher information $I(\alpha)$ is obtained as

$$I(\alpha) = E\left[ \left( \frac{\partial}{\partial \alpha} \ell(\alpha \mid t_c, \pi) \right)^2 \right] = n \times E\left[ \left( \frac{t_c}{2 - \alpha} - \frac{(1 - t_c)(c_1 + c_2 \pi)}{(2 - \alpha)\big(2 - \alpha - (c_1 + c_2 \pi)\big)} \right)^2 \right] = \frac{n (c_1 + c_2 \pi)}{(2 - \alpha)^2 \big(2 - \alpha - (c_1 + c_2 \pi)\big)}. \tag{7}$$

Here $E$ denotes the expectation over $t_c$ for a single participant, where we have substituted $E[t_c] = E[t_c^2] = (c_1 + c_2 \pi)/(2 - \alpha)$. We will show that the conditional binomial model has a more subtle dependence on $\pi$ compared to the pooled Wald method.

Under certain regularity conditions and assuming asymptotic normality near the MLE, 95% confidence intervals on $\alpha_{\mathrm{mode}}$ can be estimated as

$$\alpha_{\mathrm{mode}} \pm \frac{1.96}{\sqrt{I(\alpha_{\mathrm{mode}})}}. \tag{8}$$

However, as the posterior distribution is asymmetric, especially when the efficacy is high, and the intervals could lie outside [0, 1], we will estimate the credible intervals computationally.

Results

Effect of incidence rate on vaccine efficacy

The posterior probability of vaccine efficacy, given in its simplest form in equation 4, is ready for inspection. Using binomial notation is particularly useful in enabling us to directly plug the numbers $n$ and $t_c$ into the estimation of $\alpha$. In this section we evaluate the impact of the incidence rate on the efficacy and assign new confidence bounds to the efficacy of COVID-19 vaccines.

Firstly, we assume a diagnostic test with perfect sensitivity and specificity, i.e. Se = Sp = 1. In the absence of misclassification, the mode of the posterior in equation 5 corresponds to the expectation $\hat{\alpha} = 1 - t_v/t_c$. The larger $n$, the smaller the variance of the posterior; however, for a fixed $n$, the variance depends on $\pi$. Figure 1 shows the posterior probability of $\alpha$ plotted over a range of $\pi$, for a fixed $n$ on the left and for a fixed $t$ on the right, assuming true vaccine efficacy of 70% and 90% respectively. Also plotted as vertical lines are the independent 95% confidence intervals from equation 2. As the event rate falls, the posterior distributions and the confidence intervals become wider; however, for a fixed $t$ (right plot) the Wald intervals are stable over a wide range of $\pi$, and more so when the efficacy is high. The proposed conditional binomial model better represents the variability at low prevalence.
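The widening of the posterior at low incidence can be reproduced with a simple grid evaluation of the likelihood $t_c \sim \mathrm{Bin}(n, \pi/(2-\alpha))$ for a perfect test and a uniform prior on $\alpha$. The sketch below is not the author's R implementation: the function names, the grid size and the use of the expected (non-integer) case count $t_c = n\pi/(2-\alpha)$ are assumptions made for illustration.

```python
import math

def efficacy_posterior(t_c, n, pi, grid_size=2001):
    """Normalised posterior of alpha on a grid over [0, 1] (perfect test, uniform prior)."""
    alphas = [i / (grid_size - 1) for i in range(grid_size)]
    log_lik = []
    for a in alphas:
        p = pi / (2.0 - a)  # probability that a participant is a control-group case
        log_lik.append(t_c * math.log(p) + (n - t_c) * math.log(1.0 - p))
    peak = max(log_lik)
    weights = [math.exp(v - peak) for v in log_lik]  # subtract the peak to avoid underflow
    total = sum(weights)
    return alphas, [w / total for w in weights]

def equal_tailed_interval(alphas, post, level=0.95):
    """Equal-tailed credible interval read off the cumulative posterior."""
    lower_q = (1.0 - level) / 2.0
    upper_q = 1.0 - lower_q
    cum, low, high = 0.0, None, None
    for a, w in zip(alphas, post):
        cum += w
        if low is None and cum >= lower_q:
            low = a
        if high is None and cum >= upper_q:
            high = a
    return low, high

n, true_alpha = 50_000, 0.7
for pi in (0.01, 0.001):
    t_c = n * pi / (2.0 - true_alpha)  # expected control cases, an assumption of this sketch
    alphas, post = efficacy_posterior(t_c, n, pi)
    low, high = equal_tailed_interval(alphas, post)
    print(f"pi = {pi}: 95% credible interval [{low:.3f}, {high:.3f}]")
```

For the same $n$ and true efficacy, the interval at $\pi = 0.001$ comes out considerably wider than at $\pi = 0.01$, which is the trend described for the left panel of Figure 1.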

Figure 1. Posterior distribution of vaccine efficacy. Blue lines represent the normalised posterior probabilities, while vertical lines show the independent pooled Wald confidence intervals. The left-hand plot assumes a fixed $n$ = 50,000 while the right-hand plot is for a fixed $t$ = 2,000. The general trend holds for different values of the parameters. The Wald method overstates the confidence in the efficacy when $t \ll n$.

Three clinical trials of the vaccines designed to prevent COVID-19 recently published their interim phase 3 analysis results [10–12], with two of them reporting incredibly narrow 95% confidence bounds on the efficacy. The reported case numbers and the efficacy rates for the primary end points are provided in Table 1. Firstly, we note that, although the trials used different models and priors on the efficacy, the reported confidence intervals almost perfectly correspond with those obtained from equation 2. At large $n$ the posterior is clearly dominated by the data and the Bayesian and the frequentist intervals are equivalent. Furthermore, especially where the efficacy is high, pooled Wald confidence intervals hardly vary with the choice of $n$. If one were to use different values for $n$ in Table 1, over a large range of values equation 2 would still give the same confidence intervals. Therefore, the uncertainty caused by $n_v$ and $n_c$ is not accounted for.

We re-estimate the confidence intervals using the conditional binomial model presented in the Methods. Using the case numbers reported, the likelihood of the data in equation 4 is obtained by setting the prevalence to $\pi = T = t/n$. Then the maximum a posteriori (MAP) estimates and 95% credible intervals for the efficacy rates are calculated computationally. The results shown in Table 1 are contrasted with those reported. Although the estimated modes are the same, our credible intervals are wider. Incorporating the incidence rates has removed the overwhelming confidence originally assigned to the point estimates. Note that our approach requires the trial participants to be equally divided between the vaccinated and unvaccinated groups, which is roughly the case here.

Table 1. Estimated efficacy of COVID-19 vaccine trials. The first three data columns give the case numbers and reported efficacy rates; the last column gives the estimated efficacy rate.

| Trial | Case rate in vaccinated | Case rate in control | Reported efficacy and 95% CI | Estimated mode and 95% credible interval |
|---|---|---|---|---|
| AZ-Oxford (combined) | 30/5,807 | 101/5,829 | 70.4% [54.8, 80.6] | 70.3% [39.1, 90.9] |
| Pfizer-BioNTech | 8/18,198 | 162/18,325 | 95.0% [90.3, 97.6] | 95.1% [74.9, 99.6] |
| Moderna-NIH | 11/14,134 | 185/14,073 | 94.1% [89.3, 96.8] | 94.1% [75.4, 99.5] |

Figure 2. Estimated efficacy of COVID-19 vaccines. Posterior probabilities for the conditional binomial model are plotted in red, with shaded areas representing the 95% credible intervals. Blue curves are for when $n$ is set to $t_v + t_c$ and correspond with the pooled Wald approximation.

Figure 2, in red, shows the posterior probabilities and the credible intervals for COVID-19 vaccines. Of note is that, if we were to hypothetically assume $\pi = t/n = 1$, the posterior in equation 4 would produce the same intervals as those reported by the vaccine trials and the Wald approximation. Moreover, an independent binomial model with uninformative (e.g. uniform) priors for $\pi_v$ and $\pi_c$ would produce the pooled Wald intervals.

Bias in case classification

So far we have assumed no bias in classification of the cases; however, an imperfect diagnostic procedure could lead to misclassification of the infected and uninfected individuals. In this section we examine the effect of classification bias on estimation of the efficacy.

It is worth noting that equation 3 requires the observed infection rate $T$ to be greater than the false positive rate $c_1 = 1 - \mathrm{Sp}$. This relates to the 'false positive paradox', which implies that the accuracy of a diagnostic test is compromised if the test is used in a population where the incidence of the disease is lower than the false positive rate of the test itself. Furthermore, false negatives could dominate at low incidence rates. When the disease risk is low, as the majority of the tests are negative, a small false negative rate could lead to a situation where false negatives outnumber the positive cases. These concepts are further explained in Note 1.

Figure 3 illustrates the effect of classification bias on the posterior probability of the vaccine efficacy. The left plot shows the impact of a very small reduction in specificity to 0.999 (or increase in false positive rate), while the right-hand plot shows the effect of a reduction in sensitivity to 0.95 (or increase in false negative rate). A small loss of specificity could lead to serious underestimation of the effect size as noted by [6, 7], but it could further lead to complete loss of precision when the incidence rate is low. Loss of sensitivity results in overestimation of the efficacy irrespective of the disease rate. In these plots, we have considered a larger reduction in sensitivity, not only because reduction in specificity has a more dramatic effect, but also because diagnostic assays typically have relatively higher specificity than sensitivity, not least due to specimen collection, insufficient viral load, stage of the disease, etc. [13] However, the effect of loss of sensitivity is consistently toward shifting the mode in equation 5, or MAP, to higher values of $\alpha$, even at low incidence rates where the negative predictive value is high.

Figure 3. Effect of imperfect diagnostic procedure. Misclassification error biases the vaccine efficacy rate. The left plot shows the distributions for Se = 1 and Sp = 0.999, while the right plot is for Se = 0.95 and Sp = 1, with $n$ = 50,000 in both. The true efficacy rate is assumed to be 70%. Imperfect specificity, however small, could have disastrous effects when the incidence rate is low, whereas lack of sensitivity consistently inflates the efficacy rate.

Discussion

The base rate fallacy happens in situations where base rate information is ignored in favour of individuating information. In probability terms, it often occurs when $P(A \mid B)$ is confused or used interchangeably with $P(B \mid A)$, ignoring the prior probability $P(A)$; e.g. the probability of having a rare disease given a positive test is wrongly equated to the probability of a positive test given the disease (or diagnostic sensitivity), ignoring the low prior probability of the disease itself. We showed that, in estimation of the vaccine efficacy when the disease rate is low, not only could diagnostic error have deleterious effects, but failure to appropriately integrate the information about the base rate or incidence rate of the disease in the calculation could also lead to underestimation of the uncertainty.

Vaccine efficacy is defined in terms of the risk ratio $\pi_v/\pi_c$, that is, the ratio of two binomial proportions. Ratio distributions are known to have undefined variances, yet the pooled Wald method has traditionally been used to approximate the variance of the risk ratio. In this article, we used a parametrisation that makes the dependence of the efficacy on the disease prevalence explicit, without recourse to priors for $\pi_v$ and $\pi_c$. In particular, improper priors for $\pi_v$ and $\pi_c$ could lead to underestimation of the variance. We conditioned $t_c$ on $t = t_c + t_v$ and treated $t$ as another random variable. The resulting compound probability $t_c \sim \mathrm{Bin}(n, \pi/(2-\alpha))$ is over-dispersed and better captures the variability of the variance with $\pi$, whereas pooled Wald confidence intervals are largely insensitive to $\pi$ when $\pi$ is small.

Note 1. J. Balayla [14] noted that there exists a prevalence threshold below which the positive predictive value (PPV) of a diagnostic test drops precipitously relative to the prevalence. This means that at too low a prevalence a positive test result is more likely to be a false positive than a true positive. More underappreciated is the impact of the negative predictive value (NPV). Though at low incidence rates the negative predictive value is nearly 100%, a small loss in sensitivity could still have a marked effect as the negative tests vastly outnumber the positive tests. We could even have a situation where the false negatives outnumber the true and false positives. To avoid these pitfalls, the participants are pre-selected for their symptoms before confirmation with the assay. Though this raises the pre-test probability, it could cause collider bias [15].

Figure 4. Positive (red) and negative (blue) predictive values are plotted in terms of population prevalence. Solid lines are for a diagnostic test with Se = Sp = 0.99; dashed lines are for Se = Sp = 0.95. Vertical lines show the prevalence thresholds.

The Wald method is intended as a large sample approximation; however, the bulk of the life sciences deals with small sample sizes. Therefore, it is likely that the confidence intervals reported in the literature for the risk ratio (and odds ratio) are overly optimistic. By analogy with equations 6 and 8, one could define new confidence intervals for the risk ratio by substituting $\mathrm{RR} = (n_c/n_v)(1-\alpha)$ for unequal sized groups in the Fisher information. The results can be written as

$$95\%\,\mathrm{CL}: \quad \mathrm{RR} \pm 1.96\, \frac{n_c}{n_v} \left( 1 + \frac{t_v}{t_c} \right) \sqrt{\frac{1 + t_v/t_c - \pi}{t_v + t_c}}, \tag{9}$$

where $\pi = (t_v + t_c)/(n_v + n_c)$. The above intervals on the risk ratio are generally wider than, but converge to, the pooled Wald intervals when the sample size is large. They may be preferred to those obtained from equation 2 when the sample size is small or the relative risk is low. In particular, for a fixed sample size, as RR nears zero the upper bound in equation 9 remains conservative and the lower bound takes negative values and becomes undetermined. On the contrary, as RR nears zero, the pooled Wald intervals remain positive and shrink rapidly, giving the counterintuitive impression of increased precision when the incidence rate is low (similar to Figure 2).
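As a rough check of the proposed interval on the risk ratio, $\mathrm{RR} \pm 1.96\,(n_c/n_v)(1 + t_v/t_c)\sqrt{(1 + t_v/t_c - \pi)/(t_v + t_c)}$ with $\pi = (t_v + t_c)/(n_v + n_c)$, the following Python sketch evaluates it for the Pfizer-BioNTech counts in Table 1. The function name is an illustrative choice and the code is a direct transcription of the formula rather than the author's implementation.

```python
import math

def crb_risk_ratio_ci(t_v, n_v, t_c, n_c, z=1.96):
    """Risk ratio with the Cramér–Rao based interval of equation 9."""
    ratio = t_v / t_c
    rr = (n_c / n_v) * ratio
    pi = (t_v + t_c) / (n_v + n_c)  # overall incidence rate
    half_width = z * (n_c / n_v) * (1 + ratio) * math.sqrt(
        (1 + ratio - pi) / (t_v + t_c)
    )
    return rr, rr - half_width, rr + half_width

rr, rr_low, rr_high = crb_risk_ratio_ci(t_v=8, n_v=18198, t_c=162, n_c=18325)
print(f"RR = {rr:.4f}, interval [{rr_low:.4f}, {rr_high:.4f}]")
```

For these counts the lower bound is negative, illustrating the remark that the lower limit becomes undetermined as RR nears zero, while the upper bound stays well above the pooled Wald upper limit on RR.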
However, as with the Wald method, the confidence intervals in equation 9 were derived using a normal approximation which may not hold when RR significantly deviates from 1.

Our findings have implications for pre-planning the sample sizes for phase 3 efficacy trials. Sample size calculation in case-control design is often stated as "How many samples are needed to be randomised in order to conclude with $100(1-\beta)$% power that a treatment difference of size $\Delta$ exists between the two groups at the level of significance of $\alpha$?". Therefore calculation of sample size requires specification of the null hypothesis (expected treatment effect) and the alternative hypothesis defined in terms of the difference in treatment outcomes. Here, $\alpha$ or type I error is the probability of rejecting the null hypothesis where we should not, and $\beta$ or type II error is the probability of failing to reject the null hypothesis where we should reject it. Under the assumption of normality of the treatment outcome, a generic formula for per-group sample size is derived in terms of the two-sample t-test [1]:

$$n = \frac{2\sigma^2}{\Delta^2} \left( z_{1-\alpha/2} + z_{1-\beta} \right)^2, \tag{10}$$

where the $z$-scores determine the critical values for the standard normal distribution. Therefore one needs to specify the variance of the measured variable, the desired rates of error and the magnitude of the treatment difference. Where the measured variable is binary (infected or uninfected), the test statistic reduces to the test for the difference between two proportions.

Where the efficacy is of interest, the log normal approximation of the risk ratio from equation 2 may be used to define the test statistic. O'Neill [8] calculated the required sample sizes for a two-sided test given the pooled Wald variance in equation 2. We re-write the total sample size in this form:

$$n = \frac{2 \left( z_{1-\alpha/2} + z_{1-\beta} \right)^2}{d^2} \left( \frac{(2 - \mathrm{VE})^2}{\pi (1 - \mathrm{VE})} - 2 \right), \tag{11}$$

where

$$d = \ln\left( \frac{\Delta}{2(1 - \mathrm{VE})} + \sqrt{\left( \frac{\Delta}{2(1 - \mathrm{VE})} \right)^2 + 1} \right).$$
Here VE is the anticipated efficacy and $\Delta$ is the expected difference in VE in absolute terms. We showed, however, that at low prevalence equation 2 significantly underestimates the variance. Using an inadequately small variance could lead to underestimation of the type I and type II errors, potentially resulting in the winner's curse in underpowered studies [16, 17]. If instead we were to use the proposed compound binomial model, one could simply substitute the variance in equation 6. As in [8], under the assumption of normality and assuming $\Delta$ is the difference between the upper and lower limits of the confidence interval, substituting the margin of error as $\Delta/2 = z\sigma$ in equation 6 gives

$$n \geq \frac{4 \left( z_{1-\alpha/2} + z_{1-\beta} \right)^2}{\pi \Delta^2} (2 - \mathrm{VE})^2 (2 - \mathrm{VE} - \pi). \tag{12}$$

This equation sets out the total required sample size for a perfect diagnostic test, to be equally divided between the two groups.

The proposed Cramér–Rao bound based formula 12 assumes normality of the distributions of the null and the alternative hypotheses; however, the binomial likelihood function is asymmetric, as are the pooled Wald intervals (see [8]), and becomes more so as the efficacy increases. Notwithstanding the limitations, we plug the critical values for $\alpha = 0.05$ and power of $100(1-\beta) = 80$ per cent ($z_{1-\alpha/2} = 1.96$ and $z_{1-\beta} = 0.84$) into equations 11 and 12. The resulting sample sizes are plotted in Figure 5 for $\Delta = 10\%$ and different prevalence and efficacy rates.
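The Cramér–Rao based sample size, $n \geq 4(z_{1-\alpha/2} + z_{1-\beta})^2 (2-\mathrm{VE})^2 (2-\mathrm{VE}-\pi) / (\pi \Delta^2)$, is straightforward to evaluate. The Python sketch below uses an illustrative function name and the critical values 1.96 and 0.84 quoted above; rounding to the nearest integer is an assumption that happens to reproduce the entries of Table 2.

```python
def sample_size_crb(ve, delta, pi, z_alpha=1.96, z_beta=0.84):
    """Total sample size from the Cramér–Rao bound formula (equation 12).

    ve:    anticipated efficacy (as a fraction, e.g. 0.9)
    delta: width of the confidence interval on the efficacy (as a fraction)
    pi:    overall event rate
    """
    z = z_alpha + z_beta
    return 4 * z**2 * (2 - ve) ** 2 * (2 - ve - pi) / (pi * delta**2)

for ve, delta, pi in [(0.0, 0.10, 0.5), (0.9, 0.10, 0.5), (0.9, 0.10, 0.0005)]:
    n_total = round(sample_size_crb(ve, delta, pi))
    print(f"VE = {ve:.0%}, delta = {delta:.0%}, event rate = {pi}: n = {n_total:,}")
```

The required total grows roughly in proportion to $1/\pi$, which is why the event rate dominates the sample size at low prevalence.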

Figure 5. Sample size relative to disease prevalence. Total number of samples required to detect, with 80% power and a level of significance of $\alpha = 0.05$, a difference in the efficacy of size $\Delta = 10\%$. Solid lines represent the Cramér–Rao bound and dashed lines represent the pooled Wald approximation. On the left, the x-axis is on a logarithmic scale. The y-axis is logarithmic in both plots.

In Figure 5 the relationship between the sample size and the incidence rate looks linear on a log-log scale as they have a power law relationship. However, while the two methods coincide at high incidence rates, the pooled Wald method significantly underestimates the sample sizes at low incidence rates, especially when the efficacy is high (note that the y-axis is on a logarithmic scale). Contrasting Figure 5 with the case rates in Table 1, it is clear that, to achieve the narrow confidence bounds that Pfizer and Moderna have reported, they would have needed several times more samples under the pooled Wald method, and an order of magnitude more under the Cramér–Rao bound. If the event rate were to differ from that in the general population, or if the possibility of misclassification were non-negligible, such a discrepancy in incidence rates could cause such large variations in the variance that the trial population could be unrepresentative of the larger population. Table 2 provides the total sample sizes from the Cramér–Rao bound formula 12 for different levels of efficacy and effect size. It is clear that the sample size is also very sensitive to the choice of $\Delta$, therefore an investigator must be wary of misspecification of the anticipated treatment difference [1].

Throughout the Methods, we incorporated the misclassification error in the calculations in order to emphasise the importance of accounting for classification bias when the disease is rare. We showed that, while lack of diagnostic sensitivity consistently inflates the estimated efficacy rates, imperfect specificity results in serious loss of accuracy and precision at low disease risks. Case definition for COVID-19 is a particularly major caveat. The three vaccine trials broadly follow the FDA definition of the disease.
For primary end points, symptomatic cases are identified by surveillance or are self-reported, and are subsequently confirmed with RT-PCR. Pre-selecting the participants for the PCR assay could create the possibility of collider bias [15]. Moreover, the highly non-specific symptoms of COVID-19, which include symptoms as common as cough and congestion, could create the perfect conditions for misclassification. False negatives due to e.g. selective reporting, specimen collection, etc., and PCR false positives due to e.g. remnant viral RNA, could be introduced if the test is not repeated [13, 18]. Much remains unknown about COVID-19 and its many symptoms and presentations. Therefore, it is recommended to account for classification bias in the calculation. The code for calculating the posterior probability of the vaccine efficacy, which can simultaneously marginalise over the diagnostic sensitivity and specificity, is provided.

Table 2. Total sample sizes needed to conclude with 80% power and $\alpha$ = 0.05 a significant effect size. Columns 0.5 to 0.0005 give the event rate.

| VE | Δ | 0.5 | 0.1 | 0.05 | 0.01 | 0.005 | 0.001 | 0.0005 |
|---|---|---|---|---|---|---|---|---|
| 0% | 10% | 37,632 | 238,336 | 489,216 | 2,496,256 | 5,005,056 | 25,075,456 | 50,163,456 |
| 0% | 20% | 9,408 | 59,584 | 122,304 | 624,064 | 1,251,264 | 6,268,864 | 12,540,864 |
| 0% | 30% | 4,181 | 26,482 | 54,357 | 277,362 | 556,117 | 2,786,162 | 5,573,717 |
| 0% | 40% | 2,352 | 14,896 | 30,576 | 156,016 | 312,816 | 1,567,216 | 3,135,216 |
| 30% | 10% | 21,751 | 145,009 | 299,080 | 1,531,654 | 3,072,371 | 15,398,105 | 30,805,273 |
| 30% | 20% | 5,438 | 36,252 | 74,770 | 382,913 | 768,093 | 3,849,526 | 7,701,318 |
| 30% | 30% | 2,417 | 16,112 | 33,231 | 170,184 | 341,375 | 1,710,901 | 3,422,808 |
| 30% | 40% | 1,359 | 9,063 | 18,693 | 95,728 | 192,023 | 962,382 | 1,925,330 |
| 60% | 10% | 11,064 | 79,905 | 165,957 | 854,372 | 1,714,890 | 8,599,037 | 17,204,221 |
| 60% | 20% | 2,766 | 19,976 | 41,489 | 213,593 | 428,723 | 2,149,759 | 4,301,055 |
| 60% | 30% | 1,229 | 8,878 | 18,440 | 94,930 | 190,543 | 955,449 | 1,911,580 |
| 60% | 40% | 691 | 4,994 | 10,372 | 53,398 | 107,181 | 537,440 | 1,075,264 |
| 90% | 10% | 4,553 | 37,946 | 79,686 | 413,607 | 831,009 | 4,170,221 | 8,344,237 |
| 90% | 20% | 1,138 | 9,486 | 19,921 | 103,402 | 207,752 | 1,042,555 | 2,086,059 |
| 90% | 30% | 506 | 4,216 | 8,854 | 45,956 | 92,334 | 463,358 | 927,137 |
| 90% | 40% | 285 | 2,372 | 4,980 | 25,850 | 51,938 | 260,639 | 521,515 |

Code

R code for the posterior probability of the efficacy was modified from code published in [9]. It is provided in the Appendix along with functions to calculate the sample sizes from equations 11 and 12.

Acknowledgments

The author's position at the University of Cambridge is funded by CRUK grant C60100/A23916. The author appreciates the helpful comments received from the Cancer Mutagenesis group at the MRC Cancer Unit.

References

1. J. Wittes. Sample Size Calculations for Randomized Controlled Trials. Epidemiologic Reviews, 24(1):39–53, 2002.
2. J. J. Gart and J. Nam. Approximate interval estimation of the ratio of binomial parameters: a review and corrections for skewness. Biometrics, 44(2):323–338, 1988.
3. D. Katz et al. Obtaining confidence intervals for the risk ratio in cohort studies. Biometrics, 34:469–474, 1978.
4. M. Ewell. Comparing methods for calculating confidence intervals for vaccine efficacy. Stat Med, 15(21-22):2379–2392, 1996.
5. A. W. Hightower et al. Recommendations for the use of Taylor series confidence intervals for estimates of vaccine efficacy. Bull World Health Organ, 66(1):99–105, 1988.
6. P. A. Lachenbruch. Sensitivity, specificity, and vaccine efficacy. Controlled Clinical Trials, 19(6):569–574, 1998.
7. A. Hahn et al. Impact of diagnostic methods on efficacy estimation – a proof-of-principle based on historical examples. Tropical Medicine & International Health, 25(3):357–363, 2020.
8. R. T. O'Neill. On sample sizes to estimate the protective efficacy of a vaccine. Stat Med, 7(12):1279–1288, 1988.
9. P. J. Diggle. Estimating prevalence using an imperfect test. Epidemiology Research International, 2011(608719), 2011.
10. M. Voysey et al. Safety and efficacy of the ChAdOx1 nCoV-19 vaccine (AZD1222) against SARS-CoV-2: an interim analysis of four randomised controlled trials in Brazil, South Africa, and the UK. Lancet, Dec 2020.
11. F. P. Polack et al. Safety and efficacy of the BNT162b2 mRNA Covid-19 vaccine. New England Journal of Medicine, 383(27):2603–2615, 2020. PMID: 33301246.
12. L. R. Baden et al. Efficacy and safety of the mRNA-1273 SARS-CoV-2 vaccine. New England Journal of Medicine.
13. I. Arevalo-Rodriguez et al. False-negative results of initial RT-PCR assays for COVID-19: A systematic review. PLoS One, 15(12):e0242958, 2020.
14. J. Balayla. Prevalence threshold and the geometry of screening curves. PLoS One, 15(10):e0240215, 2020.
15. G. J. Griffith et al. Collider bias undermines our understanding of COVID-19 disease risk and severity. Nat Commun, 11(1):5749, 2020.
16. J. P. Ioannidis. Why most discovered true associations are inflated. Epidemiology, 19(5):640–648, 2008.
17. K. S. Button et al. Power failure: why small sample size undermines the reliability of neuroscience. Nat Rev Neurosci, 14(5):365–376, 2013.
18. J. Balayla. Bayesian updating and sequential testing: Overcoming inferential limitations of screening tests, 2020.

Appendix

    library(Rmpfr)  # mpfr(), chooseMpfr(), asNumeric() and mpfr methods for beta() and dbinom()

    # Promote a number to 200-bit precision
    .N <- function(.) mpfr(., precBits = 200)

    efficacy.bayes <- function(alpha, Tc, n, pi, lowse = 0.5, highse = 1.0,
                               sea = 1, seb = 1, lowsp = 0.5, highsp = 1.0,
                               spa = 1, spb = 1, ngrid = 20) {
      # alpha: efficacy values at which the posterior is evaluated
      #        (converted internally to an equally spaced grid)
      # lowse, highse, sea, seb: limits and parameters of the scaled beta prior for sensitivity
      # lowsp, highsp, spa, spb: limits and parameters of the scaled beta prior for specificity
      # ngrid: number of grid-cells in each dimension for quadrature

      # Unnormalised incomplete beta function
      ibeta <- function(x, a, b) { pbeta(x, a, b) * beta(.N(a), .N(b)) }

      nalpha <- length(alpha)
      bin.width <- (alpha[nalpha] - alpha[1]) / (nalpha - 1)
      alpha <- alpha[1] + bin.width * (0:(nalpha - 1))
      integrand <- array(0, c(nalpha, ngrid, ngrid))
      h1 <- (highse - lowse) / ngrid
      h2 <- (highsp - lowsp) / ngrid
      for (i in 1:ngrid) {
        se <- lowse + h1 * (i - 0.5)
        pse <- (1 / (highse - lowse)) * dbeta((se - lowse) / (highse - lowse), sea, seb)
        for (j in 1:ngrid) {
          sp <- lowsp + h2 * (j - 0.5)
          psp <- (1 / (highsp - lowsp)) * dbeta((sp - lowsp) / (highsp - lowsp), spa, spb)
          # A single grid cell places all prior mass at (highse, highsp)
          if (ngrid == 1) { se = highse; sp = highsp; pse = 1; psp = 1; h1 = 1; h2 = 1 }
          c1 <- 1 - sp
          c2 <- se + sp - 1
          # Normalising constant: the binomial likelihood integrated over alpha in [0, 1]
          g <- (c1 + c2 * pi) * chooseMpfr(n, Tc) *
            (ibeta(c1 + c2 * pi, Tc - 1, n - Tc + 1) -
             ibeta((c1 + c2 * pi) / 2, Tc - 1, n - Tc + 1))
          p <- (c1 + c2 * pi) / (2 - alpha)
          density <- rep(0, nalpha)
          for (k in 1:nalpha) {
            density[k] <- asNumeric(dbinom(Tc, n, .N(p[k])) / g)
          }
          integrand[, i, j] <- pse * psp * density
        }
      }
      post <- rep(0, nalpha)
      for (i in 1:nalpha) {
        post[i] <- h1 * h2 * sum(integrand[i, , ])
      }
      ord <- order(post, decreasing = TRUE)
      mode <- alpha[ord[1]]
      cumpost = cumsum(post / sum(post))
      # Equal-tailed interval; the cut-offs 0.025 and 0.975 are an assumption (95% interval)
      interval = c(alpha[which.min(abs(cumpost - 0.025))],
                   alpha[which.min(abs(cumpost - 0.975))])
      list(alpha = alpha, post = post, mode = mode, interval = interval)
    }

    Tc <- 185
    pi <- (185+11)/(14134+14073)
    # N, lowse, highse, sea, seb, lowsp, highsp, spa, spb and ngrid are assigned at this point;
    # their values are missing from the listing, as is the start of a term ending in "* (1:400)".
    alpha <- seq(0, 1, by = 0.0005)
    result <- efficacy.bayes(alpha, Tc, N, pi, lowse, highse,
                             sea, seb, lowsp, highsp, spa, spb, ngrid)
    result$mode
    result$interval
    plot(result$alpha, result$post / sum(result$post), type = "l", xlab = "alpha",
         ylab = "p(alpha)")

    # Group sample sizes presented in O'Neill
    samplesize95waldpaper <- function(ARU, RW) {
      VE = 0.4
      y <- RW * VE / (2 * (1 - VE))
      d <- log(y + sqrt(y^2 + 1))
      return((1.96)^2 / d^2 * ((1 + 1 / (1 - VE)) / ARU - 2))
    }
    print(t(outer(c(0.01, 0.005, 0.001, 0.0005), rev(seq(0.1, 1, 0.1)),
                  samplesize95waldpaper)), quote = FALSE)

    # Wald-type sample size (the function name is missing from the listing; samplesizewald is assumed)
    samplesizewald <- function(VE, Pi, delta) {
      RW = delta / VE
      ARU = Pi / (2 - VE)
      y <- RW * VE / (2 * (1 - VE))
      d <- log(y + sqrt(y^2 + 1))
      return(2 * (1.96 + 0.84)^2 / d^2 * ((1 + 1 / (1 - VE)) / ARU - 2))
    }

    # Cramer-Rao bound (power 80% and alpha = 0.05)
    samplesizecramer <- function(VE, Pi, delta) {
      return(4 * (1.96 + 0.84)^2 * (2 - VE)^2 * (2 - VE - Pi) / Pi / delta^2)
    }
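To make the dependence on incidence concrete, the following sketch evaluates the Cramer-Rao sample size expression from the Appendix over a few incidence values. The function body is copied from the Appendix; the inputs VE = 0.9 and delta = 0.1 and the incidence values are illustrative assumptions only, and `delta` is read here as the target precision on VE, as suggested by the relation RW = delta/VE in the Appendix.

```r
# Cramer-Rao based sample size (power 80%, alpha = 0.05), as in the Appendix.
samplesizecramer <- function(VE, Pi, delta) {
  4 * (1.96 + 0.84)^2 * (2 - VE)^2 * (2 - VE - Pi) / Pi / delta^2
}

# Illustrative inputs (assumptions for this example, not trial values).
Pi_values <- c(0.01, 0.005, 0.001, 0.0005)
n_required <- sapply(Pi_values,
                     function(Pi) samplesizecramer(VE = 0.9, Pi = Pi, delta = 0.1))

# Tabulate the required sample size against incidence.
print(data.frame(Pi = Pi_values, n = ceiling(n_required)))
```

Because Pi enters the expression as a divisor, the required sample size grows roughly in inverse proportion to the incidence: a tenfold drop in Pi raises it about tenfold.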