Estimating the reciprocal of a binomial proportion

Abstract

As a classic parameter of the binomial distribution, the binomial proportion has been well studied in the literature owing to its wide range of applications. In contrast, the reciprocal of the binomial proportion, also known as the inverse proportion, is often overlooked, even though it also plays an important role in various fields including clinical studies and random sampling. The maximum likelihood estimator of the inverse proportion suffers from the zero-event problem, and to overcome it, alternative methods have been developed in the literature. Nevertheless, little work has addressed the optimality of the existing estimators or compared their practical performance. Inspired by this, we propose to further advance the literature by developing an optimal estimator for the inverse proportion in a family of shrinkage estimators. We further derive explicit and approximate formulas for the optimal shrinkage parameter under different settings. Simulation studies show that our new estimator performs better than, or as well as, the existing competitors in most practical settings. Finally, to illustrate the usefulness of our new method, we also revisit a recent meta-analysis on COVID-19 data for assessing the relative risks of physical distancing on the infection of coronavirus, in which six out of seven studies encounter the zero-event problem.


arXiv preprint [stat.ME]

Jiajin Wei, Ping He∗ and Tiejun Tong†

Division of Science and Technology, BNU-HKBU United International College, Zhuhai, China
Department of Mathematics, Hong Kong Baptist University, Hong Kong

September 3, 2020


Key words: Binomial proportion, Inverse proportion, Relative risk, Shrinkage estimator, Zero-event problem

∗ Co-corresponding author. E-mail: [email protected]
† Co-corresponding author. E-mail: [email protected]

1 Introduction

The binomial distribution is one of the most important distributions in statistics and has been extensively studied in the literature, with a wide range of applications. This classical distribution has two parameters $n$ and $p$, where $n$ is the number of independent Bernoulli trials and $p$ is the probability of success in each trial (Hogg, McKean, and Craig, 2005). The probability of success, $p$, is also referred to as the binomial proportion. For excellent reviews on its estimation and inference, one may refer to, for example, Agresti and Coull (1998) and Brown, Cai, and DasGupta (2001).

Apart from the parameter $p$ itself, some of its functions, such as $p(1-p)$ and $\ln(p)$, also play important roles in statistics and have received much attention. In this article, we are interested in the reciprocal function

$$\theta = \frac{1}{p}, \tag{1}$$

which is another important function of $p$ yet is often overlooked in the literature. For convenience, we refer to $\theta$ in formula (1) as the inverse proportion of the binomial distribution. To demonstrate its usefulness, we introduce in Section 2 some motivating examples that connect the inverse proportion with the relative risk (RR) in clinical studies and with the Horvitz-Thompson estimator (Horvitz and Thompson, 1952; Fattorini, 2006). In Section 6, we also relate the inverse proportion to the number needed to treat (NNT) in clinical studies and present some future directions (Altman, 1998; Laupacis, Sackett, and Roberts, 1988; Hutton, 2000).

To start with, let $X = \sum_{i=1}^{n} X_i$, where the $X_i$ are independent and identically distributed Bernoulli random variables with success probability $p \in (0, 1)$. Then $X$ follows a binomial distribution with parameters $n \ge 1$ and $p$. To estimate the inverse proportion $\theta$, a simple method is to apply maximum likelihood estimation (MLE), which yields

$$\hat\theta_{\rm MLE} = \frac{n}{X}. \tag{2}$$

This estimator is, however, not a valid estimator because it is not defined when $X = 0$, i.e.,
when there is no successful event in $n$ trials. We refer to this problem as the zero-event problem in the point estimation of $\theta$. In fact, the same problem also exists in the interval estimation of $p$. Specifically, following Hogg, McKean, and Craig (2005), the $100(1-\alpha)\%$ Wald interval is

$$\hat p \pm z_{\alpha/2}\sqrt{\frac{\hat p(1-\hat p)}{n}}, \tag{3}$$

where $\hat p = X/n$ and $z_{\alpha/2}$ is the upper $\alpha/2$ quantile of the standard normal distribution. When $X = 0$, the lower and upper limits of the Wald interval are both zero; consequently, the interval cannot provide a $(1-\alpha)$ coverage probability for the true proportion.

To overcome the zero-event problem, Hanley and Lippman-Hand (1983) proposed the "Rule of Three" to approximate the upper limit of the 95% confidence interval (CI) for $p$. Specifically, since the upper limit of the one-sided 95% CI for $p$ is $1 - 0.05^{1/n}$ when $X = 0$, the authors suggested approximating this upper limit by $3/n$, which yields the simplified CI $(0, 3/n)$. For more discussion on the "Rule of Three", one may refer to Tuyl, Gerlach, and Mengersen (2009) and the references therein. In particular, we note that the Wilson interval (Wilson, 1927) and the Agresti-Coull interval (Agresti and Coull, 1998) for $p$ have also been referred to as variations of the "Rule of Three".

The Wilson interval originated from Laplace, who proposed the "Law of Succession" in the 18th century. As mentioned in Good (1980), Laplace's estimator for the binomial proportion was $(X+1)/(n+2)$, which is indeed a shrinkage estimator for $p$. Wilson (1927) generalized the shrinkage idea and proposed an updated "Law of Succession" as $\tilde p = (X+c)/(n+2c)$, where $c > 0$. Substituting $\tilde p$ for $\hat p$ in the Wald interval (3) yields the Agresti-Coull interval

$$\tilde p \pm z_{\alpha/2}\sqrt{\frac{\tilde p(1-\tilde p)}{n}}. \tag{4}$$

It is known that the Agresti-Coull interval (4) performs better than the Wald interval (3), no matter whether $n$ is large or small (Brown et al., 2001).

By applying the Wilson estimator $\tilde p$, one may estimate the inverse proportion as

$$\tilde\theta(c) = \frac{n+2c}{X+c}, \quad c > 0. \tag{5}$$

Note that the estimators in family (5) do not suffer from the zero-event problem, so $\tilde\theta(c)$ is a valid estimator for $\theta$. On the other hand, we find that $\tilde\theta(c)$ may not provide an optimal estimate for $\theta$ in some common settings. For illustration, we will show in Section 2.3 that there does not exist a finite value $c > 0$ such that $\tilde\theta(c)$ is an unbiased estimator of $\theta$ when $\theta = 2$, or equivalently, when $p = 0.5$. We therefore consider the alternative family of shrinkage estimators

$$\hat\theta(c) = \frac{n+c}{X+c}, \quad c > 0. \tag{6}$$

It is interesting to point out that, as a special case, $\hat\theta(1) = (n+1)/(X+1)$ has been previously studied in the literature (Chao and Strawderman, 1972). More recently, Fattorini (2006) applied $\hat\theta(1)$ to estimate $\theta$ in sampling designs and demonstrated that it performs well when $n$ is large. Moreover, Seber (2013) showed that $\hat\theta(1)$ is an asymptotically unbiased estimator of $\theta$ as $n$ tends to infinity. When $n$ is small, however, $\hat\theta(1)$ may not perform well due to its non-negligible bias.

In this paper, we develop an optimal estimator for the inverse proportion in the family of shrinkage estimators (6) and systematically study its statistical properties. In Section 2, we introduce two real situations where an estimate of the inverse proportion is needed, and review the Haldane estimator and the Fattorini estimator that are widely applied in practice. In Section 3, we reveal the effect of the shrinkage parameter on the unbiasedness of the estimator and derive the optimal shrinkage parameter $c$ such that $E[\hat\theta(c)] = \theta$. In Section 4, we conduct simulation studies to evaluate the performance of our new estimator and compare it with existing competitors, including the Fattorini estimator, the Haldane estimator and a piecewise estimator. In Section 5, we revisit a recent meta-analysis on COVID-19 data by Chu et al.
(2020) for assessing the relative risks of physical distancing on the infection of coronavirus, and apply our new estimator to overcome the zero-event problem on the relative risks. Lastly, we conclude the paper in Section 6 with some discussion and future work, and postpone the technical results to the Appendix.

In this section, we present two motivating examples in which an estimate of the inverse proportion $\theta$ is highly desired. The first example is related to the relative risk, and the second example is related to the Horvitz-Thompson estimator.

In clinical studies, the relative risk (RR), also known as the risk ratio, is a commonly used effect size for measuring the effectiveness of a treatment or intervention. Specifically, RR is defined as

$$\mathrm{RR} = \frac{p_1}{p_2}, \tag{7}$$

where $p_1$ is the event probability in the exposed group and $p_2$ is the event probability in the unexposed group.

To estimate RR, we assume that there are $n_1$ samples in the exposed group with $X_1$ being the number of events, and $n_2$ samples in the unexposed group with $X_2$ being the number of events. Let $X_1$ follow a binomial distribution with parameters $n_1$ and $p_1$, and $X_2$ follow a binomial distribution with parameters $n_2$ and $p_2$, with $X_1$ and $X_2$ independent of each other. Then by (7) and applying the MLEs of $p_1$ and $p_2$, respectively, RR can be estimated by

$$\widehat{\mathrm{RR}} = \frac{X_1/n_1}{X_2/n_2} = \frac{X_1 n_2}{X_2 n_1}. \tag{8}$$

A problem of this estimator, however, is that it suffers from the zero-event problem when $X_2 = 0$, which is the same problem as mentioned in Section 1.

To avoid the zero-event problem in estimator (8), Haldane (1956) recommended adding 0.5 to all the counts of events, which yields the modified estimator of RR as $\widetilde{\mathrm{RR}}(0.5) = (X_1 + 0.5)(n_2 + 1)/[(X_2 + 0.5)(n_1 + 1)]$. Following this idea, the inverse proportion of the unexposed group is, in fact, estimated by

$$\tilde\theta(0.5) = \frac{n_2 + 1}{X_2 + 0.5},$$

which is a special case of estimator (5) with $c = 0.5$. For ease of presentation, we refer to it as the Haldane estimator. Nevertheless, as will be shown in Section 4, the Haldane estimator may not provide a satisfactory performance when it is applied to estimate the inverse proportion. For additional evaluation of the Haldane estimator, one may refer to Carter et al. (2010) and the references therein.

Besides, we note that an estimate of the inverse proportion is also needed, for example, in the variance of $\ln(\widehat{\mathrm{RR}})$ and in the variance of $\ln(\widehat{\mathrm{OR}})$, where $\widehat{\mathrm{OR}} = \hat p_1(1-\hat p_2)/[\hat p_2(1-\hat p_1)]$ is the estimated odds ratio. Specifically, by Goodman (1964) and Gart and Zweifel (1967), these two variances can be approximated as

$$\mathrm{Var}(\ln(\widehat{\mathrm{RR}})) \approx \frac{1}{n_1 p_1} - \frac{1}{n_1} + \frac{1}{n_2 p_2} - \frac{1}{n_2}, \tag{9}$$

$$\mathrm{Var}(\ln(\widehat{\mathrm{OR}})) \approx \frac{1}{n_1 p_1} + \frac{1}{n_1(1-p_1)} + \frac{1}{n_2 p_2} + \frac{1}{n_2(1-p_2)}.$$

In random sampling without replacement from a finite population, the Horvitz-Thompson estimator has played an important role in the literature for estimating the population total (Horvitz and Thompson, 1952; Cochran, 2007). Let $U$ be a population composed of $t$ units $\{u_1, \ldots, u_t\}$, and $p_i$ be the first-order selection probability associated with unit $u_i$. Let also $\Omega$ be a random variable associated with the population $U$, and $\Omega_i$ be the value of $\Omega$ determined by unit $u_i$. Following these notations, the population total of $\Omega$ is defined as $T = \sum_{i=1}^{t} \Omega_i$. As an unbiased estimator of $T$, the Horvitz-Thompson estimator is given as

$$\hat T = \sum_{j \in V} \omega_j \theta_j = \sum_{j \in V} \frac{\omega_j}{p_j}, \tag{10}$$

where $\omega_j$ is the observed value of $\Omega_j$, and $V \subseteq \{1, \ldots, t\}$ is the subset of units selected for estimating the population total. In practice, the inverse proportions $\theta_j = 1/p_j$ are often unknown and need to be estimated.

To estimate $\theta_j$ in (10), Fattorini (2006) proposed a numerical method via Monte Carlo simulations. Specifically, in each simulation, a total of $n$ samples were selected independently with replacement from the population $U$, with $X_j$ being the number of samples that contain the $j$th unit, where $j \in V$.
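The Monte Carlo counting step can be sketched as follows. The sampling design is an assumption made purely for illustration: simple random sampling without replacement of $m$ out of $t$ units, chosen because its inclusion probabilities $p_j = m/t$ are known, so the counts $X_j$ can be checked against the truth. The helper names `replicate_counts` and `mle_inverse` are hypothetical. The sketch also shows how easily a unit is never drawn, in which case the plain ratio $n/X_j$ is undefined.

```python
import random


def replicate_counts(t, m, n, seed=0):
    """Monte Carlo counts X_j for a hypothetical design.

    Assumption (not from the paper): the design is simple random sampling
    without replacement of m units out of t, so every unit has the known
    first-order inclusion probability p_j = m / t. The design is replicated
    n times independently; X_j counts the replicates that contain unit j.
    """
    rng = random.Random(seed)
    counts = [0] * t
    for _ in range(n):
        for j in rng.sample(range(t), m):  # one replicate of the design
            counts[j] += 1
    return counts


def mle_inverse(x, n):
    """Plain ratio n / X_j; undefined (None) when unit j was never drawn."""
    return n / x if x > 0 else None


t, m, n = 40, 4, 15  # true theta_j = t / m = 10 for every unit
counts = replicate_counts(t, m, n, seed=2020)
zero_units = [j for j, x in enumerate(counts) if x == 0]
print("units with X_j = 0:", len(zero_units))
print("MLE for such a unit:", mle_inverse(0, n))
```

With $p_j = 0.1$ and $n = 15$, each unit is missed by all replicates with probability $0.9^{15} \approx 0.21$, so a noticeable share of units typically ends up with $X_j = 0$.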
To avoid the zero-event problem on $X_j$, Fattorini applied estimator (6) with $c = 1$ to estimate the inverse proportions by

$$\hat\theta_j(1) = \frac{n+1}{X_j+1}, \quad j \in V, \tag{11}$$

which then yields the modified Horvitz-Thompson estimator $\hat T_m = \sum_{j \in V} \omega_j \hat\theta_j(1)$. Unless otherwise specified, we drop the subscript $j$ in (11) and refer to $\hat\theta(1)$ as the Fattorini estimator. Fattorini (2006) further showed that the modified Horvitz-Thompson estimator converges almost surely to $\hat T$ in (10) with known $p_j$ as $n$ tends to infinity.

For the Fattorini estimator in family (6) with $c = 1$, Seber (2013) showed that

$$E[\hat\theta(1)] = E\left(\frac{n+1}{X+1}\right) = \frac{1-(1-p)^{n+1}}{p} = \theta - \theta\left(1 - \frac{1}{\theta}\right)^{n+1}. \tag{12}$$

Then, since $\lim_{n\to\infty} \mathrm{Bias}[\hat\theta(1)] = \lim_{n\to\infty}[-\theta(1-1/\theta)^{n+1}] = 0$ for any fixed $\theta \in (1, \infty)$, the Fattorini estimator is an asymptotically unbiased estimator of $\theta$. In addition, when $p$ is large enough, or equivalently when $\theta$ is close to 1, the bias of the Fattorini estimator is often negligible, no matter whether $n$ is large or small.

In contrast, for the estimator $\tilde\theta(1) = (n+2)/(X+1)$ in family (5) with $c = 1$, by (12) we have

$$\mathrm{Bias}[\tilde\theta(1)] = \left(\frac{n+2}{n+1}\right) E[\hat\theta(1)] - \theta = \frac{\theta}{n+1} - \theta\left(\frac{n+2}{n+1}\right)\left(1 - \frac{1}{\theta}\right)^{n+1}.$$

When $\theta$ is close to 1, the bias of $\tilde\theta(1)$ converges to $1/(n+1)$, which may not be negligible when $n$ is small. As a numerical comparison between the two estimators, we let $p = 0.…$ and $n = 10$, which yields $\mathrm{Bias}[\hat\theta(1)] \approx -… \times 10^{-…}$ and $\mathrm{Bias}[\tilde\theta(1)] \approx …$.

Theorem 1

Let $X$ be a binomial random variable with parameters $n$ and $p$. Then for the estimators in family (5), there does not exist a shrinkage parameter $c$ such that $E[\tilde\theta(c)] = \theta$ when $p = 0.5$, or equivalently, when $\theta = 2$.

In view of the above comparison, we focus on the shrinkage estimators in family (6) in this paper. Specifically, in Section 3 we show that the Fattorini estimator $\hat\theta(1)$ is still suboptimal, and then propose the optimal estimation of the inverse proportion to further advance the literature.

Although the Fattorini estimator has some nice properties, it may not provide an accurate estimate for $\theta$ when $p$ is small. To illustrate this point, we consider $n = 10$ and $p = 0.01$, which yields a relative bias of the Fattorini estimator as large as

$$\frac{E[\hat\theta(1)] - \theta}{\theta} \times 100\% = -(1 - 0.01)^{11} \times 100\% \approx -89.5\%.$$

This indicates that the Fattorini estimator $\hat\theta(1)$ may still be suboptimal and needs to be further improved. In addition, by (12), the expected value of the Fattorini estimator is always lower than $\theta$; that is, the shrinkage estimator (6) with $c = 1$ is always negatively biased. Inspired by this, we probe into the whole family of estimators (6) and find the optimal shrinkage parameter by solving the equation $E[\hat\theta(c)] = \theta$. For ease of notation, we denote the expected value of $\hat\theta(c)$ as

$$g(c) = E[\hat\theta(c)] = \sum_{x=0}^{n} \left(\frac{n+c}{x+c}\right) \binom{n}{x} p^x (1-p)^{n-x}. \tag{13}$$

To assess the effect of the shrinkage parameter $c$ on the unbiasedness of the estimator, we treat $g(c)$ as a function of $c$ and explore its fundamental properties, including continuity, monotonicity and convexity.

Theorem 2

For the expected value function $g(c)$ in (13) with any finite integer $n$, we have the following properties: (i) $g(c)$ is a continuous function of $c$ on $(0, \infty)$ with $\lim_{c\to 0^+} g(c) = \infty$ and $\lim_{c\to\infty} g(c) = 1$; (ii) $g(c)$ is a strictly decreasing function of $c$ on $(0, \infty)$; and (iii) $g(c)$ is a strictly convex function of $c$ on $(0, \infty)$.

The proof of Theorem 2 is given in Appendix B. Note that the inverse proportion $\theta$ takes values in $(1, \infty)$ and that $g(1) < \theta$ by formula (12). Then, by Theorem 2 and the Intermediate Value Theorem, there exists a unique solution $c \in (0, 1)$ such that $g(c) = \theta$, or equivalently,

$$g(c) = \sum_{x=0}^{n} \left(\frac{n+c}{x+c}\right) \binom{n}{x} p^x (1-p)^{n-x} = \frac{1}{p}. \tag{14}$$

When $n$ is small, in particular for $n = 1$ or $n = 2$, we can derive the explicit solution of $c$ from equation (14). When $n$ is large, since equation (14) amounts to a polynomial equation of degree $n+1$ in $c$, an explicit solution for $c$ may not exist. To summarize, we have the following theorem, with the proof in Appendix C.

Theorem 3

When $n$ is less than 3, the solution of $c$ in equation (14) is given by

$$c_n = \begin{cases} p, & n = 1, \\ p - 0.5 + \sqrt{0.5 - (p - 0.5)^2}, & n = 2. \end{cases}$$

When $n \ge 3$, we have the approximate solution

$$c_n \approx 1 - \frac{p^{-1}(1-p)^{n+1}}{(n+1)(1+D_1)D_2 - D_1}, \tag{15}$$

where

$$D_1 = \frac{1}{p(n+1)}\left[1 - (1-p)^{n+1}\right], \qquad D_2 = \frac{1}{p^2(n+1)(n+2)}\left[1 - (1-p)^{n+2} - (n+2)p(1-p)^{n+1}\right].$$

To check the accuracy of the approximate solution in Theorem 3, we plot the true and approximate solutions of $c$ as functions of $p$ in Figure 1. Under various settings, the true solution of $c$ is a monotonically increasing function of $p$ with upper bound 1. In addition, our approximate solution works well as long as $n$ or $p$ is not extremely small.

[Figure 1: four panels, n = 10, 25, 50 and 100, each plotting c against p.]
Figure 1: The true and approximate solutions of $c$ with $n$ = 10, 25, 50 or 100. The empty circles represent the values of the true solution, and the solid lines represent the values of the approximate solution.

3.2 Plug-in estimator

To apply Theorem 3 for the optimal shrinkage parameter, we need a plug-in estimator for the unknown $p$. Intuitively, the MLE of $p$, $\hat p_{\rm MLE} = X/n$, can serve as a natural choice. By doing so, however, for $n = 1$ we have $\hat c = \hat p_{\rm MLE} = X$, which further yields $\hat\theta(\hat c) = (1+\hat c)/(X+\hat c) = (1+X)/(2X)$, and this again suffers from the zero-event problem. For $n = 2$, the same problem remains. For $n \ge 3$, the denominator in (15), $(n+1)(1+D_1)D_2 - D_1$, is zero when $X = n$, and consequently the approximate solution is still not applicable. To conclude, the MLE of $p$ cannot be directly applied as the plug-in estimator when applying Theorem 3 to estimate the inverse proportion.

To overcome the boundary problems on both sides, we consider a plug-in estimator of $p$ with the following structure:

$$\tilde p_{\rm plug}(\alpha) = \min(\max(\hat p_{\rm MLE}, \alpha), 1 - \alpha), \quad \text{where } 0 < \alpha \le 0.5.$$
With $\tilde p_{\rm plug}(\alpha)$ as the plug-in estimator of $p$, we let $\tilde c_n(\alpha)$ be the resulting estimator of $c_n$ in Theorem 3. To determine the best threshold value for practical use, we take several different values of $\alpha$ and compute the relative bias of the estimator, $\theta^{-1}E[\hat\theta(\tilde c_n(\alpha))] - 1$, for numerical evaluation. Specifically, in Figure 2 we plot the relative biases of the estimator as functions of $\theta$ for $\alpha$ = 0.1, 0.2, 0.3, 0.4 and 0.5, with $n$ = 1, 2, 10 and 50. For comparison, the relative biases of the Fattorini estimator are also presented in Figure 2.

In the top two panels of Figure 2, it is evident that a small threshold value, say $\alpha = 0.1$, is not suitable when $n$ is extremely small. Note also that $\tilde p_{\rm plug}(\alpha) = 0.5$ for every $X$ when $\alpha = 0.5$. Then, since Figure 1 shows that $c_n$ is close to 1 when $p = 0.5$, the resulting estimator of $\theta$ with $\alpha = 0.5$ essentially reduces to the Fattorini estimator when $n$ is large. For moderate sample sizes, say $n = 10$ and $n = 50$, the bottom two panels of Figure 2 show that the best value of $\alpha$ should be neither too small nor too large.

[Figure 2: four panels, n = 1, 2, 10 and 50, each plotting the relative bias against θ.]
Figure 2: The relative biases of $\hat\theta(\tilde c_n)$ with $\alpha$ = 0.1, 0.2, 0.3, 0.4 or 0.5. The labels "1" to "5" mark the relative biases associated with $\alpha$ = 0.1, 0.2, 0.3, 0.4 and 0.5, respectively. For comparison, "0" marks the relative biases of the Fattorini estimator.

Taken together, we recommend $\alpha_n = 1/(2 + \ln(n))$ as the adaptive threshold value, which decreases with $n$; for example, $\alpha_1 = 0.5$, $\alpha_{10} \approx 0.23$, $\alpha_{100} \approx 0.15$, and $\alpha_{1000} \approx 0.11$. Then with $\tilde p_{\rm plug}(\alpha_n) = \min(\max(\hat p_{\rm MLE}, \alpha_n), 1 - \alpha_n)$ as the plug-in estimator, our final estimator of the inverse proportion is

$$\hat\theta(\tilde c_n) = \frac{n + \tilde c_n}{X + \tilde c_n}, \tag{16}$$

where $\tilde c_n = c_n(\tilde p_{\rm plug}(\alpha_n))$ is the estimator of $c_n$ given in Theorem 3.

In this section, we conduct simulation studies to evaluate the finite-sample performance of our new estimator of the inverse proportion. As a baseline for comparison, we also propose a piecewise estimator of $\theta$,

$$\hat\theta_{\rm PE} = \frac{n}{X + I(X = 0)}, \tag{17}$$

where $I(\cdot)$ is the indicator function. In essence, the piecewise estimator (17) is a hard-thresholding version of the MLE: with $X = 0$ replaced by $X = 1$, the zero-event problem in the MLE of $\theta$ no longer exists. Moreover, the Fattorini estimator $\hat\theta(1)$ and the Haldane estimator are also included in the simulations to assess how much improvement our new estimator can achieve.

To generate the simulation data, we let $\theta$ range from 1.02 up to 50, which is equivalent to $p$ ranging from 0.98 down to 0.02. We also consider $n$ = 1, 2, 10 and 50 as four different sample sizes. Then, for each combination of $\theta$ and $n$, we generate $N = 10{,}000$ data sets from the binomial distribution, estimate $\theta$ by the four estimators, and compute their empirical relative biases and mean squared errors (MSEs) as follows:

$$\mathrm{Bias}[\hat\theta] = \frac{1}{N}\sum_{k=1}^{N}\left(\frac{\hat\theta_k}{\theta} - 1\right), \qquad \mathrm{MSE}[\hat\theta] = \frac{1}{N}\sum_{k=1}^{N}\left(\frac{\hat\theta_k}{\theta} - 1\right)^2,$$

where $\hat\theta_k$ is a generic notation for the $k$th estimate from each of the four estimators. For illustration and comparison, we present the simulation results of the relative biases and MSEs for $n$ = 1 and 2 in Figure 3, and for $n$ = 10 and 50 in Figure 4.

Figure 3: Comparison of the four estimators with $n$ = 1 or 2. The solid circles represent the simulation results of our new estimator $\hat\theta(\tilde c_n)$, the empty circles represent the simulation results of the Fattorini estimator $\hat\theta(1)$, the empty rectangles represent the simulation results of the Haldane estimator, and the empty triangles represent the simulation results of the piecewise estimator $\hat\theta_{\rm PE}$.

Figure 4: Comparison of the four estimators with $n$ = 10 or 50. The solid circles represent the simulation results of our new estimator $\hat\theta(\tilde c_n)$, the empty circles represent the simulation results of the Fattorini estimator $\hat\theta(1)$, the empty rectangles represent the simulation results of the Haldane estimator, and the empty triangles represent the simulation results of the piecewise estimator $\hat\theta_{\rm PE}$.

The results with $n$ = 1 or $n$ = 2 in Figure 3 show that the new estimator $\hat\theta(\tilde c_n)$ performs better than the Fattorini estimator and the piecewise estimator in both relative biases and MSEs in most settings, as long as $\theta$ is not very small. The Haldane estimator is better than the other three estimators for large $\theta$, while it performs poorly when $\theta$ is close to 1. In addition, the Fattorini estimator is always better than the piecewise estimator. From the top two panels in Figure 4 with $n = 10$, it is also evident that the new estimator $\hat\theta(\tilde c_n)$ provides the most reliable estimates for the inverse proportion in most settings.
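The construction of estimator (16) can be sketched in a few lines: Theorem 3 for $c_n$, the clipped plug-in $\tilde p_{\rm plug}(\alpha_n)$ with $\alpha_n = 1/(2 + \ln n)$, and the final ratio $(n + \tilde c_n)/(X + \tilde c_n)$, compared with the Fattorini estimator. Two assumptions apply: the function names are hypothetical, and instead of drawing $N = 10{,}000$ Monte Carlo data sets as in the simulation above, the sketch computes the relative bias and relative MSE exactly by weighting each of the $n + 1$ possible values of $X$ by its binomial probability, which removes simulation noise. It requires Python 3.8 or later for `math.comb`.

```python
import math
from math import comb


def c_opt(p, n):
    """Optimal shrinkage parameter c_n(p) of Theorem 3.

    Exact for n = 1 and n = 2; approximation (15) for n >= 3.
    """
    if n == 1:
        return p
    if n == 2:
        return p - 0.5 + math.sqrt(0.5 - (p - 0.5) ** 2)
    q = 1.0 - p
    d1 = (1 - q ** (n + 1)) / (p * (n + 1))
    d2 = (1 - q ** (n + 2) - (n + 2) * p * q ** (n + 1)) / (p ** 2 * (n + 1) * (n + 2))
    return 1 - (q ** (n + 1) / p) / ((n + 1) * (1 + d1) * d2 - d1)


def theta_new(x, n):
    """Estimator (16) with the adaptive threshold alpha_n = 1 / (2 + ln n)."""
    alpha = 1.0 / (2.0 + math.log(n))
    p_plug = min(max(x / n, alpha), 1.0 - alpha)  # clipped plug-in for p
    c = c_opt(p_plug, n)
    return (n + c) / (x + c)


def theta_fattorini(x, n):
    """Fattorini estimator (n + 1) / (X + 1), i.e. family (6) with c = 1."""
    return (n + 1) / (x + 1)


def exact_rel_bias_mse(estimator, n, p):
    """Relative bias and relative MSE, summed exactly over X = 0, ..., n."""
    theta = 1.0 / p
    bias = mse = 0.0
    for x in range(n + 1):
        w = comb(n, x) * p ** x * (1 - p) ** (n - x)  # P(X = x)
        r = estimator(x, n) / theta - 1.0
        bias += w * r
        mse += w * r * r
    return bias, mse


for n in (1, 2, 10, 50):
    for p in (0.5, 0.1, 0.02):
        b_new, m_new = exact_rel_bias_mse(theta_new, n, p)
        b_fat, m_fat = exact_rel_bias_mse(theta_fattorini, n, p)
        print(f"n={n:2d} p={p:4.2f}  new: bias={b_new:+.3f} mse={m_new:.3f}  "
              f"Fattorini: bias={b_fat:+.3f} mse={m_fat:.3f}")
```

For $n = 1$ the threshold is $\alpha_1 = 0.5$, so the plug-in is always 0.5 and the estimator takes only the values 3 (for $X = 0$) and 1 (for $X = 1$). The exact Fattorini relative bias printed by the loop should agree with $-(1-p)^{n+1}$ from (12).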
In particular, we note that the piecewise estimator $\hat\theta_{\rm PE}$ and the Haldane estimator fail to provide a stable performance; the new estimator $\hat\theta(\tilde c_n)$ always provides a smaller relative bias than the Fattorini estimator as long as $\theta$ is not close to 1; and the new estimator $\hat\theta(\tilde c_n)$ also provides an MSE comparable to that of the Fattorini estimator across different values of $\theta$. From the bottom two panels in Figure 4 with $n = 50$, our new estimator $\hat\theta(\tilde c_n)$ and the Fattorini estimator perform nearly the same, which coincides with the finding in Figure 1 that the optimal value of $c$ is close to 1 when $n$ is large. Once again, the piecewise estimator and the Haldane estimator do not provide a stable estimation.

To conclude, the new estimator $\hat\theta(\tilde c_n)$ is better than, or at least as good as, the Fattorini estimator in most settings, in particular when $n$ is small. Moreover, the piecewise estimator and the Haldane estimator fail to provide a stable performance, although they perform well in some limited settings. Our new estimator $\hat\theta(\tilde c_n)$ can serve as a reliable estimator of the inverse proportion for practical use. For ease of comparison, we summarize the four estimators in Table 1.

| Estimator | Applicability |
|---|---|
| The optimal estimator | Recommended no matter whether n is large or small |
| The Fattorini estimator | Recommended when n is large |
| The Haldane estimator | Not recommended when n is large |
| The piecewise estimator | Not recommended |

Table 1: Comparison among the optimal estimator, the Fattorini estimator, the Haldane estimator and the piecewise estimator.

5 An application to zero-event studies

In this section, we apply our new estimator to a recent meta-analysis on COVID-19 data with zero-event studies. Chu et al. (2020) carried out an excellent review to investigate the effects of physical distancing, face masks and eye protection on the infection of severe acute respiratory syndrome (SARS), Middle East respiratory syndrome (MERS) and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). This systematic review was published in June 2020 and is attracting increasing attention; for example, as of 25 August 2020, it had already received a total of 247 citations in Google Scholar. As commented by MacIntyre and Wang (2020), this systematic review provides a landmark for people to be aware of the importance of physical distancing and face protection. In particular, for physical distancing, they applied the relative risks as effect sizes and concluded that virus transmission is significantly reduced at a greater distance.

The top panel of Figure 5 shows the seven studies included in their meta-analysis of physical distancing for COVID-19 data, six of which suffered from the zero-event problem. For the four single-zero-event studies, the 0.5 continuity correction was added to all the counts of events, while the two double-zero-event studies were not included in the meta-analysis. According to Xu et al. (2020) and our simulation results, adding the 0.5 continuity correction is suboptimal. Moreover, Xu et al. (2020) also showed that double-zero-event studies may also be informative, so excluding them can be questionable and may even alter the results. In view of the above limitations, we have re-conducted the meta-analysis on COVID-19 data including the two double-zero-event studies.
Specifically, by applying our new estimator in (16), the relative risks are estimated by

$$\widehat{\mathrm{RR}}(\tilde c_n) = \frac{(X_1 + \tilde c_{n_1})(n_2 + \tilde c_{n_2})}{(X_2 + \tilde c_{n_2})(n_1 + \tilde c_{n_1})}, \tag{18}$$

where $\tilde c_{n_1}$ and $\tilde c_{n_2}$ are the estimates of the optimal shrinkage parameter for the exposed group and the unexposed group, respectively. For comparison, we also conduct a meta-analysis for all seven studies with the 0.5 continuity correction, and present all the forest plots in Figure 5.

Figure 5: Forest plots on the relative risk between physical distancing and infection for COVID-19 data.

From the middle and bottom panels of Figure 5, it is evident that the new meta-analytical results with the double-zero-event studies also support the claim that a greater distance reduces virus infection. On the other hand, the evidence becomes less significant as the combined relative risks get larger. Moreover, by comparing the two forest plots that both include the double-zero-event studies, we note that our new estimator in (18) yields a larger combined relative risk with a narrower confidence interval. By the variance formula for $\ln(\widehat{\mathrm{RR}})$ in (9), the 0.5 continuity correction may lead to a large variance estimate, and hence a wide interval for the relative risk after the exponential transformation, especially when the zero-event problem occurs. Hence, the confidence intervals of the relative risks in the two double-zero-event studies are very wide, indicating high uncertainty in the interval estimation. In contrast, by applying our new estimator of the inverse proportion, the confidence intervals for the double-zero-event studies are much narrower.
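Formula (18) can be sketched next to the Haldane correction as follows. The counts are made up for illustration and are not data from Chu et al. (2020); the function names are hypothetical. The sketch reuses Theorem 3 and the plug-in rule with $\alpha_n = 1/(2 + \ln n)$ for each group separately, and it stops at the point estimate, because the section does not spell out how the variance (9) was evaluated in the reanalysis.

```python
import math


def c_opt(p, n):
    """Theorem 3: exact for n = 1, 2; approximation (15) for n >= 3."""
    if n == 1:
        return p
    if n == 2:
        return p - 0.5 + math.sqrt(0.5 - (p - 0.5) ** 2)
    q = 1.0 - p
    d1 = (1 - q ** (n + 1)) / (p * (n + 1))
    d2 = (1 - q ** (n + 2) - (n + 2) * p * q ** (n + 1)) / (p ** 2 * (n + 1) * (n + 2))
    return 1 - (q ** (n + 1) / p) / ((n + 1) * (1 + d1) * d2 - d1)


def c_tilde(x, n):
    """Estimated shrinkage parameter for one group, via the clipped plug-in."""
    alpha = 1.0 / (2.0 + math.log(n))
    return c_opt(min(max(x / n, alpha), 1.0 - alpha), n)


def rr_optimal(x1, n1, x2, n2):
    """Relative risk (18): group 1 exposed, group 2 unexposed."""
    c1, c2 = c_tilde(x1, n1), c_tilde(x2, n2)
    return (x1 + c1) * (n2 + c2) / ((x2 + c2) * (n1 + c1))


def rr_haldane(x1, n1, x2, n2):
    """Haldane's 0.5 continuity correction applied to all event counts."""
    return (x1 + 0.5) * (n2 + 1) / ((x2 + 0.5) * (n1 + 1))


# Made-up counts for illustration only (not data from Chu et al., 2020).
studies = {"double zero": (0, 30, 0, 45), "single zero": (2, 40, 0, 35)}
for name, counts in studies.items():
    print(name, round(rr_optimal(*counts), 3), round(rr_haldane(*counts), 3))
```

Both estimators stay finite for double-zero studies; they differ in how much each group's count is shrunk, since (18) lets the two groups use different $\tilde c_{n_1}$ and $\tilde c_{n_2}$ while the Haldane correction adds the same 0.5 everywhere.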

The binomial proportion is a classic parameter of the binomial distribution, which has been well studied in the literature because of its wide range of applications. In contrast, the reciprocal of the binomial proportion, also known as the inverse proportion, is often overlooked, although it also plays an important role in various fields including clinical studies and random sampling. However, it is known that the MLE of the inverse proportion suffers from the zero-event problem. To overcome this problem, several estimators of the inverse proportion exist in the literature, including, for example, the Haldane estimator and the Fattorini estimator.

To further advance the literature, we first introduced two motivating examples where an accurate estimate of the inverse proportion is desired. We then compared two shrinkage families of estimators and identified the family with better statistical properties. Next, we proposed a new estimator of the inverse proportion by deriving the optimal shrinkage parameter $c$ in the family of estimators (6). To be more specific, we derived the explicit formula for the optimal $c$ in Theorem 3 for $n = 1$ or 2, and an approximate formula for the optimal $c$ for $n \ge 3$. To estimate the unknown $p$ in the formula of the optimal shrinkage parameter, we also introduced a plug-in estimator that overcomes the boundary problem of $p$. Simulation studies showed that our new estimator performs better than, or as well as, the existing competitors in most practical settings, so it can be recommended for estimating the inverse proportion in practice. Finally, we applied our new estimator to a recent meta-analysis on COVID-19 data with the zero-event problem, and it yielded more reliable results for the scientific question of how effectively physical distancing can prevent infection by the new coronavirus.

In this paper, we have sought an optimal estimator for the inverse proportion of the binomial distribution. According to Gupta (1967), there does not exist an unbiased estimator for the inverse proportion $\theta$. To verify this result by contradiction, assume that $\hat\theta_u = \eta(X)$ is an unbiased estimator of $\theta$. Then by definition, $E(\hat\theta_u) = \sum_{x=0}^{n} \eta(x)\binom{n}{x}p^x(1-p)^{n-x} = \theta$. The left-hand side is a polynomial in $p$ of degree at most $n$, and is therefore bounded on $(0, 1)$. The right-hand side, by the Taylor expansion $\theta = 1/p = \sum_{i=0}^{\infty}(1-p)^i$, is a power series rather than a polynomial, and it is unbounded as $p \to 0$. Hence, unbiasedness cannot hold for any finite $n$. In view of this property, there is probably no uniformly best estimator for the inverse proportion. Although this paper makes progress on the problem, we believe that more advanced research is still needed to further improve the estimation accuracy of the inverse proportion. For example, one may develop a better and more robust approximation for the optimal shrinkage parameter when the binomial proportion $p$ is extremely small.
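The difficulty with small $p$ is easy to see numerically. The following sketch, which assumes only formula (13), Theorem 2 and approximation (15), solves equation (14) by bisection on $(0, 1)$ (valid because Theorem 2 makes $g$ strictly decreasing and (12) gives $g(1) < 1/p$) and compares the root with (15) for $n = 10$. By hand computation, (15) tracks the root closely at $p = 0.5$ but turns negative, at about $-3.8$, when $p = 0.02$, outside the interval $(0, 1)$ in which the true solution lies; this is one reason the plug-in in Section 3.2 clips $\hat p_{\rm MLE}$ away from 0. The names `g`, `c_exact` and `c_approx` are illustrative.

```python
import math


def g(c, n, p):
    """Expected value g(c) = E[(n + c) / (X + c)] for X ~ Binomial(n, p), as in (13)."""
    return sum((n + c) / (x + c) * math.comb(n, x) * p ** x * (1 - p) ** (n - x)
               for x in range(n + 1))


def c_exact(n, p, tol=1e-12):
    """Root of g(c) = 1/p on (0, 1) by bisection.

    Theorem 2 gives g strictly decreasing with g(0+) = inf, and (12) gives
    g(1) < 1/p, so the root is bracketed by (0, 1).
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid, n, p) > 1.0 / p:
            lo = mid  # g still above theta: the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


def c_approx(n, p):
    """Approximation (15), intended for n >= 3."""
    q = 1.0 - p
    d1 = (1 - q ** (n + 1)) / (p * (n + 1))
    d2 = (1 - q ** (n + 2) - (n + 2) * p * q ** (n + 1)) / (p ** 2 * (n + 1) * (n + 2))
    return 1 - (q ** (n + 1) / p) / ((n + 1) * (1 + d1) * d2 - d1)


for p in (0.5, 0.2, 0.05, 0.02):
    print(f"n=10, p={p:.2f}: exact c = {c_exact(10, p):.4f}, approx (15) = {c_approx(10, p):.4f}")
```

The same bisection also reproduces the closed forms of Theorem 3, $c_1 = p$ and $c_2 = p - 0.5 + \sqrt{0.5 - (p - 0.5)^2}$, which gives a simple consistency check on the explicit solutions.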
In addition, other families of shrinkage estimators could be considered to see whether they yield better estimators of the inverse proportion.

Last but not least, we note that our new estimator of the inverse proportion may have many other real applications. For instance, the spirit of our new method may also be applied to estimate the number needed to treat (NNT), another important quantity in medicine, first introduced by Laupacis et al. (1988). Specifically, NNT is defined as NNT = $1/(p_1 - p_2)$, where $p_1$ is the event probability in the exposed group and $p_2$ is the event probability in the unexposed group. Noting also that $p_1 - p_2$ is the absolute risk reduction (ARR), NNT can be interpreted as the average number of patients who need to be treated to obtain one additional cured patient compared with a control in a clinical trial (Hutton, 2000). Nevertheless, estimating the NNT is more challenging than estimating the inverse proportion, mainly because the estimate of $p_1 - p_2$ can be either positive or negative, in addition to the zero-event problem in the denominator. More recently, Veroniki et al. (2019) also referred to this situation as the statistically nonsignificant result, which may lead to unexpected computational complications. We expect that this work will shed light on new directions for NNT estimation, which can be particularly useful in clinical trials and evidence-based medicine.

References

Agresti, A., and Coull, B. A. (1998). Approximate is better than "exact" for interval estimation of binomial proportions. The American Statistician, 52, 119–126.

Altman, D. G. (1998). Confidence intervals for the number needed to treat. British Medical Journal.

Statistical Science, 16, 101–117.

Carter, R. E., Lin, Y., Lipsitz, S. R., Newcombe, R. G., and Hermayer, K. L. (2010). Relative risk estimated from the ratio of two median unbiased estimates. Journal of the Royal Statistical Society: Series C, 59, 657–671.

Casella, G., and Berger, R. L. (2002). Statistical Inference. Pacific Grove: Duxbury.

Chao, M. T., and Strawderman, W. E. (1972). Negative moments of positive random variables. Journal of the American Statistical Association, 67, 429–431.

Chu, D. K., Akl, E. A., Duda, S., Solo, K., Yaacoub, S., Schünemann, H. J., and COVID-19 Systematic Urgent Review Group Effort (SURGE) study authors (2020). Physical distancing, face masks, and eye protection to prevent person-to-person transmission of SARS-CoV-2 and COVID-19: a systematic review and meta-analysis. Lancet.

Sampling Techniques. New York: John Wiley & Sons.

Fattorini, L. (2006). Applying the Horvitz-Thompson criterion in complex designs: a computer-intensive perspective for estimating inclusion probabilities. Biometrika.

Studia Ekonomiczne.

Biometrika, 54, 181–187.

Good, I. J. (1980). Some history of the hierarchical Bayesian methodology. In J. M. Bernardo, M. H. DeGroot, D. V. Lindley, and A. F. M. Smith (Eds.), Bayesian Statistics, 489–519. Valencia: University Press.

Goodman, L. A. (1964). Interactions in multidimensional contingency tables. Annals of Mathematical Statistics, 35, 632–646.

Gupta, M. K. (1967). Unbiased estimate for 1/p. Annals of the Institute of Statistical Mathematics, 19, 413–416.

Haldane, J. B. S. (1956). The estimation and significance of the logarithm of a ratio of frequencies. Annals of Human Genetics, 20, 309–311.

Hanley, J. A., and Lippman-Hand, A. (1983). If nothing goes wrong, is everything all right? Interpreting zero numerators. Journal of the American Medical Association.

Introduction to Mathematical Statistics. Boston: Pearson Education.

Horvitz, D. G., and Thompson, D. J. (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47, 663–685.

Hutton, J. L. (2000). Number needed to treat: properties and problems. Journal of the Royal Statistical Society: Series A.

The American Statistician, 51, 137–139.

Laupacis, A., Sackett, D. L., and Roberts, R. S. (1988). An assessment of clinically useful measures of the consequences of treatment. New England Journal of Medicine.

Lancet.

Statistical Models for Proportions and Probabilities. Heidelberg: Springer.

Tuyl, F., Gerlach, R., and Mengersen, K. (2009). The rule of three, its variants and extensions. International Statistical Review, 77, 266–275.

Veroniki, A. A., Bender, R., Glasziou, P., Straus, S. E., and Tricco, A. C. (2019). The number needed to treat in pairwise and network meta-analysis and its graphical representation. Journal of Clinical Epidemiology.

Journal of the American Statistical Association, 22, 209–212.

Xu, C., Li, L., Lin, L., Chu, H., Thabane, L., Zou, K., and Sun, X. (2020). Exclusion of studies with no events in both arms in meta-analysis impacted the conclusions. Journal of Clinical Epidemiology.

Appendix A: Proof of Theorem 1

Proof. Assume that there exists a value c > 0 such that $E[\tilde\theta(c)] = \theta$. When p = 0.5,

$$E[\tilde\theta(c)] = \sum_{x=0}^{n} \left(\frac{n+2c}{x+c}\right)\binom{n}{x} p^x (1-p)^{n-x} = \frac{1}{2^n}\, h(c), \quad \text{where } h(c) = \sum_{x=0}^{n} \left(\frac{n+2c}{x+c}\right)\binom{n}{x}.$$

Hence, to show that the estimator is unbiased for $\theta = p^{-1} = 2$, it is equivalent to show that there exists a value c > 0 such that $h(c) = 2^{n+1}$. The first derivative of h(c) is

$$h'(c) = \sum_{x=0}^{n} \frac{2x-n}{(x+c)^2}\binom{n}{x}. \tag{19}$$

When n is an even number, by noting that $\binom{n}{x} = \binom{n}{n-x}$, we can rewrite the first derivative as

$$h'(c) = \sum_{x=0}^{n/2-1}\left[\frac{2x-n}{(x+c)^2}\binom{n}{x} + \frac{2(n-x)-n}{(n-x+c)^2}\binom{n}{n-x}\right] = \sum_{x=0}^{n/2-1}\left[\frac{2x-n}{(x+c)^2}\binom{n}{x} + \frac{n-2x}{(n-x+c)^2}\binom{n}{x}\right],$$

where the term with x = n/2 is omitted because it equals zero. For x = 0, ..., n/2 − 1, we have n − x > x and further

$$\frac{2x-n}{(x+c)^2}\binom{n}{x} + \frac{n-2x}{(n-x+c)^2}\binom{n}{x} < \frac{2x-n}{(x+c)^2}\binom{n}{x} + \frac{n-2x}{(x+c)^2}\binom{n}{x} = 0.$$

This shows that h'(c) < 0 for any c > 0. When n is an odd number, we can write the first derivative of h(c) as

$$h'(c) = \sum_{x=0}^{(n-1)/2}\left[\frac{2x-n}{(x+c)^2}\binom{n}{x} + \frac{2(n-x)-n}{(n-x+c)^2}\binom{n}{n-x}\right],$$

and similarly we can show that h'(c) < 0 for any c > 0. Combining the above results, h(c) is a strictly decreasing function of c on (0, ∞). In addition, for any finite n we note that

$$\lim_{c\to\infty} h(c) = \sum_{x=0}^{n} \lim_{c\to\infty}\left(\frac{n+2c}{x+c}\right)\binom{n}{x} = 2\sum_{x=0}^{n}\binom{n}{x} = 2^{n+1}.$$

Since h(c) decreases strictly towards this limit, $h(c) > 2^{n+1}$ for every finite c > 0. This shows that there does not exist a finite value of c > 0 such that $h(c) = 2^{n+1}$, and so Theorem 1 holds.

Appendix B: Proof of Theorem 2

Proof. To prove (i), we note that (n + c)/(x + c) is a rational function of c and so is continuous on (0, ∞). Since n is also finite, g(c) is a continuous function of c on (0, ∞). For the limits of g(c),

$$\lim_{c\to 0} g(c) = \lim_{c\to 0}\left[\sum_{x=0}^{n}\left(\frac{n+c}{x+c}\right)\binom{n}{x}p^x(1-p)^{n-x}\right] = \left(\lim_{c\to 0}\frac{n+c}{c}\right)(1-p)^n + \sum_{x=1}^{n}\left[\left(\lim_{c\to 0}\frac{n+c}{x+c}\right)\binom{n}{x}p^x(1-p)^{n-x}\right] = \infty,$$

$$\lim_{c\to\infty} g(c) = \lim_{c\to\infty}\left[\sum_{x=0}^{n}\left(\frac{n+c}{x+c}\right)\binom{n}{x}p^x(1-p)^{n-x}\right] = \sum_{x=0}^{n}\left[\left(\lim_{c\to\infty}\frac{n+c}{x+c}\right)\binom{n}{x}p^x(1-p)^{n-x}\right] = 1.$$
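The limits in part (i), together with the monotonicity and convexity stated in parts (ii) and (iii) of Theorem 2, can be spot-checked numerically. The Python sketch below is only an illustration, not part of the proof: it assumes the arbitrary values n = 5 and p = 0.3, evaluates g(c) on a logarithmic grid and on an evenly spaced grid of c, and uses hypothetical helper names.

```python
from math import comb

def g(c, n, p):
    """g(c) = E[(n + c) / (X + c)] for X ~ Binomial(n, p) and c > 0."""
    return sum((n + c) / (x + c) * comb(n, x) * p**x * (1 - p)**(n - x)
               for x in range(n + 1))

n, p = 5, 0.3  # illustrative values, not taken from the paper

# Part (i): g(c) blows up as c -> 0 and tends to 1 as c -> infinity.
log_grid = [10.0**k for k in range(-6, 7)]          # c = 1e-6, ..., 1e6
log_values = [g(c, n, p) for c in log_grid]
print(f"g(1e-6) = {log_values[0]:.4g}, g(1e6) = {log_values[-1]:.8f}")

# Parts (ii) and (iii): strictly decreasing and strictly convex on a uniform grid.
h = 0.1
grid = [h * k for k in range(1, 51)]                # c = 0.1, ..., 5.0
values = [g(c, n, p) for c in grid]
decreasing = all(a > b for a, b in zip(values, values[1:]))
convex = all(values[i - 1] - 2 * values[i] + values[i + 1] > 0
             for i in range(1, len(values) - 1))
print("decreasing:", decreasing, "convex:", convex)
```

Positive second differences on a uniform grid are the discrete counterpart of g''(c) > 0; a finite grid can only illustrate, not prove, the claim.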

To prove (ii), we verify that the first derivative of g(c) satisfies

$$g'(c) = \sum_{x=0}^{n-1}\frac{x-n}{(x+c)^2}\binom{n}{x}p^x(1-p)^{n-x} < 0,$$

where the term with x = n vanishes. Hence, g(c) is a strictly decreasing function of c on (0, ∞). To prove (iii), we show that the second derivative of g(c) satisfies

$$g''(c) = \sum_{x=0}^{n-1}\frac{2(n-x)}{(x+c)^3}\binom{n}{x}p^x(1-p)^{n-x} > 0.$$

As a consequence, g(c) is a strictly convex function of c on (0, ∞).

Appendix C: Proof of Theorem 3

Proof. When n = 1, equation (14) becomes

$$\left(\frac{1+c}{c}\right)(1-p) + p = \frac{1}{p},$$

from which we obtain c = p.

When n = 2, it is necessary to solve

$$\left(\frac{2+c}{c}\right)(1-p)^2 + 2\left(\frac{2+c}{1+c}\right)p(1-p) + p^2 = \frac{1}{p}.$$

After simplifying this equation, we have

$$c^2 + (1-2p)\,c - 2p(1-p) = 0.$$

The solutions are $c = p - 0.5 \pm \sqrt{0.5 - (p-0.5)^2}$. The shrinkage parameter c is required to be positive, and since the product of the two roots, $-2p(1-p)$, is negative, exactly one root is positive. Hence $c = p - 0.5 + \sqrt{0.5 - (p-0.5)^2}$.

To get the solution of c when n ≥ 3, we apply the Taylor expansion of 1/(X + c) around c = 1, which yields

$$\frac{1}{X+c} = \frac{1}{X+1} - \frac{c-1}{(X+1)^2} + O\big((c-1)^2\big). \tag{20}$$

By (13) and (20), for any finite n we have

$$\begin{aligned} g(c) &= E\left(\frac{n+c}{X+1}\right) - E\left[\frac{(n+c)(c-1)}{(X+1)^2}\right] + O\big((c-1)^2\big) \\ &= E\left(\frac{n+c}{X+1}\right) - E\left[\frac{(n+1)(c-1) + (c-1)^2}{(X+1)^2}\right] + O\big((c-1)^2\big) \\ &= E\left(\frac{n+c}{X+1}\right) - E\left[\frac{(n+1)(c-1)}{(X+1)^2}\right] + O\big((c-1)^2\big). \end{aligned} \tag{21}$$

Let $D_1 = E[1/(X+1)]$ and $D_2 = E[1/\{(X+1)(X+2)\}]$. For $D_1$, we have

$$D_1 = \frac{1}{n+1}\sum_{x=0}^{n}\frac{n+1}{x+1}\binom{n}{x}p^x(1-p)^{n-x} = \frac{1}{p(n+1)}\sum_{x=0}^{n}\frac{(n+1)!}{(x+1)!\,(n-x)!}\,p^{x+1}(1-p)^{n+1-(x+1)} = \frac{1}{p(n+1)}\sum_{s=1}^{n+1}\binom{n+1}{s}p^s(1-p)^{n+1-s} = \frac{1-(1-p)^{n+1}}{p(n+1)},$$

where s = x + 1. For $D_2$, we have

$$D_2 = \frac{1}{(n+1)(n+2)}\sum_{x=0}^{n}\frac{(n+1)(n+2)}{(x+1)(x+2)}\binom{n}{x}p^x(1-p)^{n-x} = \frac{1}{p^2(n+1)(n+2)}\sum_{x=0}^{n}\frac{(n+2)!}{(x+2)!\,(n-x)!}\,p^{x+2}(1-p)^{n+2-(x+2)} = \frac{1-(1-p)^{n+2}-(n+2)p(1-p)^{n+1}}{p^2(n+1)(n+2)}.$$

Now, with $D_1$ and $D_2$, to derive the solution of c we use the approximation $E[1/(X+1)^2] \approx (1+D_1)D_2$, which can be motivated by the identity $1/(X+1)^2 = \{1 + 1/(X+1)\}/\{(X+1)(X+2)\}$ with the expectation of the product replaced by the product of the expectations, and we also ignore the remainder term $O((c-1)^2)$ in (20). Consequently, we have the approximate equation $(n+c)D_1 - (n+1)(c-1)(1+D_1)D_2 \approx 1/p$, which yields the approximate solution

$$c_n \approx 1 - \frac{1/p - (n+1)D_1}{(n+1)(1+D_1)D_2 - D_1} = 1 - \frac{p^{-1}(1-p)^{n+1}}{(n+1)(1+D_1)D_2 - D_1}.$$
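To see Theorem 3 and the approximation $c_n$ side by side, here is a minimal Python sketch. It assumes, as in this appendix, that the optimal c is the root of g(c) = 1/p with g(c) = E[(n + c)/(X + c)]. It finds that root by bisection, which Theorem 2 justifies because g decreases strictly from ∞ to 1 while 1/p > 1 for 0 < p < 1, and compares it with the closed forms for n = 1, 2 and with $c_n$. The true p is treated as known; the plug-in step that replaces p by an estimate is not implemented, and all function names are hypothetical rather than the authors' code.

```python
from math import comb, sqrt

def g(c, n, p):
    """g(c) = E[(n + c) / (X + c)] for X ~ Binomial(n, p) and c > 0."""
    return sum((n + c) / (x + c) * comb(n, x) * p**x * (1 - p)**(n - x)
               for x in range(n + 1))

def optimal_c_exact(n, p, tol=1e-12):
    """Root of g(c) = 1/p by bisection (unique for 0 < p < 1 by Theorem 2)."""
    target = 1.0 / p
    lo = hi = 1.0
    while g(lo, n, p) < target:   # shrink lo until g(lo) >= target
        lo /= 2
    while g(hi, n, p) > target:   # grow hi until g(hi) <= target
        hi *= 2
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if g(mid, n, p) > target:  # g is decreasing, so the root lies above mid
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def optimal_c_closed_form(n, p):
    """Explicit optimal c from Theorem 3 for n = 1 or n = 2."""
    if n == 1:
        return p
    if n == 2:
        return p - 0.5 + sqrt(0.5 - (p - 0.5) ** 2)
    raise ValueError("closed form is only available for n = 1 or n = 2")

def d1_d2(n, p):
    """Closed forms of D1 = E[1/(X+1)] and D2 = E[1/((X+1)(X+2))]."""
    q = 1 - p
    d1 = (1 - q ** (n + 1)) / (p * (n + 1))
    d2 = (1 - q ** (n + 2) - (n + 2) * p * q ** (n + 1)) / (p ** 2 * (n + 1) * (n + 2))
    return d1, d2

def optimal_c_approx(n, p):
    """Approximate optimal c_n for n >= 3 (linearization around c = 1)."""
    d1, d2 = d1_d2(n, p)
    return 1 - ((1 - p) ** (n + 1) / p) / ((n + 1) * (1 + d1) * d2 - d1)

if __name__ == "__main__":
    for p in (0.3, 0.5):
        for n in (5, 10, 30):
            print(f"n={n:3d} p={p:.1f} exact={optimal_c_exact(n, p):.4f} "
                  f"approx={optimal_c_approx(n, p):.4f}")
```

For n = 1 and n = 2 the bisection root should match the closed forms up to the bisection tolerance. For n ≥ 3 the gap between $c_n$ and the exact root reflects the two approximations made above. In small checks it is modest for n = 10 and p = 0.3, but the approximation can be poor when np is small; for n = 3 and p = 0.1 it even turns negative. This is in line with the remark in the conclusion about extremely small p.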