A confidence interval robust to publication bias for random-effects meta-analysis of few studies
M. Henmi, S. Hattori, T. Friede*

Institute of Statistical Mathematics, 10-3 Midori-cho, Tachikawa, Tokyo 190-8562, Japan
Department of Biomedical Statistics, Graduate School of Medicine, Osaka University, Osaka 565-0871, Japan
Department of Medical Statistics, University Medical Center Göttingen, Göttingen, Germany
Abstract
Systematic reviews aim to summarize all the available evidence relevant to a particular research question. If appropriate, the data from identified studies are quantitatively combined in a meta-analysis. Often only few studies regarding a particular research question exist. In these settings the estimation of the between-study heterogeneity is challenging. Furthermore, the assessment of publication bias is difficult, as standard methods such as visual inspection of funnel plots or formal hypothesis tests do not provide adequate guidance. Previously, Henmi and Copas (Statistics in Medicine 2010, 29: 2969–2983) proposed a confidence interval for the overall effect in random-effects meta-analysis that is robust to publication bias to some extent. As is evident from their simulations, these confidence intervals have improved coverage compared with standard methods. To our knowledge, the properties of their method have never been assessed for meta-analyses including fewer than five studies. In this manuscript, we propose a variation of the method by Henmi and Copas employing an improved estimator of the between-study heterogeneity, in particular when dealing with few studies only. In a simulation study, the proposed method is compared to several competitors. Overall, we found that our method outperforms the others in terms of coverage probabilities. In particular, an improvement compared with the proposal by Henmi and Copas is demonstrated. The work is motivated and illustrated by a systematic review and meta-analysis in paediatric immunosuppression following liver transplantations.
Keywords: Meta-analysis; publication bias; between-trial heterogeneity; confidence interval; coverage probability
Systematic reviews aim to summarize all the available evidence relevant to a particular research question. If appropriate, the data from identified studies are quantitatively combined in a meta-analysis. If the true effect is the same in all studies to be combined in a meta-analysis, then the so-called common-effect or fixed-effect model is appropriate. In practical applications this assumption often appears to be too strict, as some level of between-trial heterogeneity in the effects is suspected. Then the random-effects model is used. […] Confidence intervals based on t-quantiles have been proposed [5, 6, 7, 8]. With few studies only, however, they are often conservative and so long that they are uninformative [9]. Also, various likelihood-based methods have recently been assessed in the specific situation of few studies and were not found to be a solution to the problem [10]. The between-trial heterogeneity estimators mentioned above often result in zero estimates [4], with the notable exception of the method proposed by Chung et al [3]. Chung et al suggested the so-called Bayes modal (BM) estimator, which uses, in a Bayesian framework, a weakly informative prior for the between-trial heterogeneity to avoid zero estimates of the heterogeneity.

*Correspondence to: Tim Friede, Department of Medical Statistics, University Medical Center Göttingen, Germany; email: [email protected]
Furthermore, a fully Bayesian approach to random-effects meta-analysis with weakly informative priors for the between-trial heterogeneity parameter has some advantages in this situation, since zero estimates are avoided as with the BM estimator and, in addition, the uncertainty in estimating the heterogeneity is accounted for [4, 11]. Of course, the Bayesian credible intervals would not necessarily have frequentist properties. Evaluating the operating characteristics in extensive simulation studies, it was found that the frequentist coverage probabilities are often above the nominal level with conservative choices of the prior for the between-trial heterogeneity [4, 11]. Bender et al [12] recently provided an overview on the topic of meta-analyses with few studies.

In systematic reviews, relevant evidence is identified through systematic searches of literature databases. If all relevant studies were published, this would be sufficient. However, this is not always the case. The problem was first described as the 'file drawer problem' [13]. Today, various types of reporting biases are carefully distinguished, including publication bias, time lag bias, citation bias and outcome reporting bias, to name but a few. Studies might not be published at all for various reasons, or only with a certain delay, or in journals or languages that are more difficult to access (see e.g. Table 7.2.a in [14]). In the following we focus on the aspect of publication bias. Prospective registration of clinical trials is one way to tackle this problem. It has become standard practice not only to search at least two electronic databases of the literature but also to search at least one registry for clinical studies such as clinicaltrials.gov. The idea would be to include unpublished studies in systematic reviews.
However, access to unpublished results is often challenging as it requires the cooperation of investigators, sponsors etc.

A number of methods have been proposed over the years to deal with publication bias [15]. A popular way to interrogate data for publication bias is the visualization in form of a so-called funnel plot. In this scatter plot, each study contributes an estimate of an effect measure and its estimated standard error. The former is plotted on the x-axis, the latter on the y-axis. If no publication bias is present, we would expect the plot to be symmetric about the vertical line running through the average effect. Any absence of this symmetry might be interpreted as a signal that some form of reporting bias might be present. As this can be difficult to judge, formal hypothesis tests have been proposed (see e.g. [16]). The problem with the visual inspection as well as with the formal tests is that they become more powerful with larger numbers of studies, but are less sensitive with few studies only. In the context of funnel plots, trim-and-fill methods have been proposed to correct the overall effect for potential publication bias [17]. Following an alternative approach, several sensitivity analysis methods have been suggested based on selection functions describing the selective publication process [18, 19, 20]. For instance, Copas and Jackson [19] investigated the maximum bias over all possible selection functions which satisfy the (fairly weak) condition that studies with smaller standard errors are at least as likely to be selected as studies with larger standard errors. Building on their work, Henmi et al [20] developed sensitivity analyses that, in contrast to the proposal by Copas and Jackson [19], account for uncertainty in estimation.
Again, these methods are not designed for the setting of few studies only.

Our work is motivated by a systematic review and meta-analysis of controlled clinical trials assessing efficacy and safety of Interleukin-2 receptor antagonists (IL-2RA) in children having undergone liver transplantations [21], a rare surgical procedure in children. In total, only six relevant studies were identified, with little standardization with regard to the design of the studies, implying some level of heterogeneity. Although the authors carefully checked for publication bias using standard techniques, it cannot be excluded that in particular some smaller studies were not published if they resulted in inconclusive treatment effects.

In contrast to the approaches to publication bias described above, Henmi and Copas [22] proposed a method for random-effects meta-analysis that is robust to the selection of studies. They modified the DerSimonian-Laird (DL) confidence interval ([2], (10) in [22]) by replacing the random-effects estimator by the fixed-effect estimator of the overall effect and by replacing the normal quantiles by more accurate ones. The latter depend on the between-trial heterogeneity, and the DL estimator is used in the computation of the quantiles. Therefore, with few studies this approach may not work well. In this paper, we propose a modification of the Henmi-Copas method by replacing the estimator of the between-study heterogeneity in the computation of the quantiles by the one developed by Chung et al [3]. The properties of the new approach are assessed and compared to alternative methods, including the Henmi-Copas approach and a proposal by Doi et al [23], in Monte Carlo simulation studies considering in particular the case of few studies with and without publication bias. Our method is not conditional on having detected publication bias, e.g. in a funnel plot, since this would be very difficult with only few studies included in the meta-analysis.
But it is robust to the selection of studies even with few studies, as we will see below.

The manuscript is organized as follows. In the next section the new confidence interval for the overall effect is developed, starting by introducing notation and reviewing the method of Henmi and Copas [22]. The simulation study assessing the properties of the new confidence interval in comparison to existing methods is presented in Section 3. In Section 4 the proposed method is applied to the motivating example. We close with a brief discussion of our findings and their limitations.

Adopting the notation by Henmi and Copas [22], the true effect of an individual study i out of n independent studies is denoted by θ_i. Estimates y_i of the effects θ_i are observed with standard errors σ_i. Here we consider the normal-normal hierarchical model (NNHM), which is the standard model for random-effects meta-analysis. In the NNHM, it is assumed that the θ_i are from a normal distribution with expectation θ and variance τ², i.e.

θ_i | θ, τ² ~ N(θ, τ²), i = 1, ..., n. (1)

Furthermore, the effect estimators Y_i follow (at least approximately) a normal distribution with expectation θ_i and variance σ_i², i.e.

Y_i | θ_i ~ N(θ_i, σ_i²), i = 1, ..., n. (2)

From Equations (1) and (2) follows the marginal model

Y_i | θ, τ² ~ N(θ, σ_i² + τ²), i = 1, ..., n. (3)

If the between-trial heterogeneity τ² is 0, then the random-effects model reduces to the so-called fixed-effect or common-effect model.

The focus of our study is inference regarding θ, the overall effect. A standard method to construct an estimator and a (1 − α) confidence interval for θ was proposed by DerSimonian and Laird [2] (DL). In short, the DL estimator of θ is given by

θ̂_R = Σ ŵ_i Y_i / Σ ŵ_i, (4)

where ŵ_i = 1/(σ_i² + τ̂²_DL). Here, the DL estimator τ̂²_DL of the between-study heterogeneity τ² is given by

τ̂²_DL = max{ 0, (Q − (n − 1)) / (Σ w_i − Σ w_i² / Σ w_i) }. (5)

The weights w_i are the fixed-effect weights (with τ² = 0), which are w_i = 1/σ_i². Furthermore, Q is the so-called Q-statistic defined by

Q = Σ w_i (Y_i − θ̂_F)², (6)

where θ̂_F is the fixed (or common) effect estimator of the overall effect with

θ̂_F = Σ w_i Y_i / Σ w_i. (7)

If the estimator τ̂²_DL is assumed to be a fixed constant equal to the true value of τ², then it holds that

Z = (θ̂_R − θ) √(Σ ŵ_i) ~ N(0, 1). (8)

This results in the DerSimonian-Laird (1 − α) confidence interval (DL) for θ, which is given by

( θ̂_R − z_{1−α/2} / √(Σ ŵ_i), θ̂_R + z_{1−α/2} / √(Σ ŵ_i) ), (9)

where z_γ is the γ quantile of the standard normal distribution. The assumption that the estimate τ̂²_DL is the true value of τ² might be reasonable when the between-study heterogeneity can be estimated with high precision, i.e. when the number of studies included in the meta-analysis is large. In medical applications, however, this is frequently not the case. As noted by several authors, the application of the DL approach in meta-analyses with small to moderate numbers of studies results in coverage probabilities below the nominal level 1 − α [4].

Henmi and Copas [22] tackled the two problems that (a) the distribution of the pivot statistic is quite different from the standard normal distribution when the number of studies n is small, and (b) the estimators of θ are biased due to selective publication of smaller studies with less favourable results (publication bias). With respect to the latter they note that the common (or fixed) effect estimator θ̂_F is more robust to publication bias than the random-effects estimator θ̂_R, simply because smaller studies, which are less likely to be published when their outcome is not favourable, have a smaller weight in the construction of θ̂_F than in θ̂_R. To address the problem of the normal approximation they derive the distribution of the pivot statistic based on the fixed-effect estimator under the random-effects model.
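The standard DL computations in (4)-(9) are straightforward to implement. The following Python sketch (function and variable names are ours, not from the paper) returns the random-effects estimate, the DL heterogeneity estimate and the standard (1 − α) confidence interval:

```python
import numpy as np
from scipy.stats import norm

def dersimonian_laird(y, se, alpha=0.05):
    """DerSimonian-Laird random-effects estimate and (1 - alpha) CI.

    y  : study effect estimates (e.g. log odds ratios)
    se : their standard errors sigma_i
    """
    y, se = np.asarray(y, float), np.asarray(se, float)
    n = len(y)
    w = 1.0 / se**2                       # fixed-effect weights w_i = 1 / sigma_i^2
    theta_f = np.sum(w * y) / np.sum(w)   # common-effect estimate, Eq. (7)
    Q = np.sum(w * (y - theta_f)**2)      # Q-statistic, Eq. (6)
    # DL estimator of tau^2, Eq. (5), truncated at zero
    tau2 = max(0.0, (Q - (n - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))
    w_star = 1.0 / (se**2 + tau2)         # random-effects weights, Eq. (4)
    theta_r = np.sum(w_star * y) / np.sum(w_star)
    se_r = 1.0 / np.sqrt(np.sum(w_star))
    z = norm.ppf(1 - alpha / 2)           # normal quantile of Eq. (9)
    return theta_r, tau2, (theta_r - z * se_r, theta_r + z * se_r)
```

Note the truncation at zero in (5): for identical study results the sketch returns a heterogeneity estimate of exactly zero, which is the behaviour criticized below for the few-studies setting.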
More specifically, the variance of θ̂_F is

V(τ²) = (τ² Σ w_i² + Σ w_i) / (Σ w_i)². (10)

The variance V(τ²) can be estimated by plugging in τ̂²_DL for τ². We denote this estimator of V(τ²) by

V(τ̂²_DL) = (τ̂²_DL Σ w_i² + Σ w_i) / (Σ w_i)². (11)

Recall that the random-effects weights ŵ_i also depend on τ̂²_DL. Hence, the pivot statistic U is given by

U = (θ̂_F − θ) / √V(τ̂²_DL). (12)

The point in the derivation of the distribution of U by Henmi and Copas [22] is to take into account the random variation of τ̂²_DL in addition to that of θ̂_F, as follows. The distribution function of U can be written as

P(U ≤ u) = 1 − ∫₀^∞ P( Q ≤ f⁻¹(r/u) | R = r ) p_R(r) dr  (if u ≥ 0),
P(U ≤ u) = ∫_{−∞}^0 P( Q ≤ f⁻¹(r/u) | R = r ) p_R(r) dr  (if u < 0), (13)

where the random variable R and the function f are defined by

R = Σ w_i (Y_i − θ) / √(Σ w_i)  and  f(Q) = √( Σ w_i² {Q − (n − 1)} / ((Σ w_i)² − Σ w_i²) + 1 ), (14)

respectively. The function p_R(r) is the probability density function of R, which is the normal density with mean zero and variance 1 + τ² (Σ w_i² / Σ w_i). The conditional distribution of Q given R, which is necessary to calculate the integral in (13), is a little complicated, but it is well approximated by the gamma distribution whose mean and variance coincide with the exact conditional mean M(R) and variance V(R) of Q given R, respectively (see [22] and its Appendix A for the explicit formulas of M(R) and V(R) and their derivation). Since the conditional mean M(R) and variance V(R) depend on the unknown true value of τ², as does the variance of R, Henmi and Copas [22] proposed to use the DL estimator τ̂²_DL for τ² again to approximate these quantities.
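Henmi and Copas evaluate (13) by numerical integration with the gamma approximation to Q given R. As a cruder but transparent alternative, the quantiles of U can also be approximated by plain parametric simulation from the marginal model (3) with a plug-in value for τ². The sketch below is illustrative only and is not the authors' implementation; it simply draws study estimates, recomputes θ̂_F, τ̂²_DL and V(τ̂²_DL) each time, and reads off empirical quantiles of (12):

```python
import numpy as np

def u_quantiles(se, tau2, probs=(0.025, 0.975), n_sim=20000, seed=1):
    """Monte Carlo approximation to quantiles of the pivot
    U = (theta_F_hat - theta) / sqrt(V(tau2_DL_hat)), Eq. (12),
    simulating Y_i ~ N(theta, sigma_i^2 + tau2) as in Eq. (3).
    A plug-in value for tau2 is required, just as in the
    numerical-integration approach."""
    rng = np.random.default_rng(seed)
    se = np.asarray(se, float)
    n = len(se)
    w = 1.0 / se**2
    sw, sw2 = w.sum(), (w**2).sum()
    u = np.empty(n_sim)
    for k in range(n_sim):
        y = rng.normal(0.0, np.sqrt(se**2 + tau2))  # theta = 0 w.l.o.g.
        theta_f = np.sum(w * y) / sw
        Q = np.sum(w * (y - theta_f)**2)
        tau2_dl = max(0.0, (Q - (n - 1)) / (sw - sw2 / sw))  # Eq. (5)
        v = (tau2_dl * sw2 + sw) / sw**2                     # Eq. (11)
        u[k] = theta_f / np.sqrt(v)                          # pivot U, Eq. (12)
    return np.quantile(u, probs)
```

In small meta-analyses with positive heterogeneity the resulting 97.5% quantile typically exceeds the normal value 1.96, which is exactly why the DL interval (9) undercovers there.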
Under this setting, the (approximate) γ quantile u_γ of U can be obtained by means of numerical integration and optimization (see Appendix B in [22] for an implementation in R), and hence a (1 − α) confidence interval for θ is given by

( θ̂_F − u_{1−α/2} √V(τ̂²_DL), θ̂_F + u_{1−α/2} √V(τ̂²_DL) ). (15)

In simulation studies, Henmi and Copas [22] could show that their approach improves coverage probabilities as compared to standard procedures including the DL approach. With only few studies included in the meta-analysis, however, the performance is not satisfying. The poor performance of the method in this particular situation is caused (at least partly) by the use of the DL estimator τ̂²_DL in the computation of the quantiles of the pivot statistic U as above, since τ̂²_DL frequently results in zero estimates with few studies although the between-trial heterogeneity is positive, τ² > 0. A remedy is to place a weakly informative prior on the heterogeneity parameter [25, 26]. A number of suggestions have been made on the choice of such weakly informative priors for τ, including half-t [26] and half-normal distributions [4]. Here we follow Chung et al [3], who proposed to use a gamma distribution with shape η and rate λ as a prior for τ, specifically p(τ) = λ^η τ^{η−1} e^{−λτ} / Γ(η) with gamma function Γ(η). This choice means that the logarithm of the posterior of θ and τ is equal to the log likelihood plus a term depending only on τ but not on θ. Rather than using the mean or median of the posterior, Chung et al [3] consider the mode, which can be computed by numerical optimization. This estimator of τ is referred to as the Bayes modal (BM) estimator τ̂_BM. As defaults, Chung et al recommend η = 2 and λ close to 0. The BM estimator τ̂_BM can be interpreted as a penalized maximum likelihood (ML) estimator [3]. In this paper, we propose to replace the DL estimator τ̂²_DL in the computation of the quantiles of the pivot statistic U by the BM estimator τ̂²_BM.
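A minimal sketch of the BM estimator as a penalized profile likelihood is given below, assuming the Gamma(η, λ) prior on τ described above. Function names, the profiling of θ, and the search bounds are our own illustrative choices, not the authors' code:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def bayes_modal_tau(y, se, eta=2.0, lam=1e-4):
    """Bayes modal (BM) estimate of the between-study SD tau in the
    spirit of Chung et al.: the mode of the profile posterior under a
    Gamma(eta, lam) prior on tau.  With eta = 2 the penalty term
    (eta - 1) * log(tau) keeps the estimate strictly away from zero."""
    y, se = np.asarray(y, float), np.asarray(se, float)

    def neg_log_post(tau):
        v = se**2 + tau**2
        theta_hat = np.sum(y / v) / np.sum(1.0 / v)        # profile out theta
        loglik = -0.5 * np.sum(np.log(v) + (y - theta_hat)**2 / v)
        logprior = (eta - 1.0) * np.log(tau) - lam * tau   # Gamma(eta, lam) prior
        return -(loglik + logprior)

    # ad hoc search interval; the upper bound just needs to be generous
    res = minimize_scalar(neg_log_post,
                          bounds=(1e-8, 10.0 * np.std(y) + 1.0),
                          method="bounded")
    return res.x
```

For three identical study results with unit standard errors the DL estimate is exactly zero, whereas this penalized-likelihood sketch returns a strictly positive value (about 1/√2 for λ close to 0), illustrating the avoidance of zero estimates discussed above.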
The choice of the BM estimator is motivated by its performance in comparison to other estimators in recent simulation studies (see e.g. Figures 2 and 3 in [4]). The resulting γ quantile is denoted by u^(BM)_γ. The (1 − α) confidence interval for θ is then given by

( θ̂_F − u^(BM)_{1−α/2} √V(τ̂²_DL), θ̂_F + u^(BM)_{1−α/2} √V(τ̂²_DL) ). (16)

In summary, our idea is that we still use the DL estimator τ̂²_DL in the construction of the pivot statistic U given in (12), in the same way as Henmi and Copas [22], but we use the BM estimator τ̂²_BM instead of τ̂²_DL in the approximate calculation of the distribution of U. The reason for the use of the DL estimator τ̂²_DL in the construction of U is that it makes it easier to calculate the distribution of the pivot statistic U while taking into account the effect of estimating τ². However, the distribution of U depends on the unknown true value of τ², and it is necessary to use some estimate of τ² to approximate this distribution. One possibility is to use the DL estimator τ̂²_DL again, as was done in [22], but this would be inaccurate unless the number of studies is sufficiently large. Hence, we propose the use of the BM estimator τ̂²_BM to improve the accuracy in estimating τ² and in approximating the distribution of U, which we expect to lead to an improvement of the coverage probabilities over the Henmi-Copas (HC) confidence interval (15). In the next section we show by simulation studies that the new confidence interval (16) actually improves on the HC confidence interval (15) as well as on the DL confidence interval (9) in terms of coverage probability, both with and without publication bias, especially when the number of studies is small.

Table 1: Summary of the scenarios considered in the simulation study

Parameter                        Values
Treatment effect θ               […]
Between-trial heterogeneity τ    0.05, […], […]
Number of studies n              3, 6, 9, […]
Moderate publication bias        β = 4, γ = 3
Severe publication bias          β = 4, γ = 1

In order to compare the performance of the proposed approach with previously suggested procedures, a Monte Carlo simulation study was conducted. As comparators, the methods by Henmi and Copas [22] (HC), Chung et al [3] (BM), Doi et al [23] (IVH) and DerSimonian and Laird [2] (DL) were included. The first one is known to be robust to publication bias to some extent, but its performance in meta-analyses with few studies only is unknown. The approach by Chung et al [3] was developed for the scenario of few studies but might not be robust to publication bias. Doi et al [23] proposed the inverse variance heterogeneity model. As with the HC approach, the interval is centred around an estimator assuming the common-effect model. Therefore, it might have attractive properties in settings with publication bias. In contrast to the HC approach, however, it is based on a normal approximation. This approach was not included in recent method comparison studies [24]. The DL approach was included here as it is often considered to be the standard approach to random-effects meta-analysis. The simulation model by Brockwell and Gordon [27] formed the basis for our simulation study. It was used in several recent simulation studies and therefore appeared to be a good choice. To account for publication bias, we used the same selection function (the probability that a study with an outcome y and associated standard error σ is selected into the meta-analysis)

P(selected | y, σ) = exp[ −β {Φ(−y/σ)}^γ ] (17)

as in [22], with the same sets of the parameters β and γ for moderate and severe publication bias. Here, Φ is the cumulative distribution function of the standard normal distribution. Table 1 summarizes the simulation scenarios considered. Per scenario, N = 2,000 simulation replications were run.

Figure 1 presents the simulated coverage probabilities for the different confidence intervals in the various scenarios. In all scenarios considered, the proposed method performs at least as well as the HC method in terms of the coverage probability. With larger numbers of studies, say n ≥ 9, and more pronounced between-trial heterogeneity, say τ ≥ […]. With n = 3 or n = 6, and only low levels of between-trial heterogeneity, τ = 0.05, the coverage probabilities of the BM approach are slightly higher than those of the proposed method. In the scenarios with publication bias, however, the coverage probabilities of the BM approach rapidly decrease well below the nominal level of 0.95 with increasing numbers of studies included in the meta-analysis and increasing levels of between-trial heterogeneity. Without publication bias, the coverage of the IVH interval is similar to the coverage of the DL interval, i.e. poor for small numbers of studies n and closer to the nominal level for larger n. In the settings with publication bias the coverage probabilities of the IVH intervals are generally larger than those of the DL approach, in particular with more pronounced heterogeneity τ and larger numbers of studies n. However, the coverage probabilities are below those achieved by the HC and HC-BM approaches. Overall, the coverage probabilities of the proposed approach are closest to the nominal level, whereas the coverages for the DL approach are well below the nominal level for several scenarios characterized by publication bias and small numbers of studies included in the meta-analysis.

In scenarios where different methods resulted in similar coverage probabilities close to the nominal level, it is of interest to compare the lengths of the intervals obtained by these methods. Shorter intervals with the same coverage would of course be preferred. Table 2 gives the median interval lengths of the different confidence intervals for various levels of publication bias and heterogeneity τ as well as numbers of studies n included in the meta-analysis. For instance, in the setting without publication bias and n = 3 studies, the median length of our proposed confidence interval (HC-BM interval) is 1.12, slightly smaller than the median length of the BM intervals (1.15), although its coverage of 0.96 is just below the coverage of the BM intervals (0.97).
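The selection function (17) used to induce publication bias in the simulations is simple to implement. The sketch below (function name is ours) can be used to thin simulated studies; small studies with unfavourable outcomes are the least likely to be retained:

```python
import numpy as np
from scipy.stats import norm

def selection_prob(y, sigma, beta, gamma):
    """Publication-selection function of Eq. (17): the probability that a
    study with outcome y and standard error sigma enters the meta-analysis.
    Larger beta and smaller gamma correspond to more severe selection."""
    return np.exp(-beta * norm.cdf(-y / sigma) ** gamma)

# Example: with (beta, gamma) = (4, 3), a study with a clearly positive
# outcome is almost always selected, while a null result is selected with
# probability exp(-4 * 0.5**3) = exp(-0.5), roughly 0.61.
```

In a simulation loop one would keep a generated study with probability `selection_prob(y, sigma, beta, gamma)` and redraw otherwise, until n studies have been "published".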
Similarly, with n = 6 studies the median lengths of the HC-BM and BM intervals are 0.76 and 0.67, respectively. In scenarios where the coverages of the HC and HC-BM intervals are close, the median lengths of the intervals are similar again. In the scenarios without publication bias, where the coverages of the DL and IVH intervals are similar, the IVH intervals tend to be longer than the DL intervals when heterogeneity is present (i.e. τ > 0).

Figure 1: Coverage probabilities of the various confidence intervals (circle: HC, cross: DL, dot: HC-BM, plus: BM, triangle: IVH) depending on the number of studies n included in the meta-analysis, for no, moderate and severe publication bias and for different degrees of between-trial heterogeneity τ.

Crins et al [21] report a systematic review and meta-analysis evaluating Interleukin-2 receptor antibodies (IL-2RA) for immunosuppression in children who underwent liver transplantation. The authors identified a total of six controlled studies including two randomized trials. Given the heterogeneity in the designs of the studies, some between-study heterogeneity in the treatment effects can be expected. Although Crins et al [21] did not identify any publication bias by visual inspection of funnel plots and formal tests for asymmetry of these plots, this provides little reassurance that indeed no publication bias is present, since the number of studies is fairly small, which hinders the identification of publication bias in funnel plots or formal hypothesis tests. Therefore, there is a need for methods for random-effects meta-analyses robust to publication bias in this setting. The endpoint acute rejections was reported in all six studies identified in the systematic review, whereas only three also reported the outcome steroid-resistant rejections. Table 3 summarizes the findings for both outcomes.

These data were previously considered by Friede et al [4], who applied several point estimators and confidence intervals of the overall effect, including DL and BM, to them. For acute rejections, DL and BM yielded log odds ratios (95% confidence intervals) of -1.59 (-2.21, -0.96) and -1.61 (-2.35, -0.87), respectively. The between-study heterogeneity was estimated as τ̂_DL = 0.16 and τ̂_BM = 0.38 with the DL and BM methods, respectively. The fixed-effect estimate of the overall effect is -1.56, smaller in absolute value than the random-effects estimates. The HC interval, given by (-2.24, -0.89), is centred around the fixed-effect estimate. The HC-BM interval proposed here is calculated as (-2.31, -0.82), which is considerably wider than the HC interval.

For steroid-resistant rejections, DL and BM resulted in log odds ratios (95% confidence intervals) of -1.21 (-2.28, -0.15) and -1.32 (-2.78, 0.14), respectively. Whereas the DL method results in a statistically significant treatment difference, the effect is not statistically significant with the BM approach, although the point estimate hints at a more pronounced treatment effect. This is explained by the larger between-study heterogeneity of τ̂_BM = 0.87 with the BM method, which compares to τ̂_DL = 0.14 with the DL method. These compare to the fixed-effect estimate of -1.17, with 95% confidence intervals of (-2.24, -0.09) and (-2.53, 0.20) for the HC and HC-BM methods, respectively. Again, the fixed-effect estimate is smaller in absolute value than the effects obtained from random-effects meta-analyses. Furthermore, the HC-BM confidence interval is wider than the HC interval. Here, this wider interval means that the effect is no longer statistically significant at the usual 5% level.
Meta-analyses of only a few studies are very common, but pose a number of challenges. These include the estimation of between-trial heterogeneity as well as the assessment of publication bias. Here we proposed a method that faces both challenges successfully. The confidence interval for the overall effect proposed by Henmi and Copas [22] was improved by replacing the DerSimonian-Laird estimator by the Bayes modal estimator of Chung et al [3] in the computation of the quantiles used to construct the confidence interval. The use of a weakly informative prior biases the Bayes modal estimator away from zero. This resulted in larger quantiles, in particular in situations with few studies and only small to moderate levels of between-trial heterogeneity, which improved the coverage of the confidence intervals.

There are a number of limitations. We focused on properties related to estimating the overall effect and did not consider other parameters such as the heterogeneity τ [31]. Furthermore, we refrained from investigating other selection functions, since Henmi and Copas state that their "experience of working with other such models suggests that the extent of bias depends much more on the choice of selection parameters [. . . ] than it does on the particular mathematical form of the selection function itself" [22]. Also, we did not include other comparators such as the Knapp-Hartung-Sidik-Jonkman approach [6, 7, 8], since extensive comparisons were included in the paper by Henmi and Copas [22] and also in more recent simulation studies [4, 11].

The normal-normal hierarchical model considered here is a standard model for random-effects meta-analyses. This model is very general but not without limitations, since effect estimates are modelled rather than the data directly, implying a two-step procedure.
For instance, considering binary outcomes and treatment effects summarized by odds ratios, Jackson et al [28] discuss six alternative generalised linear mixed models which are more efficient one-step procedures. Modelling the data directly can have particular benefits when dealing with rare events; see for example Günhan et al [29] or Gronsbell et al [30]. The approach taken here to improve the coverage of confidence intervals of the overall effect in pairwise meta-analysis might also be useful in more complex settings such as meta-regression or network meta-analysis. The exploration of such opportunities is out of the scope of this manuscript but subject of future research.

Highlights
What is already known

• Estimated overall effects from meta-analyses might be impacted by reporting bias
• A confidence interval for the overall effect has been proposed that is to some extent robust to the selection of studies

What is new

• The performance of the previously proposed robust confidence interval is assessed in meta-analyses with few studies and found not to work well in this setting
• The approach is refined, resulting in improved coverage probabilities of the confidence intervals, in particular in meta-analyses with few studies

Potential impact for RSM readers outside the authors' field

• The refined approach is recommended for application in meta-analyses with few studies, yielding more reliable results
Data availability statement
The data used in Section 4 are provided in Table 3. Furthermore, they are given in the paper by Crins et al [21] and are also included in the R package bayesmeta available from CRAN.
Acknowledgements
The authors are grateful to Professor John Copas (Warwick) for discussions during his visit to Tokyo and Osaka in spring 2019.
ORCID
Satoshi Hattori 0000-0001-5446-2305
Tim Friede 0000-0001-5347-7441
References

[1] Veroniki AA, Jackson D, Viechtbauer W, Bender R, Bowden J, Knapp G, Kuß O, Higgins JPT, Langan D, Salanti G. Methods to estimate the between-study variance and its uncertainty in meta-analysis. Research Synthesis Methods 2016; 7:55–79.
[2] DerSimonian R, Laird N. Meta-analysis in clinical trials. Controlled Clinical Trials 1986; 7:177–188.
[3] Chung Y, Rabe-Hesketh S, Choi IH. Avoiding zero between-study variance estimates in random-effects meta-analysis. Statistics in Medicine 2013; 32:4071–4089.
[4] Friede T, Röver C, Wandel S, Neuenschwander B. Meta-analysis of few small studies in orphan diseases. Research Synthesis Methods 2017; 8:79–91.
[5] Follmann DA, Proschan MA. Valid inference in random effects meta-analysis. Biometrics 1999; 55:732–737.
[6] Hartung J, Knapp G. On tests of the overall treatment effect in meta-analysis with normally distributed responses. Statistics in Medicine 2001; 20:1771–1782.
[7] Sidik K, Jonkman JN. A simple confidence interval for meta-analysis. Statistics in Medicine 2002; 21:3153–3159.
[8] Knapp G, Hartung J. Improved tests for a random effects meta-regression with a single covariate. Statistics in Medicine 2003; 22:2693–2710.
[9] Röver C, Knapp G, Friede T. Hartung-Knapp-Sidik-Jonkman approach and its modification for random-effects meta-analysis with few studies. BMC Medical Research Methodology 2015; 15:99.
[10] Seide SE, Röver C, Friede T. Likelihood-based random-effects meta-analysis with few studies: empirical and simulation studies. BMC Medical Research Methodology 2019; 19:16.
[11] Friede T, Röver C, Wandel S, Neuenschwander B. Meta-analysis of two studies in the presence of heterogeneity with applications in rare diseases. Biometrical Journal 2017; 59:658–671.
[12] Bender R, et al. Methods for evidence synthesis in the case of very few studies. Research Synthesis Methods 2018; 9:382–392.
[13] Rosenthal R. The file drawer problem and tolerance for null results. Psychological Bulletin 1979; 86:638–641.
[14] Higgins JPT, Green S (eds.). Cochrane Handbook for Systematic Reviews of Interventions. Chichester: Wiley; 2008.
[15] Jin ZC, Zhou XH, He J. Statistical methods for dealing with publication bias in meta-analysis. Statistics in Medicine 2015; 34:343–360.
[16] Egger M, Davey Smith G, Schneider M, Minder C. Bias in meta-analysis detected by a simple, graphical test. BMJ 1997; 315:629–634.
[17] Duval S, Tweedie R. A nonparametric "trim and fill" method of accounting for publication bias in meta-analysis. Journal of the American Statistical Association 2000; 95:89–98.
[18] Copas J, Shi JQ. Meta-analysis, funnel plots and sensitivity analysis. Biostatistics 2000; 1:247–262.
[19] Copas J, Jackson D. A bound for publication bias based on the fraction of unpublished studies. Biometrics 2004; 60:146–153.
[20] Henmi M, Copas JB, Eguchi S. Confidence intervals and P-values for meta-analysis with publication bias. Biometrics 2007; 63:475–482.
[21] Crins ND, Röver C, Goralczyk AD, Friede T. Interleukin-2 receptor antagonists for pediatric liver transplant recipients: a systematic review and meta-analysis of controlled studies. Pediatric Transplantation 2014; 18:839–850.
[22] Henmi M, Copas JB. Confidence intervals for random effects meta-analysis and robustness to publication bias. Statistics in Medicine 2010; 29:2969–2983.
[23] Doi SAR, Barendregt JJ, Khan S, Thalib L, Williams GM. Advances in the meta-analysis of heterogeneous clinical trials I: the inverse variance heterogeneity model. Contemporary Clinical Trials 2015; 45:130–138.
[24] [details lost] Research Synthesis Methods.
[25] Spiegelhalter DJ, Abrams KR, Myles JP. Bayesian Approaches to Clinical Trials and Health-Care Evaluation. Chichester: Wiley; 2004.
[26] Gelman A. Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper). Bayesian Analysis 2006; 1:515–534.
[27] Brockwell SE, Gordon IR. A comparison of statistical methods for meta-analysis. Statistics in Medicine 2001; 20:825–840.
[28] Jackson D, Law M, Stijnen T, Viechtbauer W, White IR. A comparison of seven random-effects models for meta-analyses that estimate the summary odds ratio. Statistics in Medicine 2018; 37:1059–1085.
[29] Günhan BK, Röver C, Friede T. Random-effects meta-analysis of few studies involving rare events. Research Synthesis Methods 2020; 11:74–90.
[30] [details lost] Statistics in Medicine.
[31] [details lost] Statistics in Medicine.

Table 2: Median lengths of the different confidence intervals for various levels of publication bias and heterogeneity τ as well as numbers of studies n included in the meta-analysis. [Table entries not recoverable.]