On ratio measures of population heterogeneity for meta-analyses


Maxwell Cairns, Luke Prendergast
La Trobe University
October 6, 2020

Abstract

Popular measures of meta-analysis heterogeneity, such as $I^2$, cannot be considered measures of population heterogeneity since they are dependent on sample sizes within studies. The coefficient of variation (CV), recently introduced and defined to be the standard deviation of the random effect divided by the absolute value of the overall mean effect, does not suffer such shortcomings. However, very large CV values can occur when the effect is small, making interpretation difficult. The purpose of this paper is two-fold. Firstly, we consider variants of the CV that exist in the interval (0, 1], making interpretation simpler. Secondly, we provide interval estimators for the CV and its variants with excellent coverage properties. We perform simulation studies based on simulated and real data sets and draw comparisons between the methods. Based on our simulations and examples, we recommend transforming the CV onto this (0, 1] domain for ease of interpretation, supported by a confidence interval estimate for this transformed variable.

Maxwell Cairns, Department of Mathematics and Statistics, La Trobe University, Melbourne, Australia; Luke Prendergast, Department of Mathematics and Statistics, La Trobe University, Melbourne, Australia.
Correspondence: Maxwell Cairns, Department of Mathematics and Statistics, La Trobe University, Melbourne, Australia 3086. Email: [email protected]
Luke Prendergast, Department of Mathematics and Statistics, La Trobe University, Melbourne, Australia 3086. Email: [email protected]

Introduction

The presence of heterogeneity in a meta-analysis indicates that the true effects vary between studies. Also known as between-studies variance, heterogeneity is a crucial part of any meta-analysis (Thompson, 1994) since it is often plausible that differences between studies (e.g. gender balance, average age) lead to differences in the magnitude of the true effects. When heterogeneity is not assumed, that is, all effects are identical, a fixed-effect model (FEM) can be used. Let $Y_i$ be the estimator of the effect for the $i$th study, $\beta$ be the true effect to be estimated and $\epsilon_i \sim N(0, v_i)$ be the random sampling error, where $v_i = \mathrm{Var}(Y_i)$ is the random sampling error variance. Then, when there are $K$ studies under review for the meta-analysis, the FEM is of the form

$$Y_i = \beta + \epsilon_i, \quad (i = 1, \ldots, K) \qquad (1)$$

where the $Y_i$s are all estimators of the same effect, $\beta$.

If a common effect for every study, as in the FEM, is not plausible, then one option is to use the random-effects model (REM), which allows for variation between the individual study effects. This model has the form

$$Y_i = (\beta + \gamma_i) + \epsilon_i, \quad (i = 1, \ldots, K) \qquad (2)$$

where $\gamma_i \sim N(0, \tau^2)$ is the random effect and the true effect for the $i$th study is $\beta + \gamma_i$.

When using an REM meta-analysis, understanding the extent of heterogeneity is important. For example, even when $\beta$ is large, if $\tau^2$ is also large, meaning that the true effects vary greatly, then it is possible that not all true effects differ in a clinically significant way from the level indicating zero effect.

The structure of this paper is as follows. Firstly, we finish this introduction with a brief review of some popular measures of heterogeneity before providing a motivating example. In Section 2 we discuss and compare several similar measures of heterogeneity. In Section 3 we provide the variance and bias of the estimators of our measures.
In Section 4 we introduce several methods for calculating confidence intervals for the three measures we discuss, and we review their performance via simulations in Section 5. In Section 6 we provide examples of how the three measures could be used in real meta-analyses, including returning to our motivating example, before making some concluding comments.

Let $W_i = 1/v_i$ be the inverse variance weight (IVW) for the $i$th study. The IVW estimator of the effect in an FEM meta-analysis is given as

$$\hat\beta = \sum_{i=1}^{K} w_i Y_i,$$

where $w_i = W_i / \sum_{i=1}^{K} W_i$ are the weights scaled to sum to one. The IVW weights minimise the variance of $\hat\beta$. For an REM meta-analysis, the weights are $1/(v_i + \tau^2)$, which again minimise the variance of the estimator, although in practice one must replace the unknown $\tau^2$ with its estimate.

For simplicity, now and later, we use notation from Biggerstaff & Tweedie (1997) and let $S_r = \sum_{i=1}^{K} W_i^r$. Cochran's $Q$ (Cochran, 1954), given as

$$Q = \sum_{i=1}^{K} W_i Y_i^2 - S_1^{-1}\left(\sum_{i=1}^{K} W_i Y_i\right)^2,$$

is often used to facilitate tests for heterogeneity. $Q$ also arises in the popular DerSimonian & Laird (1986) estimator of $\tau^2$, defined as

$$T^2 = \max\left\{0, \frac{Q - (K - 1)}{S_1 - S_2/S_1}\right\}, \qquad (3)$$

where truncation at zero is used to avoid negative estimates of the random effect variance.

The most common statistic used to report levels of heterogeneity is $I^2$, which is the proportion of heterogeneity variance relative to the total variation. It is given as

$$I^2 = \frac{\tau^2}{\tau^2 + \sigma_Y^2}, \qquad (4)$$

where $\sigma_Y^2$ is chosen to be a typical within-study variance. Proposed by Higgins & Thompson (2002), when the DerSimonian & Laird (1986) estimator of $\tau^2$ is used and under their recommended choice for $\sigma_Y^2$, a common estimate of $I^2$ is simply $I^2 = (Q - df)/Q$, where $df = K - 1$. As noted by … et al. (2011), $I^2$ is a descriptive statistic for inconsistency between the findings of studies, and not a measure of how much variation there is between the study effects. Hence, given that it is the proportion of variance explained by heterogeneity relative to the sum of the heterogeneity variance and the within-study variance, a large estimated $I^2$ should not in itself lead to the conclusion that there exists a large amount of heterogeneity. For example, if the within-study sample sizes are large, then the within-study variances can be very small, leading to a large $I^2$ even when $\tau^2$ is small. Hence, if a large $I^2$ is used to highlight a potentially unreliable meta-analysis, this needs to be done by also considering the size of the $\tau^2$ estimate. Misinterpretation of $I^2$ has received recent attention (e.g. Rücker et al., 2008; Hoaglin, 2016; Kulinskaya & Dollinger, 2016; Borenstein et al., 2017).

We will now provide a motivating example based on a meta-analysis published in the Journal of Medical Virology by Zhu et al. (2020). The meta-analysis was performed on 35 studies using an REM and a double arcsine (Freeman-Tukey) transformation of incidence rates. We have replicated the analysis using the metafor (Viechtbauer, 2010) package in R (R Core Team, 2020). In their analysis, heterogeneity was assessed using the $Q$ statistic and $I^2$.

From the forest plot in Figure 1, we can see that $I^2$ is very large (95%) and that the $Q$ test indicates significant heterogeneity. However, one should really be asking: is the heterogeneity large in the context of what is being measured? One way to consider this is to look at the size of the estimated $\tau^2$ ($T^2$) relative to the estimated effect, where we use $T$ instead of $T^2$ since $T$ and the estimated effect are measured on the same scale. We have $\sqrt{T^2} = \sqrt{0.255} \approx 0.505$, which is not large relative to the estimated effect size of 2.22. In this context it is then convenient to use a measure of heterogeneity such as the coefficient of variation, denoted $\mathrm{CV}_B$, introduced by Takkouche et al. (1999, 2013) and defined to be the ratio of $T$ to the absolute value of the estimated effect. The estimated $\mathrm{CV}_B$ is $0.505/2.22 \approx 0.227$, indicating small heterogeneity, which is in contrast to how $I^2$ is often interpreted (e.g. Hoaglin, 2016).

[Forest plot: Studies 1 to 35 and the RE Model summary, 2.22 [2.05, 2.40]; horizontal axis: Freeman-Tukey transformed incidence rate. Heterogeneity: $I^2$ = 95.0% [93.88%, 95.94%], $T^2$ = 0.255 [0.205, 0.315], $Q$ = 681.88, df = 34, p < .001.]

Figure 1: A replicated forest plot of Figure 3 in Zhu et al. (2020), which contains summary information and heterogeneity statistics for a meta-analysis of 35 studies. Note that there are some minor discrepancies in the confidence intervals due to rounding.
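The quantities defined above (Cochran's $Q$, the DL estimator (3), $I^2 = (Q - df)/Q$, the REM estimate of $\beta$ and the estimated $\mathrm{CV}_B$) can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions: the within-study variances $v_i$ are treated as known, $K \geq 2$, and $I^2$ is truncated at zero (a common convention, assumed here). It is not the metafor implementation used for Figure 1, and all function names are hypothetical.

```python
import math
from typing import List, Tuple


def _weights(v: List[float]) -> List[float]:
    """Inverse-variance weights W_i = 1 / v_i (within-study variances assumed known)."""
    return [1.0 / vi for vi in v]


def cochran_q(y: List[float], v: List[float]) -> float:
    """Cochran's Q = sum W_i Y_i^2 - (sum W_i Y_i)^2 / S_1."""
    w = _weights(v)
    s1 = sum(w)
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swy2 = sum(wi * yi * yi for wi, yi in zip(w, y))
    return swy2 - swy * swy / s1


def dl_tau2(y: List[float], v: List[float]) -> float:
    """DerSimonian-Laird T^2 as in (3), truncated at zero; requires K >= 2."""
    w = _weights(v)
    s1 = sum(w)
    s2 = sum(wi * wi for wi in w)
    return max(0.0, (cochran_q(y, v) - (len(y) - 1)) / (s1 - s2 / s1))


def i_squared(y: List[float], v: List[float]) -> float:
    """I^2 = (Q - df) / Q with df = K - 1, truncated at zero (assumed convention)."""
    q = cochran_q(y, v)
    df = len(y) - 1
    return max(0.0, (q - df) / q) if q > 0 else 0.0


def re_estimate(y: List[float], v: List[float], tau2: float) -> Tuple[float, float]:
    """REM estimate of beta with weights 1 / (v_i + tau^2), and its variance 1 / sum(weights)."""
    w = [1.0 / (vi + tau2) for vi in v]
    sw = sum(w)
    return sum(wi * yi for wi, yi in zip(w, y)) / sw, 1.0 / sw


def cv_b(tau2: float, beta: float) -> float:
    """Estimated CV_B: sqrt(T^2) divided by |beta_hat|."""
    return math.sqrt(tau2) / abs(beta)


if __name__ == "__main__":
    y, v = [0.0, 4.0], [1.0, 1.0]                  # toy data, not from the paper
    t2 = dl_tau2(y, v)
    beta, var_beta = re_estimate(y, v, t2)
    print(t2, i_squared(y, v), beta, var_beta)     # 7.0 0.875 2.0 4.0
    print(round(cv_b(0.255, 2.22), 3))             # 0.227, using the Figure 1 values
```

The last line reproduces the $\mathrm{CV}_B \approx 0.227$ quoted in the text from $T^2 = 0.255$ and an estimated effect of 2.22.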

Some ratio measures of heterogeneity

We have previously discussed the use of $I^2$ and that it is often misinterpreted as a population measure of heterogeneity. However, it is widely used because, as a ratio measure, it is independent of scale. Below we list some properties we think are useful in the context of broader appeal for the use of measures of heterogeneity. We consider measures to be the true values to be estimated, although for the properties below note that there are estimator equivalents.

P1. The measure is on an easily interpretable scale (e.g., [0, 1]) or in the units of measurement of the effect.

P2. The measure can be interpreted as a population-level measure of heterogeneity.

P3. The measure can be applied to all random-effects meta-analyses.

P4. Reliable interval estimators of the measure are available.

Unfortunately, popular measures for reporting heterogeneity do not have all of these properties. Most satisfy P1 but do not exhibit P2. Below we consider some other measures.

The Diamond Ratio (DR; Cumming & Calin-Jageman, 2016), so called because it is the ratio of the length of the REM diamond depicting the confidence interval for the mean effect to the length of the FEM diamond, is one ratio measure of heterogeneity. It is defined as

$$\mathrm{DR} = \frac{\sqrt{V_{RE}}}{\sqrt{V_{FE}}}, \qquad (5)$$

where $V_{RE}$ and $V_{FE}$ are the estimated variances of the meta-estimator from the REM and FEM analyses. Originally proposed by Higgins & Thompson (2002), the DR, like $I^2$, depends on within-study sample sizes, but it is less likely to be confused with a population measure of heterogeneity. Instead, it can be viewed as a measure of the trade-off in moving from the FEM analysis to the REM analysis. This method gives the researcher a visual interpretation via forest plots, showing the percentage increase in the width of the REM confidence interval compared to the FEM confidence interval. Confidence intervals with very good coverage were recently proposed by Cairns et al. (2020).

One of the measures proposed by Takkouche et al. (1999) for measuring heterogeneity was the between-study coefficient of variation, given as

$$\mathrm{CV}_B = \frac{\tau}{|\beta|}. \qquad (6)$$

In their paper the FEM estimator of $\beta$ was used in estimating $\mathrm{CV}_B$, though this was changed in preference for the REM estimator by Takkouche et al. (2013), who also considered several confidence intervals for $\mathrm{CV}_B$, namely Wald-type confidence intervals (i.e. estimate $\pm 1.96 \times$ SE, where SE is the standard error) and bootstrap intervals. An advantage of the $\mathrm{CV}_B$ measure is that it does not depend on within-study sample sizes and can therefore be considered a population measure of heterogeneity. A potential drawback is that very large values can be produced when $\beta$ is near zero. Hence, care needs to be taken with effect measures such as the standardized mean difference, for which an effect close to zero is not uncommon.

Ratio of the heterogeneity variance to the effect estimator variance

Crippa et al. (2016) introduced

$$R_b = \frac{1}{K}\sum_{i=1}^{K} \frac{\tau^2}{v_i + \tau^2}, \qquad (7)$$

which arises from $R_b = \tau^2 / [K\,\mathrm{Var}(\hat\beta)]$, where $\hat\beta$ is the REM effect estimator. Clearly $R_b \in [0, 1]$, and $R_b = 1$ occurs when the within-study variances (the variances of the study effect estimators) are zero, indicating maximum heterogeneity relative to within-study sampling variances. Like $I^2$ and the DR, $R_b$ depends on the within-study variances and is therefore not a population measure of heterogeneity. However, it was intended as a measure of heterogeneity between studies and should be interpreted as such.

Another possibility is to adjust $\mathrm{CV}_B$ so that it is valid when $|\beta| = 0$, easier to interpret when $|\beta|$ is small, and still a measure of heterogeneity relative to the effect size. As two possibilities we propose

$$M_1 = \frac{\tau}{\tau + |\beta|} \quad \text{and} \quad M_2 = \frac{\tau^2}{\tau^2 + \beta^2}. \qquad (8)$$

An advantage of these measures is that they are bounded in [0, 1] and are therefore simple to interpret. In both cases, $M_i = 1$ results when $\tau > 0$ and $\beta = 0$, so that heterogeneity variance is maximised relative to the mean effect. We note the link between $M_1$, $M_2$ and $\mathrm{CV}_B$ where, by using the fact that $M_1^{-1} = 1 + 1/\mathrm{CV}_B$ and $M_2^{-1} = 1 + 1/\mathrm{CV}_B^2$, we can write

$$M_1 = \frac{\mathrm{CV}_B}{1 + \mathrm{CV}_B} \quad \text{and} \quad M_2 = \frac{\mathrm{CV}_B^2}{1 + \mathrm{CV}_B^2}. \qquad (9)$$

There are further links between $\mathrm{CV}_B$, $M_1$ and $M_2$. For example, consider the logit transformation defined, for $u \in (0, 1)$, as $\mathrm{logit}(u) = \log[u/(1 - u)]$. Then we have that

$$\mathrm{logit}(M_1) = \log(\mathrm{CV}_B) \quad \text{and} \quad \mathrm{logit}(M_2) = 2\log(\mathrm{CV}_B). \qquad (10)$$

Note that Takkouche et al. (2013) produce confidence intervals for the $\mathrm{CV}_B$ measure, and one of those is for the log-transformed $\mathrm{CV}_B$. Hence, one possibility is to transform the interval for $\log(\mathrm{CV}_B)$ using the inverse logit transformation (after doubling its bounds in the case of $M_2$) to obtain confidence intervals for $M_1$ and $M_2$.

Basic comparisons of the measures are of interest. We looked at how many of the desired properties each measure satisfies (refer to Section 2 for definitions). In Table 1 we summarise the measures considered with regard to satisfying properties P1-P4. As to whether the measures are all applicable for every meta-analysis, work is needed in the context of meta-regression, and we comment on this below in Remark 1. Takkouche et al. (2013) proposed bootstrap and Wald-type intervals for the $R_b$ and $\mathrm{CV}_B$ measures, with the Wald intervals performing better overall. However, while good coverage was achieved for many settings, in others the coverage was not close to nominal, often because the intervals were overly conservative. We verify this later with our own simulations using the Wald intervals. However, we also propose intervals for $\mathrm{CV}_B$, $M_1$ and $M_2$ that exhibit excellent coverage across our simulation studies. For $R_b$, since it is a function of $\tau^2$ and the fixed within-study variances, good coverage should be possible using a substitution approach with the $\tau^2$ confidence intervals. Cairns et al. (2020) achieved good coverage using this approach for the DR, which also depends on $\tau^2$ and the within-study variances.

Table 1: Properties P1-P4 satisfied (Y = Yes, N = No, ? = see Remark 1) for $I^2$, $\mathrm{CV}_B$, DR, $R_b$, $M_1$ and $M_2$.

| Measure | P1 | P2 | P3 | P4 | Comments |
|---|---|---|---|---|---|
| $I^2$ | Y | N | Y | Y | |
| $\mathrm{CV}_B$ | N | Y | ? | Y (*) | See Remark 1 for P3 in regards to meta-regression |
| DR | Y | N | ? | Y | See Remark 1 for P3 in regards to meta-regression |
| $R_b$ | Y | N | Y | ? | Intervals available and others possible |
| $M_1$ | Y | Y | ? | Y (*) | See Remark 1 for P3 in regards to meta-regression |
| $M_2$ | Y | Y | ? | Y (*) | See Remark 1 for P3 in regards to meta-regression |

(*) Confidence intervals with excellent coverage for $\mathrm{CV}_B$, $M_1$ and $M_2$ are proposed and assessed later.

Remark 1.

While $I^2$ and $R_b$ are naturally defined in a meta-regression analysis, this is not true for $\mathrm{CV}_B$, the DR, $M_1$ and $M_2$, since they vary depending on the level chosen for the moderator. Possibilities exist, and this is part of some ongoing work, when a moderator is categorical, in that we have a heterogeneity measure for each level of the moderator, and even for numeric moderators, where the heterogeneity measure may make sense given a suitable reference point (e.g. the moderator set to zero or to the mean of all moderator values). This ongoing work includes the construction of confidence intervals.

Since they are measures of population heterogeneity, we will focus our attention throughout the rest of this paper on the $\mathrm{CV}_B$, $M_1$ and $M_2$ measures, but below we compare them to the most commonly reported heterogeneity measure, $I^2$.

Table 2: Summary statistics for 1000 simulated values of $I^2$, $\mathrm{CV}_B$, $M_1$ and $M_2$ for varying $\beta$ and $\tau^2$. The minimum (Min), first quartile ($Q_1$), median ($m$), third quartile ($Q_3$) and maximum (Max) are reported for $\tau^2 = 0$, $0.…$ and $0.…$ (columns), and for $I^2$, $\mathrm{CV}_B$, $M_1$ and $M_2$ at each of three values of $\beta$ (rows).

We generated 1000 data sets consisting of 10 studies with two arms, each arm containing 10 observations. We randomly sampled standardised mean differences (SMDs; see further details later) and conducted a meta-analysis on the generated data using the rma function from the R metafor package (Viechtbauer, 2010). We used the DerSimonian and Laird estimator of $\tau^2$ (DL; DerSimonian & Laird, 1986) since it is the most common across various packages and is usually used to calculate $I^2$. We varied the true effect over 0.…, … and …, and $\tau^2$ over 0 (zero heterogeneity), 0.… and …. … As $\tau^2$ increases, so too does the median measure, and the minimum of zero … even when $\tau^2 = 0.…$. … $\mathrm{CV}_B$ is sometimes very large, which occurs due to small estimates of $\beta$. When this happens, $M_1$ and $M_2$ can be very close to one. We also note that $I^2$ does not vary much at all with varying $\beta$, which is expected given that it compares the heterogeneity variance with the within-study variance. Hence, for a given $\tau^2$ (e.g. $\tau^2 = 0.…$), the measure gives similar results across the values of $\beta$.

Figure 2: Boxplots of 1000 simulated values of $I^2$, $M_1$ and $M_2$ arising from randomly generated SMDs for varying $\tau^2$ (horizontal axis of each plot) and varying $\beta$ (over rows of plots).

Figure 2 contains boxplots comparing the spread of $I^2$, $M_1$ and $M_2$ for the choices of $\beta$ in Table 2 with some additional choices for $\tau^2$. We do not include $\mathrm{CV}_B$ due to its sometimes large scale. As expected and noted before, we see that there is not much change in the location and spread of $I^2$ as the value of $\beta$ increases. However, for $M_1$ and $M_2$, values become smaller as $\beta$ increases, indicating smaller relative heterogeneity in the context of effect size.

As ratios bounded on (0, 1), $M_1$ and $M_2$ can be transformed so that the transformed variables are on the domain $(-\infty, \infty)$. We choose the logit transformation, given as $\mathrm{logit}(u) = \log[u/(1 - u)]$ for $u \in (0, 1)$, where back-transformation to the original scale can be achieved using its inverse, $\mathrm{logit}^{-1}(v) = \exp(v)/[1 + \exp(v)]$. There are two notable advantages to this approach: firstly, the bounds of interval estimators that are first computed on the logit scale and then back-transformed are also bounded on [0, 1]; and secondly, it makes use of the link between $\mathrm{CV}_B$, $M_1$ and $M_2$, in that $\log(\mathrm{CV}_B) = \mathrm{logit}(M_1) = \mathrm{logit}(M_2)/2$.

Theorem 1.

An approximate variance and bias for the estimator of the logit-transformed $M_1$, $\mathrm{logit}(\widehat{M}_1)$, are

$$\mathrm{Var}\left[\mathrm{logit}(\widehat{M}_1)\right] \approx \frac{\mathrm{Var}(\hat\tau)}{\tau^2} + \frac{\mathrm{Var}(\hat\beta)}{\beta^2}, \qquad \mathrm{bias}\left[\mathrm{logit}(\widehat{M}_1)\right] \approx \frac{1}{2}\left[\frac{\mathrm{Var}(\hat\beta)}{\beta^2} - \frac{\mathrm{Var}(\hat\tau)}{\tau^2}\right],$$

where $\mathrm{bias}(\cdot)$ denotes the bias of its estimator argument.

Takkouche et al. (2013) computed an approximate variance for the $\mathrm{CV}_B$ estimator, also using the delta method and asymptotic independence between the $\beta$ and $\tau$ estimators. Noting that $\log(\widehat{\mathrm{CV}}_B) = \mathrm{logit}(\widehat{M}_1)$ as above, another way to arrive at the variance in Theorem 1 is to take the variance for $\widehat{\mathrm{CV}}_B$ and apply the delta method to the log of the $\mathrm{CV}_B$ estimator.

To estimate the variance and bias for $\mathrm{logit}(\widehat{M}_1)$, $\tau$ and $\beta$ may be replaced with their respective estimates. Variance estimates for the $\beta$ and $\tau$ estimators are available; e.g. the variance estimate for the DL estimator of $\tau^2$ can be computed from the REM and FEM weights and the $\tau^2$ estimate (e.g. Biggerstaff & Tweedie, 1997), and variances for more complicated estimators of $\tau^2$ are attainable through packages such as metafor (Viechtbauer, 2010). The variance estimate for the $\beta$ estimator is simply the inverse of the sum of the unscaled REM weights, given as

$$\widehat{\mathrm{Var}}(\hat\beta) = \frac{1}{\sum_{i=1}^{K} W_i^*}, \quad \text{where } W_i^* = \frac{1}{v_i + \hat\tau^2}.$$

Note that the above variance is an estimate since $\hat\tau^2$ replaces the unknown $\tau^2$.

Corollary 1.

An approximate variance and bias for $\mathrm{logit}(\widehat{M}_2)$ are

$$\mathrm{Var}\left[\mathrm{logit}(\widehat{M}_2)\right] = 4 \times \mathrm{Var}\left[\mathrm{logit}(\widehat{M}_1)\right], \qquad \mathrm{bias}\left[\mathrm{logit}(\widehat{M}_2)\right] = 2 \times \mathrm{bias}\left[\mathrm{logit}(\widehat{M}_1)\right].$$

We now consider some examples of the variance and bias of the $M_1$ and $M_2$ estimators.

Example 1: variance based on the DL estimator

Recall that $W_i = 1/v_i$ is the inverse variance weight for the $i$th study and is assumed fixed. For simplicity we ignore truncation for the DL estimator in (3). Under this assumption, Biggerstaff & Tweedie (1997) give the variance of the DL estimator of $\tau^2$ as

$$\mathrm{Var}(T^2) = \frac{\mathrm{Var}(Q)}{(S_1 - S_2/S_1)^2}, \quad \text{where } \mathrm{Var}(Q) = 2(K - 1) + c_1\tau^2 + c_2\tau^4$$

for some constants $c_1$ and $c_2$ that depend on $S_1$, $S_2$ and $S_3$. From Theorem 1, and since $\mathrm{Var}(\hat\tau) \approx \mathrm{Var}(T^2)/(4\tau^2)$ by the delta method, the first term in the variance of $\mathrm{logit}(\widehat{M}_1)$ is approximately $\mathrm{Var}(Q)/[4(S_1 - S_2/S_1)^2\tau^4]$, which decreases with increasing $\tau^2$ towards $c_2/[4(S_1 - S_2/S_1)^2]$. However, since the variance of $\hat\beta$ increases with increasing $\tau^2$, whether the variance of $\mathrm{logit}(\widehat{M}_1)$ increases or decreases with $\tau^2$ depends on the magnitude of $\beta$. If $\beta$ is large, the variance will at first decrease as $\tau^2$ increases, before starting to increase as $\tau^2$ becomes large relative to $\beta$.

This example becomes simpler to illustrate if we assume that the within-study variances are small compared to $\tau^2$ (i.e. each $v_i \ll \tau^2$). Under this assumption, and using the variance for $Q$ from, e.g., Biggerstaff & Tweedie (1997) (who reference Larholt et al., 1994), we have that

$$\mathrm{Var}\left[\mathrm{logit}(\widehat{M}_1)\right] \approx \frac{S_2 - 2S_3/S_1 + S_2^2/S_1^2}{2(S_1 - S_2/S_1)^2} + \frac{1}{K}\cdot\frac{\tau^2}{\beta^2}. \qquad (11)$$

Note that the first term in (11) is fixed, depending only on the within-study variances. The second term is proportional to $\mathrm{CV}_B^2$, indicating that the variance of the estimator increases as the coefficient of variation increases.

Example 2: bias based on the DL estimator

From the definitions in the previous example and Theorem 1, we can see that the bias of $\mathrm{logit}(\widehat{M}_1)$ increases with $\tau^2$. As $\tau^2$ becomes large, the bias of the estimator depends on the magnitude of $\beta$ compared to $\tau$, and we see similar behaviour to the example above. However, if the magnitude of $\beta$ is large compared to $\tau$ (i.e. a small $\mathrm{CV}_B$), the bias of the estimator can also be large and negative.

In the case where $v_i \ll \tau^2$, we see that

$$\mathrm{bias}\left[\mathrm{logit}(\widehat{M}_1)\right] \approx \frac{1}{2}\left[\frac{1}{K}\cdot\frac{\tau^2}{\beta^2} - \frac{S_2 - 2S_3/S_1 + S_2^2/S_1^2}{2(S_1 - S_2/S_1)^2}\right]. \qquad (12)$$

Thus, as in the previous example, in the case where the within-study variances are small compared to $\tau^2$, the bias of the estimator increases with the coefficient of variation.

We now propose several potential interval estimators for $\mathrm{CV}_B$, $M_1$ and $M_2$. Some of these confidence intervals require a confidence interval for $\tau^2$, which depends on the estimator chosen for $\tau^2$ and on the interval estimator. In this paper we calculate the intervals for $\tau^2$ using the Q-profiling method (Viechtbauer, 2007). However, it is important to note that there are many ways of calculating these confidence intervals; for details see, e.g., Viechtbauer (2007). That paper also notes that coverages for bootstrap methods can vary wildly away from the nominal coverage. Thus, we will not consider bootstrap intervals in this paper.

As noted in Section 3, since $M_1$ and $M_2$ are bounded on [0, 1], we consider intervals for the logit-transformed statistics before back-transforming to the original scale. Hence, our Wald interval for the logit of $M_i$ ($i = 1, 2$) is

$$[L_i, U_i] = \mathrm{logit}(\widehat{M}_i) \pm z_{1-\alpha/2}\sqrt{\widehat{\mathrm{Var}}\left[\mathrm{logit}(\widehat{M}_i)\right]}, \qquad (13)$$

where $\widehat{\mathrm{Var}}[\mathrm{logit}(\widehat{M}_1)]$ and $\widehat{\mathrm{Var}}[\mathrm{logit}(\widehat{M}_2)]$ can be found in Theorem 1 and Corollary 1, respectively. This interval can then be back-transformed to the $M_i$ scale by noting that $M_i = \exp(\rho_i)/[1 + \exp(\rho_i)]$, where $\rho_i = \mathrm{logit}(M_i)$. These logit-transformed intervals often perform better than the regular Wald-type intervals.

Wald-type intervals for the $\mathrm{CV}_B$ and log-transformed $\mathrm{CV}_B$ measures were considered in Takkouche et al. (2013). Given the link between $\mathrm{CV}_B$, $M_1$ and $M_2$, given as

$$\log(\mathrm{CV}_B) = \mathrm{logit}(M_1) = \mathrm{logit}(M_2)/2,$$

a Wald interval need only be found for one of the above, with the suitable back-transformation to return to the original scale for all three.

In what follows, let $[L_\tau(\alpha), U_\tau(\alpha)]$ and $[L_\beta(\alpha), U_\beta(\alpha)]$ denote $(1 - \alpha) \times 100\%$ confidence intervals for $\tau$ and $\beta$, respectively. We also require intervals for $|\beta|$ and $\beta^2$, which need special attention: when the lower and upper bounds of the interval for $\beta$ have different signs, applying the absolute value or square transformation to the interval bounds does not result in an interval that contains zero, and such an interval will therefore have coverage lower than nominal. We take a conservative approach to calculating the intervals that is based on transformation of the lower and upper bounds of the $\beta$ interval.

As an example, we consider constructing an interval for $|\beta|$. Based on the signs of the bounds of the interval for $\beta$, there are three scenarios:

1. If $L_\beta(\alpha) > 0$ and $U_\beta(\alpha) > 0$, then $L_{|\beta|}(\alpha) = |L_\beta(\alpha)|$ and $U_{|\beta|}(\alpha) = |U_\beta(\alpha)|$.
2. If $L_\beta(\alpha) < 0$ and $U_\beta(\alpha) < 0$, then $L_{|\beta|}(\alpha) = |U_\beta(\alpha)|$ and $U_{|\beta|}(\alpha) = |L_\beta(\alpha)|$.
3. If $L_\beta(\alpha) < 0$ and $U_\beta(\alpha) > 0$, then $L_{|\beta|}(\alpha) = 0$ and $U_{|\beta|}(\alpha) = \max(|L_\beta(\alpha)|, |U_\beta(\alpha)|)$.

The conservativeness of this approach is associated with the third case above, where we choose the maximum of the transformed bounds as the upper bound.
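The Wald interval (13) and the three-case bounds for $|\beta|$ (with the analogous squared bounds for $\beta^2$) can be sketched in standard-library Python. This is a minimal sketch under stated assumptions: the user supplies $\hat\tau > 0$ (an estimate of $\tau$, not $\tau^2$), $\hat\beta$, and variances of their estimators; obtaining $\mathrm{Var}(\hat\tau)$ from $\mathrm{Var}(T^2)$ via $\mathrm{Var}(T^2)/(4T^2)$ is our assumption, not a step stated here. Because $\log(\mathrm{CV}_B) = \mathrm{logit}(M_1) = \mathrm{logit}(M_2)/2$, a single interval on the $\log(\mathrm{CV}_B)$ scale is mapped to all three measures. Function names are hypothetical.

```python
import math
from statistics import NormalDist


def abs_beta_interval(lo, hi):
    """Conservative interval for |beta| from an interval [lo, hi] for beta."""
    if lo > 0 and hi > 0:                  # case 1: entirely positive
        return lo, hi
    if lo < 0 and hi < 0:                  # case 2: entirely negative, bounds swap
        return abs(hi), abs(lo)
    return 0.0, max(abs(lo), abs(hi))      # case 3: straddles (or touches) zero


def beta_sq_interval(lo, hi):
    """Interval for beta^2: the same three cases, with the bounds squared."""
    a, b = abs_beta_interval(lo, hi)
    return a * a, b * b


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def wald_interval(tau_hat, var_tau_hat, beta_hat, var_beta_hat,
                  measure="M1", level=0.95):
    """Wald interval on the log(CV_B) = logit(M1) scale, then back-transformed.

    tau_hat must be > 0, and var_tau_hat is the variance of the estimator
    of tau (not of tau^2).
    """
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    rho = math.log(tau_hat / abs(beta_hat))                          # logit(M1_hat)
    var = var_tau_hat / tau_hat ** 2 + var_beta_hat / beta_hat ** 2  # Theorem 1
    half = z * math.sqrt(var)
    lo, hi = rho - half, rho + half
    if measure == "CV":
        return math.exp(lo), math.exp(hi)
    if measure == "M1":
        return inv_logit(lo), inv_logit(hi)
    if measure == "M2":
        # (10) and Corollary 1: logit(M2) = 2 logit(M1) with 4x the variance,
        # so the logit-scale interval is exactly doubled.
        return inv_logit(2.0 * lo), inv_logit(2.0 * hi)
    raise ValueError("measure must be 'CV', 'M1' or 'M2'")


if __name__ == "__main__":
    # tau_hat and beta_hat echo the motivating example; the two variances
    # are made-up values for illustration only.
    for m in ("CV", "M1", "M2"):
        print(m, wald_interval(0.505, 0.002, 2.22, 0.008, measure=m))
    print(abs_beta_interval(-0.3, 0.5))    # (0.0, 0.5)
```

Working on one scale keeps the three intervals consistent with (9): the $M_1$ and $M_2$ bounds are exactly $c/(1+c)$ and $c^2/(1+c^2)$ of the corresponding $\mathrm{CV}_B$ bounds $c$, and they stay inside [0, 1] by construction.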
An interval for $\beta^2$ can be obtained similarly, using the same three cases but squaring, rather than taking absolute values of, the bounds of the interval for $\beta$.

In what follows, we discuss how to create some simple confidence intervals for $M_1$, and note that the generalisation to $\mathrm{CV}_B$ and $M_2$ is similar in concept and therefore straightforward. Throughout, we let $\hat\tau$ and $\hat\beta$ denote estimates of $\tau$ and $\beta$ where, as with the interval estimators, several different estimators are possible. For simplicity, let $\mathrm{CI}_{CV}(1 - \alpha_\tau, 1 - \alpha_\beta)$ and $\mathrm{CI}_{M_i}(1 - \alpha_\tau, 1 - \alpha_\beta)$ ($i = 1, 2$) denote intervals for $\mathrm{CV}_B$, $M_1$ and $M_2$, where the interval for $\tau$ is a $(1 - \alpha_\tau) \times 100\%$ confidence interval and the interval for $\beta$ is a $(1 - \alpha_\beta) \times 100\%$ confidence interval.

When the confidence interval for one of the parameters is very narrow, it may be possible to achieve good coverage by fixing that parameter and simply using the lower and upper bounds of the other. This approach has been used before; e.g. Newcombe (2011) notes that Khan & Chien (1997) used this approach to derive an interval for a two-parameter function where the parameters were independently estimated. In regards to the $\mathrm{CV}_B$ measure of heterogeneity, Takkouche et al. (2013) also used such an approach by fixing $|\hat\beta|$ and using confidence intervals for $\tau$. Using the notation above, fixing $\hat\beta$ gives a $\mathrm{CI}(0.95, 0)$ interval and, similarly, fixing $\hat\tau$ gives a $\mathrm{CI}(0, 0.95)$ interval.

Choosing the placement of the lower and upper bounds depends on the variable, and on whether the function defining the measure is a decreasing or increasing function of that variable. For example, let $M(a, b) = a/(a + b)$ with both $a > 0$ and $b > 0$. Then $\partial M(a, b)/\partial a = b/(a + b)^2 \geq 0$, so $M_1$ is an increasing function of $\tau$. Similarly, and more obviously, $M_1$ is decreasing in $|\beta|$. Two possible intervals for $M_1$ are then

$$\mathrm{CI}_{M_1}(0, 0.95) = \left[\frac{\hat\tau}{\hat\tau + U_{|\beta|}(\alpha)}, \frac{\hat\tau}{\hat\tau + L_{|\beta|}(\alpha)}\right], \qquad \mathrm{CI}_{M_1}(0.95, 0) = \left[\frac{L_\tau(\alpha)}{L_\tau(\alpha) + |\hat\beta|}, \frac{U_\tau(\alpha)}{U_\tau(\alpha) + |\hat\beta|}\right]. \qquad (14)$$

Note that, and perhaps this is clearer, we can arrive at the same interval by writing $1/M_1 = 1 + |\beta|/\tau$. Hence, using the idea above, a confidence interval for $1/M_1$ while fixing $|\hat\beta|$ is

$$\left[1 + \frac{|\hat\beta|}{U_\tau(\alpha)}, 1 + \frac{|\hat\beta|}{L_\tau(\alpha)}\right]. \qquad (15)$$

A confidence interval for $M_1$ can then be found by inverting and switching the bounds of the interval for $1/M_1$, and we arrive back at (14).

Another option is to use the intervals for both parameters simultaneously. As noted by Newcombe (2011), Lloyd (1990) used this approach for an interval for the difference in correlated proportions; such intervals are typically conservative, which is preferable to being too liberal. A confidence interval for $1/M_1$ using the inversion approach above, now with the intervals for both $\tau$ and $|\beta|$, is

$$\mathrm{CI}_{1/M_1}(0.95, 0.95) = \left[1 + \frac{L_{|\beta|}(\alpha)}{U_\tau(\alpha)}, 1 + \frac{U_{|\beta|}(\alpha)}{L_\tau(\alpha)}\right]. \qquad (16)$$

A confidence interval for $M_1$ can then be found by inverting and switching the bounds of the interval for $1/M_1$. As noted in the previous subsection, rearranging gives the confidence interval for $M_1$ as

$$\left[\frac{L_\tau(\alpha)}{L_\tau(\alpha) + U_{|\beta|}(\alpha)}, \frac{U_\tau(\alpha)}{U_\tau(\alpha) + L_{|\beta|}(\alpha)}\right]. \qquad (17)$$

α-adjusted intervals

To adjust for the conservative nature of the intervals above, another option is to decrease the coverage of the two intervals (Tryon, 2001; Tryon & Lewis, 2009; Goldstein & Healy, 1995). For a 95% confidence interval, this means changing $\alpha$ from 0.05 to 0.1658. This method then calculates intervals as above with these adjusted values of $\alpha$. These are therefore $\mathrm{CI}(0.8342, 0.8342)$ intervals.

Adjusting the $\alpha$ levels for the intervals used in the combined interval can work well in reducing the conservativeness associated with two 95% confidence intervals. However, when, e.g., one of the parameters is estimated precisely and the other is highly variable, increasing $\alpha$ to 0.1658 for the latter may result in liberal intervals.

Given the above, Propagating Imprecision (PropImp; Newcombe, 2011) is an iterative approach to calculating confidence intervals where the $\alpha$ levels are adjusted differently, under constraints, for each of the lower and upper bounds, and to levels that maximise the interval width. For a two-parameter function for which a confidence interval is needed, PropImp can choose four different values for $\alpha$, one for each of the bounds of the intervals for each parameter. Define $f(X, Y)$ to be monotonic in both $X$ and $Y$, where $(X^L_{z(\alpha)}, X^U_{z(\alpha)})$ is a $(1 - \alpha) \times 100\%$ confidence interval for $X$ and $z(\alpha) = \Phi^{-1}(1 - \alpha/2)$, where $\Phi^{-1}$ is the standard normal inverse distribution function; e.g. $\Phi^{-1}(0.975) \approx 1.96$. The confidence interval for $Y$, $(Y^L_{z(\alpha)}, Y^U_{z(\alpha)})$, is similarly defined.

Then a confidence interval for $f$ can be formed from the confidence intervals for $X$ and $Y$, with three cases to consider. These cases are:

1. $f(X, Y)$ is increasing in both $X$ and $Y$,
2. $f(X, Y)$ is increasing in one and decreasing in the other,
3. $f(X, Y)$ is decreasing in both.

For $\mathrm{CV}_B$, $M_1$ and $M_2$, we define $f$ such that $f(\tau, \beta)$ is equal to $\tau/|\beta|$, $\tau/(\tau + |\beta|)$ and $\tau^2/(\tau^2 + \beta^2)$, respectively. Hence, for our measures we are dealing with Case 2 from above: increasing in $X$ ($\tau$) and decreasing in $Y$ ($|\beta|$). For Case 2, we have intervals of the form $(L, U)$, where

$$L = \min_{0 \leq \theta_1 \leq \pi/2} f\left(X^L_{z\sin(\theta_1)}, Y^U_{z\cos(\theta_1)}\right), \qquad U = \max_{0 \leq \theta_2 \leq \pi/2} f\left(X^U_{z\sin(\theta_2)}, Y^L_{z\cos(\theta_2)}\right), \qquad (18)$$

noting that $\sin^2(\theta) + \cos^2(\theta) = 1$. The constraint arises since, e.g., for a 95% confidence interval, $z = 1.96$ and $[1.96\sin(\theta_1)]^2 + [1.96\cos(\theta_1)]^2 = 1.96^2$, and similarly with $\theta_2$.

Consequently, and as an example, the PropImp interval for $\mathrm{CV}_B$ is

$$L = \min_{0 \leq \theta_1 \leq \pi/2} \frac{\hat\tau^L_{z\sin(\theta_1)}}{|\hat\beta|^U_{z\cos(\theta_1)}}, \qquad U = \max_{0 \leq \theta_2 \leq \pi/2} \frac{\hat\tau^U_{z\sin(\theta_2)}}{|\hat\beta|^L_{z\cos(\theta_2)}}, \qquad (19)$$

where $\hat\tau^L_{z\sin(\theta_1)}$ and $\hat\tau^U_{z\sin(\theta_2)}$ are the lower and upper bounds of the interval for $\tau$, with similarly defined bounds for $|\beta|$. As noted in Newcombe (2011), PropImp confidence intervals are generally wider than those from other adjusted methods (Daly, 1998). However, they tend to compare favourably when it comes to coverage. To better understand PropImp in comparison to other intervals, we consider the following special cases:

Case 1: $\theta_1 = \theta_2 = \pi/4$. For this case we have $z\sin(\theta_1) = z\cos(\theta_1) = z\sin(\theta_2) = z\cos(\theta_2) = 1.386$, and the resulting interval is the α-adjusted interval from Section 4.2.3. That is, $\mathrm{CI}(0.8342, 0.8342)$.

Case 2: $\theta_1 = \theta_2 = \pi/2$. Here, since $z\sin(\theta_1) = z\sin(\theta_2) = 1.96$ and $z\cos(\theta_1) = z\cos(\theta_2) = 0$, the resulting interval simply fixes $\hat\beta$ in the lower and upper bounds and uses the interval for $\tau$ to determine the bounds. Hence, $\mathrm{CI}(0.95, 0)$.

Case 3: $\theta_1 = \theta_2 = 0$. Similar to Case 2, but the resulting interval is $\mathrm{CI}(0, 0.95)$.

Hence, particular choices of $\theta_1$ and $\theta_2$ give the α-adjusted interval and also the interval fixing one parameter and using the interval for the other. However, in practice other choices are likely to result.

MOVER-R

Extending interval estimators from Donner & Zou (2012), MOVER-R (Newcombe, 2016) is a method for calculating confidence intervals for $\theta_1/\theta_2$ based on independent estimators $\hat\theta_1$ and $\hat\theta_2$. In exploring the use of these intervals, we found that MOVER-R was not applicable in the context of $\mathrm{CV}_B$, $M_1$ and $M_2$, since the lower bound of the interval for the denominator can be zero; e.g. when the lower and upper bounds of the confidence interval for $\beta$ are of opposite sign. As noted by Newcombe (2016), the PropImp method is more general, and given that it can be used in this scenario, we do not consider MOVER-R in what follows.

To assess the performance of our proposed intervals, we performed simulation studies. We start by conducting simulations focusing on the effect of varying $K$, $\tau^2$ and $\beta$. We then use settings similar to those of several real-data meta-analyses (e.g. sample sizes, $K$, etc.). We found that bias-adjusting the Wald interval did not improve the performance of the intervals, so the performance of the standard Wald intervals is reported. Additionally, the combined 0.95 intervals were too conservative, and the intervals that fix one parameter while using the interval for the other were too liberal. We therefore have focused only on the α-adjusted, PropImp and Wald intervals.

We simulated standardised mean differences with equal sample sizes of 30 in each arm, varying the number of studies, $K = 10, \ldots, 50$. We also considered three values for $\beta$, 0.

2, 0 . . τ set to 0 . , . , . .

8. Observed eﬀects were then randomly sampled from a non-central t-distribution underassumed normal distributions in each arm (e.g. follow Proposition 2.1 of Malloy et al. , 2013,adjusted to the two arms case). We performed 10,000 trials for each possible setting. Note, dueto the direct connections between CV B , M and M , the coverages for the intervals consideredare the same and so that when reporting coverage, one result for each setting combinationdetails coverage for each of the three measures. Due to the large number of trials, we used theDerSimonian and Laird estimator of τ for computational eﬃciency.For small τ and small number of studies, it is possible that (cid:98) τ = 0 due to truncation of theestimator to avoid negative estimates. In practice, when this happens it is a pointless exerciseto construct intervals for heterogeneity and conﬁdence intervals for τ are similarly zero in bothbounds. Rather than remove these trials, we chose the largest interval, e.g. for M i (0 ,

1) toindicate uncertainty for measuring heterogeneity.Figures 3 depicts coverages for β = 0 .

2. The Wald intervals were unreliable, being possi-bly very conservative otherwise, or too liberal and with coverage decreasing as heterogeneityincreased. The α -adjusted intervals (i.e. CI . (0 . , . Simulation results for β = 0 . α -adjusted intervals. The PropImpintervals were close to nominal or slightly conservative and therefore more reliable.14 l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l
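The simulations above use the DerSimonian and Laird estimator of $\tau^2$, which is truncated at zero and can therefore return $\hat\tau = 0$ for small $\tau$ and few studies. As a minimal sketch (assuming study estimates `y` and within-study variances `v` are given; the function name `dersimonian_laird` is hypothetical and not taken from the simulation code), the estimator and the resulting random-effects estimate of $\beta$ can be written in Python as:

```python
import math


def dersimonian_laird(y, v):
    """DerSimonian-Laird random-effects fit (requires at least two studies).

    y: study effect estimates; v: within-study variances.
    Returns (beta_hat, se_beta_hat, tau2_hat), with tau2_hat truncated at 0.
    """
    k = len(y)
    w = [1.0 / vi for vi in v]                       # fixed-effect weights
    sw = sum(w)
    beta_fe = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - beta_fe) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)               # truncate negative estimates
    w_re = [1.0 / (vi + tau2) for vi in v]           # random-effects weights
    sw_re = sum(w_re)
    beta_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sw_re
    return beta_re, math.sqrt(1.0 / sw_re), tau2
```

For example, `dersimonian_laird([0, 2, 4], [1, 1, 1])` returns $\hat\beta = 2$, a standard error of $\sqrt{4/3}$ and $\hat\tau^2 = 3$, while identical study estimates give $\hat\tau^2 = 0$ through the truncation.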

Figure 3: Line plots showing coverages for the $\mathrm{CV}_B$, $M_1$ and $M_2$ measures (coverages for each setting are the same for each measure) on the vertical axis against the number of studies (10 to 50) on the horizontal axis, for different values of $\tau$ when $\beta = 0.2$. Legend: α-adj, PropImp, WT.

[…] (Bangert-Drowns et al., 2004) with 48 studies. In both cases sample sizes ranged from small to large. Both of these data sets are available in the metafor package (Viechtbauer, 2010). To simulate data from the non-central t-distributions, we split the sample sizes approximately equally between the two arms. Simulations based on Zhu et al. (2020), which assumed approximately normal transformed incidence rates, generated data from the normal distribution using the observed effects plus a generated random effect, with within-study variances approximated from the reported intervals. These simulations were based on 10,000 simulated data sets, with $\beta$ set to the estimated value from the meta-analysis of the original data and $\tau$ varied over four values ([…]). […] The Wald-type intervals for $\mathrm{CV}_B$ were obtained by exponentiating the Wald-type interval for the logit-transformed $M_1$ (recalling that $\operatorname{logit}(M_1) = \log(\mathrm{CV}_B)$). A disadvantage of this Wald interval for $\mathrm{CV}_B$ is that, following back-transformation, the upper bound can be extremely large. This results in meaningless average widths (e.g. not representing typical […]).
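The Wald-type construction described above can be sketched in Python, assuming that $\hat\tau^2$, $\hat\beta$ and estimates of $\operatorname{Var}(\hat\tau^2)$ and $\operatorname{Var}(\hat\beta)$ are available (how $\operatorname{Var}(\hat\tau^2)$ is estimated is not shown, and `wald_cv_interval` is a hypothetical name). The sketch uses the first-order delta-method variance $\operatorname{Var}[\operatorname{logit}(\hat M_1)] \approx \operatorname{Var}(\hat\tau^2)/(4\tau^4) + \operatorname{Var}(\hat\beta)/\beta^2$ with the estimates plugged in, then exponentiates for the $\mathrm{CV}_B$ scale and applies the inverse logit for the $M_1$ scale.

```python
import math
from statistics import NormalDist


def wald_cv_interval(tau2_hat, var_tau2, beta_hat, var_beta, level=0.95):
    """Hypothetical helper: Wald interval for logit(M1) = log(CV_B),
    back-transformed to the CV_B and M1 scales.

    First-order delta-method variance with estimates plugged in:
    Var[logit(M1_hat)] ~ Var(tau2_hat)/(4 tau^4) + Var(beta_hat)/beta^2.
    """
    if tau2_hat <= 0 or beta_hat == 0:
        raise ValueError("requires tau2_hat > 0 and beta_hat != 0")
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    est = 0.5 * math.log(tau2_hat) - math.log(abs(beta_hat))  # log(CV_B)
    se = math.sqrt(var_tau2 / (4 * tau2_hat ** 2) + var_beta / beta_hat ** 2)
    lo, hi = est - z * se, est + z * se
    cv = (math.exp(lo), math.exp(hi))          # exponentiate: CV_B scale
    m1 = tuple(c / (1 + c) for c in cv)        # inverse logit: M1 scale in (0, 1)
    return cv, m1
```

With $\hat\tau^2 = 0.04$, $\operatorname{Var}(\hat\tau^2) = 0.0004$, $\hat\beta = 0.05$ and $\operatorname{Var}(\hat\beta) = 0.01$, for instance, the back-transformed upper bound for $\mathrm{CV}_B$ is above 200 while the $M_1$ upper bound stays below 1, illustrating the width problem noted above.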

Figure 4: Line plots showing coverages for the $\mathrm{CV}_B$, $M_1$ and $M_2$ measures (coverages for each setting are the same for each measure) on the vertical axis against the number of studies (10 to 50) on the horizontal axis, for different values of $\tau$ when $\beta =$ […]. Legend: α-adj, PropImp, WT.

[…] α-adjusted method typically displayed reasonable coverage, although it tended to be liberal, and for the setting based on Zhu et al. (2020) coverage dropped to just 0.868 for small $\tau$. Overall, the PropImp intervals had excellent coverage, close to the nominal 0.95 for all settings. The widths were often large for the $\mathrm{CV}_B$ measure, in particular when $\beta$ is small relative to $\tau$. However, by reporting on the [0, 1] scale, as for $M_1$ and $M_2$, we avoid the problem of very large widths, and the intervals may be easier to interpret.

Let us now return to our motivating example, in which we considered a meta-analysis containing 35 studies and using a single-arm, random-effects analysis of double arcsine transformed incidence rates. Due to the good coverage obtained in the simulations, we report the PropImp intervals. Calculating $\mathrm{CV}_B$ and its confidence interval, we have $\widehat{\mathrm{CV}}_B = 0.227$ ([…], […]), $\hat M_1 = 0.185$ ([…], […]) and $\hat M_2 = 0.049$ ([…], […]).
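Because $M_1 = \mathrm{CV}_B/(1 + \mathrm{CV}_B)$ and $M_2 = \mathrm{CV}_B^2/(1 + \mathrm{CV}_B^2)$ are increasing functions of $\mathrm{CV}_B$, point estimates and interval bounds can be converted directly between the scales. A minimal Python sketch of these conversions (the function names are hypothetical), checked against the estimates above:

```python
import math


def cv_to_m1(cv):
    """M1 = tau/(tau + |beta|) = CV_B/(1 + CV_B)."""
    return cv / (1 + cv)


def cv_to_m2(cv):
    """M2 = tau^2/(tau^2 + beta^2) = CV_B^2/(1 + CV_B^2)."""
    return cv ** 2 / (1 + cv ** 2)


def m1_to_cv(m1):
    """Inverse of cv_to_m1, since logit(M1) = log(CV_B)."""
    return m1 / (1 - m1)


def m2_to_cv(m2):
    """Inverse of cv_to_m2, since 1/M2 = 1 + 1/CV_B^2."""
    return math.sqrt(m2 / (1 - m2))


cv_hat = 0.227  # CV_B estimate from the motivating example
print(round(cv_to_m1(cv_hat), 3), round(cv_to_m2(cv_hat), 3))  # 0.185 0.049
```

Applying the same functions to the interval bounds for $\mathrm{CV}_B$ gives the corresponding intervals for $M_1$ and $M_2$, since increasing maps preserve the ordering of the bounds.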

Figure 5: Line plots showing coverages for the $\mathrm{CV}_B$, $M_1$ and $M_2$ measures (coverages for each setting are the same for each measure) on the vertical axis against the number of studies (10 to 50) on the horizontal axis, for different values of $\tau$ when $\beta =$ […]. Legend: α-adj, PropImp, WT.

[…] $I^2$ for the second and third data sets, we see that the estimates and intervals for $\mathrm{CV}_B$ and $M_1$ are very different, with either large or small relative heterogeneity.

In this paper we propose new intervals for the coefficient of variation used to assess heterogeneity in meta-analyses. We also suggest transforming the measure onto a [0, 1] scale, to avoid the possibility of very large values of the measure and its associated confidence intervals. We discuss similarities between the transformed measures and highlight that they can be used as population measures of heterogeneity. We suggest that, of the two measures we propose, $M_1$ is preferred since it is a simpler rescaling of the coefficient of variation. Two key relationships are relevant here:

$$\operatorname{logit}(M_1) = \log(\mathrm{CV}_B), \qquad M_2^{-1} = 1 + 1/\mathrm{CV}_B^2,$$

meaning that it is simple to convert point and interval estimators from the $M_1$ scale to the $\mathrm{CV}_B$ scale, and vice versa.

Table 3: Settings based on three published meta-analyses used in the simulation studies. The estimated values for $\mathrm{CV}_B$, $M_1$ and $M_2$ are also provided. Values of the within-study variances for Zhu et al. (2020) were calculated approximately from the reported confidence intervals.

Normand (1999): sample sizes N = 311, … (split approximately equally between two arms); $\widehat{\mathrm{CV}}_B = 1.384$, $\hat M_1 =$ …, $\hat M_2 =$ …

Bangert-Drowns et al. (2004): sample sizes N = 60, …, 24, …, 91, … (split approximately equally between two arms); $\widehat{\mathrm{CV}}_B = 0.970$, $\hat M_1 = 0.492$, $\hat M_2 = 0.485$

Zhu et al. (2020): within-study variances v = …; $\widehat{\mathrm{CV}}_B = 0.227$, $\hat M_1 = 0.185$, $\hat M_2 = 0.049$

Several confidence intervals for the coefficient of variation and the transformed measures were considered, several of which exhibit good coverage properties. We recommend the slightly conservative PropImp intervals (Newcombe, 2011), which can be reliably used in practice. Wald-type intervals had varying performance. A simple alternative to the PropImp intervals is to combine reduced-coverage intervals (coverage of each set to 0.8342), since this interval typically had reasonable coverage.
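To make the recommendation concrete, the PropImp interval (18)–(19) for $\mathrm{CV}_B$ and the simpler α-adjusted interval (special case $\theta_1 = \theta_2 = \pi/4$) can be sketched in Python as below. This minimal sketch rests on an assumption not made in the text: the component intervals are taken to be Wald-type, $\hat\tau \pm c \cdot \mathrm{se}(\hat\tau)$ (lower bound truncated at zero) and $\hat\beta \pm c \cdot \mathrm{se}(\hat\beta)$, where $c$ is $z\sin\theta$ or $z\cos\theta$; the intervals for $\tau$ used in this paper may be constructed differently. All function names are hypothetical, and the optimisation over $\theta$ is a simple grid search.

```python
import math
from statistics import NormalDist


def abs_bounds(lo, hi):
    """Bounds for |beta| implied by an interval (lo, hi) for beta."""
    if lo <= 0 <= hi:                      # interval straddles zero
        return 0.0, max(-lo, hi)
    return min(abs(lo), abs(hi)), max(abs(lo), abs(hi))


def cv_bounds_at(theta, tau_hat, se_tau, beta_hat, se_beta, z):
    """Candidate (L, U) for CV_B = tau/|beta| when the tau interval uses
    critical value z*sin(theta) and the beta interval uses z*cos(theta)."""
    c_tau, c_beta = z * math.sin(theta), z * math.cos(theta)
    # Assumed (hypothetical) Wald-type component intervals; tau truncated at 0.
    tau_lo = max(0.0, tau_hat - c_tau * se_tau)
    tau_hi = tau_hat + c_tau * se_tau
    abs_lo, abs_hi = abs_bounds(beta_hat - c_beta * se_beta,
                                beta_hat + c_beta * se_beta)
    lower = tau_lo / abs_hi                # CV_B is decreasing in |beta|
    upper = tau_hi / abs_lo if abs_lo > 0 else math.inf
    return lower, upper


def propimp_cv(tau_hat, se_tau, beta_hat, se_beta, level=0.95, n_grid=2001):
    """PropImp-style interval: minimise L over theta_1 and maximise U over
    theta_2, each on a grid over [0, pi/2]."""
    if beta_hat == 0:
        raise ValueError("CV_B is undefined when beta_hat = 0")
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    cands = [cv_bounds_at(math.pi / 2 * i / (n_grid - 1),
                          tau_hat, se_tau, beta_hat, se_beta, z)
             for i in range(n_grid)]
    return min(c[0] for c in cands), max(c[1] for c in cands)


def alpha_adjusted_cv(tau_hat, se_tau, beta_hat, se_beta, level=0.95):
    """Special case theta_1 = theta_2 = pi/4: each component interval uses
    z/sqrt(2) (about 1.386, i.e. coverage about 0.8342 when level = 0.95)."""
    if beta_hat == 0:
        raise ValueError("CV_B is undefined when beta_hat = 0")
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return cv_bounds_at(math.pi / 4, tau_hat, se_tau, beta_hat, se_beta, z)
```

With, say, $\hat\tau = 0.2$ (standard error 0.05) and $\hat\beta = 0.5$ (standard error 0.1), the PropImp interval is slightly wider than the α-adjusted one, as expected since PropImp searches over $\theta$ for the widest bounds. If the interval for $\beta$ straddles zero, the PropImp upper bound for $\mathrm{CV}_B$ is infinite; mapping the bounds through $\mathrm{CV}_B/(1 + \mathrm{CV}_B)$, with an infinite bound taken to 1, gives the corresponding $M_1$ interval.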

Supplementary Materials

All simulation code and data are available at https://osf.io/yq65h/

[Table: Coverage and widths for the $\mathrm{CV}_B$, $M_1$ and $M_2$ measures by $\tau$ setting, with $\beta$ set as in Normand (1999), Bangert-Drowns et al. (2004) and Zhu et al. (2020); ∗ median widths. Columns: settings, method, coverage, width ($\mathrm{CV}_B$), width ($M_1$), width ($M_2$).]

[Table: Estimates with α-adjusted and PropImp intervals for the $\mathrm{CV}_B$ and $M_1$ measures, together with $I^2$ and $\hat\tau$, for each data set (HSSP, …). Columns: dataset, measure, estimate, α adj, PropImp.]

Approximate variance and bias for the logit-transformed $M_1$ estimator

Recall that $\operatorname{logit}(\hat M_1) = \log(\widehat{\mathrm{CV}}_B) = \log(\hat\tau) - \log(|\hat\beta|)$, so that $\operatorname{logit}(\hat M_1) = \log(\sqrt{v}) - \log(|\hat\beta|)$ where $v = \hat\tau^2$. Using the delta method, the second-order Taylor series expansion gives

$$\operatorname{logit}(\hat M_1) \approx \operatorname{logit}(M_1) + \frac{\hat\tau^2 - \tau^2}{2\tau^2} - \frac{\operatorname{sign}(\beta)\,(\hat\beta - \beta)}{|\beta|} + \frac{1}{2}\left[\frac{(\hat\beta - \beta)^2}{\beta^2} - \frac{(\hat\tau^2 - \tau^2)^2}{2\tau^4}\right],$$

where $\operatorname{sign}(x) = 1$ if $x > 0$ and $\operatorname{sign}(x) = -1$ if $x < 0$. Note this is not defined for $\beta = 0$, although neither is $\mathrm{CV}_B$. Hence, a first-order approximate variance, assuming that $\hat\tau^2$ and $\hat\beta$ are uncorrelated, is

$$\operatorname{Var}\big[\operatorname{logit}(\hat M_1)\big] \approx \frac{\operatorname{Var}(\hat\tau^2)}{4\tau^4} + \frac{\operatorname{Var}(\hat\beta)}{\beta^2}. \tag{20}$$

An approximation to the bias can be found by taking the expected value of the second-order term and is therefore equal to

$$\frac{1}{2}\left[\frac{\operatorname{Var}(\hat\beta)}{\beta^2} - \frac{\operatorname{Var}(\hat\tau^2)}{2\tau^4}\right]. \tag{21}$$

References

Bangert-Drowns, R. L., Hurley, M. M., & Wilkinson, B. Rev Educ Res, (1), 29–58.

Biggerstaff, B. J., & Tweedie, R. L. Stat Med, (7), 753–768.

Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. Introduction to meta-analysis. Chichester (UK): John Wiley & Sons.

Borenstein, M., Higgins, J. P. T., Hedges, L. V., & Rothstein, H. R. Res Synth Methods, 5–18.

Cairns, M., Cumming, G., Calin-Jageman, R., & Prendergast, L. A. The Diamond Ratio: A Visual Indicator of the Extent of Heterogeneity in Meta-Analysis. https://psyarxiv.com/f4xus/.

Cochran, W. G. Biometrics, (1), 101–129.

Crippa, A., Khudyakov, P., Wang, M., Orsini, N., & Spiegelman, D. Stat Med, (21), 3661–3675.

Cumming, G., & Calin-Jageman, R. Introduction to the new statistics: Estimation, open science, and beyond. New York (NY): Routledge.

Daly, L. E. Am J Epidemiol, (8), 783–790.

DerSimonian, R., & Laird, N. Controlled Clin Trials, (3), 177–188.

Donner, A., & Zou, G. Stat Methods Med Res, (4), 347–359.

Goldstein, H., & Healy, M. J. R. J R Stat Soc Ser A Stat Soc, (1), 175–177.

Higgins, J. P. T., & Thompson, S. G. 2002. Quantifying heterogeneity in a meta-analysis. Stat Med, (11), 1539–1558.

Hoaglin, D. C. Stat Med, (4), 485–495.

Khan, K. S., & Chien, P. F. W. BJOG, (10), 1173–1179.

Kulinskaya, E., & Dollinger, M. B. Stat Med, (4), 501–502.

Larholt, K. M., Tsiatis, A. A., & Gelber, R. D. 1994. Variability of coverage probabilities when applying a random effects methodology for meta-analysis. Unpublished.

Lloyd, C. J. J Am Stat Assoc, (412), 1154–1158.

Malloy, M. J., Prendergast, L. A., & Staudte, R. G. Stat Med, (11), 1842–1864.

Newcombe, R. G. Commun Stat Theory Methods, (17), 3154–3180.

Newcombe, R. G. Stat Methods Med Res, (5), 1774–1778.

Normand, S.-L. T. Stat Med, (3), 321–359.

R Core Team. 2020. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

Rücker, G., Schwarzer, G., Carpenter, J. R., & Schumacher, M. BMC Med Res Methodol, (1), 79.

Takkouche, B., Cadarso-Suarez, C., & Spiegelman, D. Am J Epidemiol, (2), 206–215.

Takkouche, B., Khudyakov, P., Costa-Bouzas, J., & Spiegelman, D. Am J Epidemiol, (6), 993–1004.

Thompson, S. G. BMJ, (6965), 1351–1355.

Tryon, W. W., & Lewis, C. J Educ Behav Stat, (2), 171–189.

Tryon, W. W. Psychol Methods, (3), 371–386.

Viechtbauer, W. Stat Med, 37–52.

Viechtbauer, W. J Stat Softw, (3), 1–48.

Zhu, J., Ji, P., Pang, J., Zhong, Zh., Li, H., He, C., Zhang, J., & Zhao, C.