Fisher transformation based Confidence Intervals of Correlations in Fixed- and Random-Effects Meta-Analysis

Abstract

Meta-analyses of correlation coefficients are an important technique to integrate results from many cross-sectional and longitudinal research designs. Uncertainty in pooled estimates is typically assessed with the help of confidence intervals, which can double as hypothesis tests for two-sided hypotheses about the underlying correlation. A standard approach to construct confidence intervals for the main effect is the Hedges-Olkin-Vevea Fisher-z (HOVz) approach, which is based on the Fisher-z transformation. Results from previous studies (Field, 2005; Hafdahl and Williams, 2009), however, indicate that in random-effects models the performance of the HOVz confidence interval can be unsatisfactory. To this end, we propose improvements of the HOVz approach, which are based on enhanced variance estimators for the main effect estimate. In order to study the coverage of the new confidence intervals in both fixed- and random-effects meta-analysis models, we perform an extensive simulation study, comparing them to established approaches. Data were generated via a truncated normal and beta distribution model. The results show that our newly proposed confidence intervals based on a Knapp-Hartung-type variance estimator or robust heteroscedasticity consistent sandwich estimators in combination with the integral z-to-r transformation (Hafdahl, 2009) provide more accurate coverage than existing approaches in most scenarios, especially in the more appropriate beta distribution simulation model.


Thilo Welz ∗, Philipp Doebler, Markus Pauly

September 4, 2020


Keywords: meta-analysis, correlations, confidence intervals, Fisher's z transformation, Monte-Carlo-simulation

∗ Correspondence: [email protected]

arXiv [stat.ME]

Introduction

Quantifying the association of metric variables with the help of the Pearson correlation coefficient is a routine statistical technique to understand patterns of association. It is a basic ingredient of the data analysis of many cross-sectional and longitudinal designs, and is also indispensable for various psychometric and factor analytic techniques. When several reports are available for comparable underlying populations, meta-analytic methods allow the available evidence to be pooled (Hedges and Olkin, 1985; Hunter and Schmidt, 2004), resulting in more stable and precise estimates.

Systematic reviews based on meta-analyses of correlations are among the most cited in I/O-psychology, clinical psychology and educational psychology (e.g. Barrick and Mount, 1991; Aldao et al., 2010; Sirin, 2005, each with several thousand citations), and the methodological monograph on pooling correlations of Hunter and Schmidt (2004) is approaching 10,000 citations on Google Scholar at the time of writing this article. In addition, pooled correlations are the basis for meta-analytic structural equation modeling (e.g., Cheung, 2015; Jak, 2015), and registered replication efforts pool correlations to re-assess findings of others (e.g., Open Science Collaboration, 2015).

Schulze (2004) provides a comprehensive summary of fixed- and random-effects meta-analysis of correlations. The most well known approaches are based on Fisher's z-transformation (Hedges and Olkin, 1985; Field, 2001, 2005; Hafdahl and Williams, 2009) or on direct synthesis of correlations via the Hunter-Schmidt method (Hunter and Schmidt, 1994; Schulze, 2004). Regardless of the method and the purpose of the meta-analysis, the point estimate of the correlation is to be accompanied by an estimate of its uncertainty, i.e., a standard error (SE) or a confidence interval (CI). Since the absolute value of a correlation is bounded by one, a CI might be asymmetric in this context, i.e., not centered around the point estimate. Also, CIs are often more useful than SEs, because a null hypothesis of the form $H_0: \varrho = \varrho_0$ can be rejected at level $\alpha$ if a $(1-\alpha)$-CI does not include $\varrho_0$ (duality of hypothesis testing and CIs). A CI's coverage is ideally close to the nominal $(1-\alpha)$-level, e.g., a multi-center registered replication report wants to rely neither on an anti-conservative (too narrow) CI that is overly prone to erroneously rejecting previous research, nor on a conservative (too wide) CI lacking statistical power to refute overly optimistic point estimates. Despite methodological developments since the late 70s, the choice of a CI for a pooled correlation should be a careful one: Simulation experiments reported in this article reinforce the finding that CIs are too liberal when heterogeneity is present. The main objective of this paper is a systematic investigation of competing methods, especially when moderate or even substantial amounts of heterogeneity are present, promising refined meta-analytic methods for correlations, especially those based on the Fisher z-transformation. The remainder of the introduction reviews results for (z-transformation based) pooling, and briefly introduces relevant methods for variance estimation.

A line of research summarized in Hunter and Schmidt (1994) pools correlation coefficients on the original scale from $-1$ to $1$ (HS approach). The Fisher $z$-transformation (= areatangens hyperbolicus) maps the open interval $(-1, 1)$ to the real number line. Working with $z$ values of correlations avoids problems arising at the bounds and makes normality assumptions of some meta-analytic models more plausible (Hedges and Olkin, 1985). Field (2001) presents a systematic simulation study, and describes scenarios with a too liberal behavior of the HS methodology, but also reports problems with $z$-transformed pooled values. A simulation strategy is also at the core of Field (2005), who places a special emphasis on heterogeneous settings. He finds similar point estimates for $z$-transformation based and HS pooling, with the CIs from the HS method too narrow in the small sample case. The simulation study of Hafdahl and Williams (2009) includes a comprehensive account of random-effects modeling and related sources of bias in point estimates. Focusing on point estimation, Hafdahl and Williams (2009) defend $z$-transformed pooling, but Hafdahl (2009) recommends the integral z-to-r transformation as a further improvement. In the spirit of Hafdahl and Williams (2009), the current paper focuses on variance estimators and resulting CIs, especially in the case of heterogeneity.

All CIs studied here are of the form $g(\hat\theta \pm \hat\sigma_{\hat\theta})$, for an appropriate back-transformation $g$ (which is not needed in the HS approach), a point estimator $\hat\theta$ and its SE estimator $\hat\sigma_{\hat\theta}$, which depends on the between-study variance estimation. The CI's quality will depend on an appropriate choice. In other words, especially when primary reports are heterogeneous and the underlying study-specific true correlations vary, good estimators of the between-study variance are needed to obtain neither too wide nor too narrow CIs.

The comprehensive study of Veroniki et al. (2016) supports restricted maximum likelihood estimation (REML) as a default estimator of the between-study variance. Since large values of the mean correlation cause REML convergence problems, the robust two-step Sidik and Jonkman (2006) estimator is adopted here.
Recently, Welz and Pauly (2020) showed that in the context of meta-regression, the Knapp-Hartung-adjustment (Hartung, 1999; Hartung and Knapp, 2001, KH henceforth) aided (co-)variance estimation, motivating the inclusion of KH-type CIs in the subsequent comparison.

Less well known in the meta-analysis literature are bootstrap methods for variance estimation, which are not necessarily based on a parametric assumption for the random effects distribution. The Wu (1986) Wild Bootstrap (WBS) intended for heteroscedastic situations is evaluated here. Bootstrapping is complemented by sandwich estimators (heteroscedasticity consistent, HC; White, 1980) that Viechtbauer et al. (2015) introduced in the field of meta-analysis. Recently, a wide range of HC estimators were calculated by Welz and Pauly (2020), whose comparison also includes the more recent HC4 and HC5 estimators (Cribari-Neto and Zarkos, 2004; Cribari-Neto et al., 2007). In sum, the following comparison includes a comprehensive collection of established and current variance estimators and resulting CIs.

In Section 2 we introduce the relevant models and procedures for meta-analyses of correlations with more technical detail, as well as our proposed refinements. In Section 3 we perform an extensive simulation study and present the results. An illustrative data example on the association of conscientiousness (in the sense of the NEO-PI-R; Costa Jr and McCrae, 1985, 2008) and medication adherence (Molloy et al., 2013) is presented in Section 4. We finally close with a discussion of our findings and give an outlook for future research.

For a bivariate metric random vector $(X, Y)$ with existing second moments the correlation coefficient $\varrho = \mathrm{Cov}(X, Y) / \sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}$ is usually estimated with the (Pearson) correlation coefficient

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}, \tag{1}$$

where $(x_i, y_i)$, $i = 1, \dots, n$, are independent observations of $(X, Y)$. The Pearson correlation coefficient is asymptotically consistent, i.e., for large sample sizes, its value converges to the true $\varrho$. It is also invariant under linear transformations of the data. However, its distribution is difficult to describe analytically and it is not an unbiased estimator of $\varrho$, with an approximate bias of $E(r - \varrho) \approx -\tfrac{1}{2}\varrho(1 - \varrho^2)/(n - 1)$ (Hotelling, 1953).

As correlation-based meta-analyses with $r$ as effect measure occur frequently in psychology and the social sciences, we briefly recall the two standard models, cf. Schwarzer et al. (2015): the fixed- and random-effects model. The fixed-effect meta-analysis model is defined as

$$y_i = \mu + \varepsilon_i, \quad i = 1, \dots, K, \tag{2}$$

where $\mu$ denotes the common (true) effect, i.e., the (transformed) correlation in our case, $K$ the number of available primary reports, and $y_i$ the observed effect in the $i$th study. The model errors $\varepsilon_i$ are typically assumed to be normally distributed with $\varepsilon_i \overset{ind}{\sim} N(0, \sigma_i^2)$. In this model the only source of sampling error comes from within the studies. The estimate of the main effect $\mu$ is then computed as a weighted mean via

$$\hat\mu = \sum_{i=1}^{K} \frac{w_i}{w}\, y_i, \tag{3}$$

where $w := \sum_{i=1}^{K} w_i$ and the study weights $w_i = \hat\sigma_i^{-2}$ are the reciprocals of the (estimated) sampling variances $\hat\sigma_i^2$. This is known as the inverse variance method. The fixed-effect model typically underestimates the observed total variability because it does not account for between-study variability (Schwarzer et al., 2015). However, it has the advantage of being able to pool observations, if individual patient data (IPD) are in fact available, allowing for greater flexibility in methodology in this scenario.

The random-effects model extends the fixed-effect model by incorporating a random effect that accounts for between-study variability, such as differences in study population or execution. It is given by

$$\mu_i = \mu + u_i + \varepsilon_i, \quad i = 1, \dots, K, \tag{4}$$

where the random effects $u_i$ are typically assumed to be independent and $N(0, \tau^2)$ distributed with between-study variance $\tau^2$ and $\varepsilon_i \overset{ind}{\sim} N(0, \sigma_i^2)$. Furthermore, the random effects $(u_i)_i$ and the error terms $(\varepsilon_i)_i$ are jointly independent. Thus, for $\tau^2 = 0$, the fixed-effect model is a special case of the random-effects model.
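The inverse variance method of Equation (3) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `pooled_estimate` and the example numbers are assumptions, and the optional `tau2` argument simply adds a between-study variance to each sampling variance, so that `tau2 = 0` corresponds to the fixed-effect weights.

```python
def pooled_estimate(effects, variances, tau2=0.0):
    """Inverse-variance weighted mean as in Equation (3).

    effects   : observed study effects y_i
    variances : (estimated) sampling variances sigma_i^2
    tau2      : between-study variance added to each sampling variance;
                0.0 gives the fixed-effect weights w_i = 1 / sigma_i^2
    Returns the pooled estimate and the list of weights.
    """
    if len(effects) != len(variances):
        raise ValueError("effects and variances must have the same length")
    weights = [1.0 / (v + tau2) for v in variances]
    w_total = sum(weights)
    mu_hat = sum(w * y for w, y in zip(weights, effects)) / w_total
    return mu_hat, weights


# Hypothetical example: two studies, the first one three times as precise.
mu_hat, weights = pooled_estimate([0.2, 0.4], [0.1, 0.3])
print(mu_hat, weights)
```

More precise studies receive larger weights, so the pooled value is pulled towards the first study in this example.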
The main effect is again estimated via the weighted mean $\hat\mu$ given in Equation (3) with study weights now defined as $w_i = (\hat\sigma_i^2 + \hat\tau^2)^{-1}$.

A plethora of approaches exist for estimating the heterogeneity variance $\tau^2$. Which estimator should be used has been discussed for a long time, without reaching a definitive conclusion. However, a consensus has been reached that the popular and easy to calculate DerSimonian-Laird estimator is not the best option. Authors such as Veroniki et al. (2016) and Langan et al. (2019) have recommended using iterative estimators for $\tau^2$. We therefore (initially) followed their suggestion and used the REML estimator. However, in some settings, such as large $\varrho$ values, the REML estimator had trouble converging, even after the usual remedies of utilizing step halving and/or increasing the maximum number of allowed iterations. We therefore opted to use the two-step estimator suggested by Sidik and Jonkman (SJ), which starts with a rough initial estimate $\hat\tau_0^2 = \frac{1}{K}\sum_{i=1}^{K}(y_i - \bar{y})^2$ and is then updated via the expression

$$\hat\tau^2_{SJ} = \frac{1}{K-1}\sum_{i=1}^{K} w_i (y_i - \hat\mu)^2, \tag{5}$$

where $w_i = \left(\frac{\hat\sigma_i^2 + \hat\tau_0^2}{\hat\tau_0^2}\right)^{-1}$ and $\hat\mu = \frac{\sum_{i=1}^{K} w_i y_i}{\sum_{i=1}^{K} w_i}$ (Sidik and Jonkman, 2005). A comprehensive comparison of heterogeneity estimators for $\tau^2$ in the context of random-effects meta-analyses for correlations would be interesting but is beyond the scope of this paper.

Before discussing different CIs for the common correlation $\mu$ within Model (4), we take a short excursion on asymptotics for $r$ in the one group case. Assuming bivariate normality of $(X, Y)$, $r$ is approximately $N(\varrho, (1 - \varrho^2)^2/n)$-distributed for large sample sizes $n$ (Lehmann, 2004). Here, bivariate normality is a necessary assumption to obtain $(1 - \varrho^2)^2$ in the asymptotic variance (Omelka and Pauly, 2012). Plugging in $r$, we obtain an approximate $(1-\alpha)$-CI of the form $r \pm u_{1-\alpha/2}(1 - r^2)/\sqrt{n}$, where $u_{1-\alpha/2}$ denotes the $(1-\alpha/2)$-quantile of the standard normal distribution. Analogously, with $\hat\varrho_{pool}$, the pooled sample correlation coefficient, we obtain an approximate CI for $\varrho$ by

$$\hat\varrho_{pool} \pm u_{1-\alpha/2}\,\frac{(1 - \hat\varrho_{pool}^2)}{\sqrt{N}}, \tag{6}$$

where $N := \sum_{i=1}^{K} n_i$ is the pooled sample size. As this pooling of observations only makes sense if we assume that each study has the same underlying effect, this approach is not feasible in the case of a random-effects model, even if IPD were available. Moreover, even under IPD and a fixed-effects model, this CI is sensitive to the normality assumption and the underlying sample size, as we demonstrate in Table 1 for the case $K = 1$. We simulated bivariate data from standard normal and standardized lognormal distributions † with correlation $\varrho \in \{0.3, 0.7\}$ and study size $n \in \{20, 50, 100\}$. Per setting we performed $N = 10{,}000$ simulation runs. For the lognormal data coverage is extremely poor in all cases, ranging from 53% to 80%. For the normal data coverage was below the nominal level at 90% for $n = 20$ but improved for larger sample sizes. This case study clearly illustrates that alternatives are needed when the data cannot be assumed to stem from a normal distribution or sample sizes are small.

Table 1: Empirical coverage of the asymptotic confidence interval for $K = 1$, study sizes $n \in \{20, 50, 100\}$ and correlations $\varrho \in \{0.3, 0.7\}$.

Distribution   ϱ     n = 20   n = 50   n = 100
normal         0.3   0.90     0.93     0.94
normal         0.7   0.90     0.92     0.94
lognormal      0.3   0.79     0.80     0.79
lognormal      0.7   0.63     0.57     0.53

† Further details regarding the data generation can be found in the supplement.

After this short excursion we turn back to Model (4) and CIs for $\varrho$.

The aggregation of correlations in the Hunter-Schmidt approach is done by sample size weighting:

$$r_{HS} = \frac{\sum_{i=1}^{K} n_i r_i}{\sum_{i=1}^{K} n_i}. \tag{7}$$

The variance of $r_{HS}$ is estimated via

$$\hat\sigma^2_{HS} = \frac{1}{K}\left(\frac{\sum_{i=1}^{K} n_i (r_i - r_{HS})^2}{\sum_{i=1}^{K} n_i}\right), \tag{8}$$

which is supposed to perform reasonably well in both heterogeneous and homogeneous settings (Schulze, 2004). In the simulation study we will investigate whether this is in fact the case for the resulting CI: $r_{HS} \pm u_{1-\alpha/2}\,\hat\sigma_{HS}$.

A disadvantage of the asymptotic confidence interval (6) is that the variance of the limit distribution depends on the unknown correlation $\varrho$. This motivates a variance stabilizing transformation. A popular choice for correlation coefficients is the Fisher-z transformation (Fisher, 1915),

$$\varrho \mapsto z = \frac{1}{2}\ln\left(\frac{1 + \varrho}{1 - \varrho}\right) = \operatorname{atanh}(\varrho). \tag{9}$$

The corresponding inverse Fisher transformation is $z \mapsto \tanh(z) = (\exp(2z) - 1)/(\exp(2z) + 1)$. The variance stabilizing property of the Fisher transformation follows from the $\delta$-method (Lehmann, 2004), i.e., if $\sqrt{n}(r - \varrho) \xrightarrow{d} N(0, (1 - \varrho^2)^2)$ then $\sqrt{n}(\hat{z} - z) = \sqrt{n}\big(\operatorname{atanh}(r) - \operatorname{atanh}(\varrho)\big) \xrightarrow{d} N(0, 1)$. Following Schulze (2004), it is reasonable to substitute $\sqrt{n}$ by $\sqrt{n - 3}$, i.e., to approximate the distribution of $\hat{z}$ by $N\big(\operatorname{atanh}(\varrho), \frac{1}{n - 3}\big)$, still assuming bivariate normality. Thus, a single group approximate $(1-\alpha)$-CI can be constructed via $\tanh\big(\hat{z} \pm u_{1-\alpha/2}/\sqrt{n - 3}\big)$.

In the random-effects model (4), the z-transformation may also be used to construct a CI for the common correlation $\varrho$. Here, the idea is again to use inverse variance weights to define

$$\bar{z} = \frac{\sum_{i=1}^{K}\left(\frac{1}{n_i - 3} + \hat\tau^2\right)^{-1} z_i}{\sum_{i=1}^{K}\left(\frac{1}{n_i - 3} + \hat\tau^2\right)^{-1}}, \tag{10}$$

where $z_i = \operatorname{atanh}(r_i)$. A rough estimate of the variance of $\bar{z}$ is given by $\big(\sum_{i=1}^{K} w_i\big)^{-1}$. In the fixed-effect case with $\tau^2 = 0$ this yields the variance estimate $\big(\sum_{i=1}^{K}(n_i - 3)\big)^{-1} = (N - 3K)^{-1}$. Then $(\bar{z} - z)\sqrt{N - 3K}$, with $z = \operatorname{atanh}(\varrho)$, approximately follows a standard normal distribution and an approximate $(1-\alpha)$-CI is given by $\tanh(\bar{z} \pm u_{1-\alpha/2}/\sqrt{N - 3K})$. Proceeding similarly in the random-effects model (4), one obtains the HOVz CI (Hedges-Olkin-Vevea Fisher-z)

$$\tanh\left(\bar{z} \pm u_{1-\alpha/2} \Big/ \Big(\sum_{i=1}^{K} w_i\Big)^{1/2}\right), \tag{11}$$

with $w_i = \big(\frac{1}{n_i - 3} + \hat\tau^2\big)^{-1}$ (Hedges and Olkin, 1985; Hedges and Vevea, 1998; Hafdahl and Williams, 2009).

The above approximation of the variance of $\bar{z}$ via $\big(\sum_{i=1}^{K} w_i\big)^{-1}$ can be rather inaccurate, especially in random-effects models. Although this is the exact variance of $\bar{z}$ when the weights are chosen perfectly as $w_i = (\sigma_i^2 + \tau^2)^{-1}$, this variance estimate does not protect against (potentially substantial) errors in estimating $\sigma_i^2$ and $\tau^2$ (Sidik and Jonkman, 2006). Therefore, we propose an improved CI based on the Knapp-Hartung method (KH; Hartung and Knapp, 2001). KH proposed the following variance estimator for the estimate $\hat\mu$ of the main effect $\mu$ in a random-effects meta-analysis:

$$\hat\sigma^2_{KH} = \widehat{\mathrm{Var}}_{KH}(\hat\mu) = \frac{1}{K-1}\sum_{i=1}^{K}\frac{w_i}{w}(\hat\mu_i - \hat\mu)^2, \tag{12}$$

where again $w = \sum_{i=1}^{K} w_i$. Hartung (1999) showed that if $\hat\mu$ is normally distributed, then $(\hat\mu - \mu)/\hat\sigma_{KH}$ follows a $t$-distribution with $K - 1$ degrees of freedom. With $\hat\mu = \bar{z}$ and $\hat\mu_i = z_i$ on the z-scale, an approximate $(1-\alpha)$-CI is given by

$$\tanh\big(\bar{z} \pm t_{K-1,\,1-\alpha/2} \cdot \hat\sigma_{KH}\big), \tag{13}$$

where $t_{K-1,\,1-\alpha/2}$ is the $1-\alpha/2$ quantile of the $t$-distribution with $K - 1$ degrees of freedom.

Wild Bootstrap Approach

Another possibility of estimating the variance of $\bar{z}$ is through bootstrapping. Bootstrapping belongs to the class of resampling methods. It allows the estimation of the sampling distribution of most statistics using random sampling methods. The wild bootstrap is a subtype of bootstrapping that is applicable in models which exhibit heteroscedasticity. Roughly speaking, the idea of the wild bootstrap approach is to resample the response variables based on the residuals. The idea was originally proposed by Wu (1986) for regression analysis.

We now propose a confidence interval for $\varrho$ based on a (data-dependent) wild-bootstrap approach (WBS) combined with the z-transformation.
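As a reference point for the steps that follow, the z-scale building blocks introduced so far (the SJ estimator (5), the pooled estimate (10), the HOVz interval (11) and the KH interval (13)) can be sketched as below. This is a hedged illustration rather than the authors' code: the function names `sj_tau2` and `z_scale_cis` and the example data are hypothetical, and because the Python standard library offers no $t$-distribution, the quantile $t_{K-1,\,1-\alpha/2}$ is assumed to be supplied by the caller and must match `alpha`.

```python
import math
from statistics import NormalDist


def sj_tau2(y, v):
    """Two-step Sidik-Jonkman estimate of tau^2 as in Equation (5)."""
    k = len(y)
    y_bar = sum(y) / k
    tau2_0 = sum((yi - y_bar) ** 2 for yi in y) / k  # rough initial estimate
    if tau2_0 == 0.0:
        return 0.0  # all effects identical: no heterogeneity to estimate
    w = [tau2_0 / (vi + tau2_0) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    return sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y)) / (k - 1)


def z_scale_cis(r, n, t_quantile, alpha=0.05):
    """HOVz (11) and KH (13) intervals for correlations r with study sizes n.

    t_quantile is the (1 - alpha/2) quantile of the t-distribution with
    K - 1 degrees of freedom, passed in by the caller (an assumption of
    this sketch).
    """
    k = len(r)
    z = [math.atanh(ri) for ri in r]          # Fisher-z transform, Eq. (9)
    v = [1.0 / (ni - 3) for ni in n]          # sampling variances on z-scale
    tau2 = sj_tau2(z, v)
    w = [1.0 / (vi + tau2) for vi in v]       # random-effects weights
    w_sum = sum(w)
    z_bar = sum(wi * zi for wi, zi in zip(w, z)) / w_sum   # Eq. (10)

    u = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    se_hovz = w_sum ** -0.5
    var_kh = sum(wi / w_sum * (zi - z_bar) ** 2
                 for wi, zi in zip(w, z)) / (k - 1)        # Eq. (12)
    se_kh = math.sqrt(var_kh)

    return {
        "z_bar": z_bar,
        "tau2": tau2,
        "HOVz": (math.tanh(z_bar - u * se_hovz),
                 math.tanh(z_bar + u * se_hovz)),
        "KH": (math.tanh(z_bar - t_quantile * se_kh),
               math.tanh(z_bar + t_quantile * se_kh)),
    }


# Hypothetical data: K = 5 studies; 2.776 is the 0.975 quantile of t with 4 df.
result = z_scale_cis([0.30, 0.42, 0.35, 0.50, 0.28],
                     [20, 30, 40, 50, 60], t_quantile=2.776)
print(result)
```

The two intervals share the same centre $\tanh(\bar z)$ and differ only in the standard error and the quantile used.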
The idea works as follows: We assume a random-effects meta-analysis model with Pearson's correlation coefficient as the effect estimate (and $K >$ […] studies). Given the estimated study-level correlations $r_i$, $i = 1, \dots, K$, we transform these using the z-transformation to $\hat{z}_i$, $i = 1, \dots, K$, and estimate $z = \operatorname{atanh}(\varrho)$ via $\hat{z} = \sum_i \frac{w_i}{w}\hat{z}_i$, where again $w_i = (\hat\sigma_i^2 + \hat\tau^2)^{-1}$ with $\hat\sigma_i^2 = \frac{1}{n_i - 3}$ and $w = \sum_i w_i$. Here, $\hat\tau^2$ may be any consistent estimator of the between-study heterogeneity $\tau^2$, where we have chosen the SJ estimator. We then calculate the estimated residuals $\hat\varepsilon_i = \hat{z} - \hat{z}_i$ and use these to generate $B$ new sets of study-level effects $\hat{z}^*_{1b}, \dots, \hat{z}^*_{Kb}$, $b = 1, \dots, B$. Typical choices for $B$ are 1,000 or 5,000. The new study-level effects are generated via

$$\hat{z}^*_{ib} := \hat{z}_i + \hat\varepsilon_i \cdot v_i, \tag{14}$$

where $v_i \sim N(0, \gamma)$. The usual choice of variance in a wild bootstrap is $\gamma = 1$. However, we propose a data dependent choice of either $\gamma_K = (K - […])/(K - […])$ or $\gamma_K = (K - […])/(K - […])$. These choices are based on simulation results, which will be discussed in detail in Section 3. We will later refer to these approaches as WBS1, WBS2 and WBS3 respectively. The corresponding values for $\gamma$ are $1$, $(K - […])/(K - 3)$ and $(K - […])/(K - […])$. […] We then obtain $B$ new estimates of the main effect $z$ by calculating

$$\hat{z}^*_b = \frac{\sum_{i=1}^{K} w^*_{ib}\,\hat{z}^*_{ib}}{\sum_{i=1}^{K} w^*_{ib}}, \tag{15}$$

with $w^*_{ib} \equiv w_i$. We then estimate the variance of $\hat{z}$ via the empirical variance of $\hat{z}^*_1, \dots, \hat{z}^*_B$, $\hat\sigma^{*2}_z := \frac{1}{B - 1}\sum_{i=1}^{B}(\hat{z}^*_i - \bar{z}^*)^2$ with $\bar{z}^* = \frac{1}{B}\sum_{i=1}^{B}\hat{z}^*_i$. It is now possible to construct a CI for $z$ as in Equation (13) but with this new variance estimate of $\bar{z}$. The CI is back-transformed via the inverse Fisher transformation to obtain a CI for the common correlation $\varrho$, given by

$$\tanh\big(\hat{z} \pm \hat\sigma^*_z \cdot t_{K-1,\,1-\alpha/2}\big). \tag{16}$$

Figure 1 provides a visual illustration of the wild bootstrap procedure discussed above.

[Figure 1 flowchart: (1) transform correlations r to z, fit REMA model, calculate residuals $e_i = \hat{z} - \hat{z}_i$; (2) draw $v_i \sim N(0, \gamma)$ randomly; (3) generate pseudo-data; (4) fit new REMA and save effect estimate; repeat steps (2) to (4) B times.]
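The wild bootstrap steps (14) to (16) can be sketched as follows. This is only an illustration under stated assumptions: the function name `wild_bootstrap_ci`, its arguments and the example data are hypothetical, the weights are taken as given (and held fixed across replications, as in $w^*_{ib} \equiv w_i$), `gamma` is left as a free argument defaulting to the usual choice $\gamma = 1$, and the $t$-quantile is supplied by the caller because the standard library has no $t$-distribution.

```python
import math
import random


def wild_bootstrap_ci(z_hat_i, weights, t_quantile, gamma=1.0,
                      n_boot=1000, seed=None):
    """Wild bootstrap CI on the correlation scale, following Eqs. (14)-(16).

    z_hat_i    : Fisher-z transformed study correlations
    weights    : study weights w_i (kept fixed in every replication)
    t_quantile : (1 - alpha/2) quantile of t with K - 1 degrees of freedom
    gamma      : variance of the multipliers v_i ~ N(0, gamma)
    """
    rng = random.Random(seed)
    w_sum = sum(weights)
    z_hat = sum(w * z for w, z in zip(weights, z_hat_i)) / w_sum
    residuals = [z_hat - z for z in z_hat_i]   # eps_i = z_hat - z_hat_i
    sd_v = math.sqrt(gamma)

    boot_estimates = []
    for _ in range(n_boot):
        # Eq. (14): perturb each study effect by its residual times v_i
        z_star = [z + e * rng.gauss(0.0, sd_v)
                  for z, e in zip(z_hat_i, residuals)]
        # Eq. (15): re-pool with the original weights
        boot_estimates.append(
            sum(w * zs for w, zs in zip(weights, z_star)) / w_sum)

    mean_star = sum(boot_estimates) / n_boot
    var_star = sum((b - mean_star) ** 2 for b in boot_estimates) / (n_boot - 1)
    half_width = t_quantile * math.sqrt(var_star)
    # Eq. (16): back-transform with the inverse Fisher transformation
    return math.tanh(z_hat - half_width), math.tanh(z_hat + half_width)


# Hypothetical data: K = 5 studies with illustrative weights n_i - 3.
z_values = [math.atanh(r) for r in (0.30, 0.42, 0.35, 0.50, 0.28)]
study_weights = [17.0, 27.0, 37.0, 47.0, 57.0]
print(wild_bootstrap_ci(z_values, study_weights, t_quantile=2.776, seed=1))
```

Passing a `seed` makes the resampling reproducible, which is convenient when comparing different values of `gamma`.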

Figure 1: Visual illustration of the Wild Bootstrap Procedure for generating B bootstrap samples of the main effect estimate on the z-scale

Last but not least, we employ heteroscedasticity consistent (HC) variance estimators (sandwich estimators; White, 1980). Different forms ($HC_0$ to $HC_5$) are in use for linear models (Rosopa et al., 2013). The motivation for the robust HC variance estimators is that in a linear regression setting the usual variance estimate is unbiased when unit level errors are independent and identically distributed. However, when the unit level variances are unequal, this approach can be biased. If we apply this to the meta-analysis context, the study level variances are almost always unequal due to varying sample sizes. Therefore, it makes sense to consider variance estimators that are unbiased even when the unit (study) level variances differ.

The extension of HC estimators to the meta-analysis context can be found in Viechtbauer et al. (2015) for $HC_0$ and $HC_1$ and in Welz and Pauly (2020) for the remaining $HC_2$ to $HC_5$. Statistical tests based on these robust estimators have been shown to perform well, especially those of types $HC_3$ and $HC_4$. In the special case of a random-effects meta-analysis they are defined as (see the supplementary material of Welz and Pauly, 2020, for details)

$$\hat\sigma^2_{HC_3} = \frac{1}{\big(\sum_{i=1}^{K} w_i\big)^2}\sum_{j=1}^{K} w_j^2\,\hat\varepsilon_j^2\,(1 - x_{jj})^{-2},$$

$$\hat\sigma^2_{HC_4} = \frac{1}{\big(\sum_{i=1}^{K} w_i\big)^2}\sum_{j=1}^{K} w_j^2\,\hat\varepsilon_j^2\,(1 - x_{jj})^{-\delta_j}, \quad \delta_j = \min\Big\{4, \frac{x_{jj}}{\bar{x}}\Big\},$$

with $\hat\varepsilon_j = \hat{z}_j - \hat{z}$, $x_{jj} = \frac{w_j}{\sum_{i=1}^{K} w_i}$ and $\bar{x} = \frac{1}{K}\sum_{i=1}^{K} x_{ii}$. Plugging them into Equation (13) leads to the confidence intervals

$$\tanh\big(\hat{z} \pm \hat\sigma_{HC_j} \cdot t_{K-1,\,1-\alpha/2}\big), \quad j = 3, 4. \tag{17}$$

There is a fundamental problem with back-transforming CIs on the z-scale using the inverse Fisher transformation tanh: Consider a random variable $\xi \sim N(\operatorname{artanh}(\varrho), \sigma^2)$ with some variance $\sigma^2 > 0$ and $\varrho \neq 0$.
Then $\varrho = \tanh(E(\xi)) \neq E(\tanh(\xi))$ by Jensen's inequality. This means the back-transformation introduces an additional bias. A remedy was proposed by Hafdahl (2009), who suggested back-transforming from the z-scale using an integral z-to-r transformation instead. This transformation is the expected value of $\tanh(z)$, where $z \sim N(\mu_z, \tau_z^2)$, i.e.,

$$\psi(\mu_z \mid \tau_z^2) = \int_{-\infty}^{\infty} \tanh(t)\, f(t \mid \mu_z, \tau_z^2)\, dt, \tag{18}$$

where $f$ is the density of $z$. In practice we apply this transformation to the lower and upper confidence limits on the z-scale, plugging in the estimates $\hat{z}$ and $\hat\tau_z^2$. For example, for the KH-based CI (13) with z-scale confidence bounds $\ell = \bar{z} - t_{K-1,\,1-\alpha/2} \cdot \hat\sigma_{KH}$ and $u = \bar{z} + t_{K-1,\,1-\alpha/2} \cdot \hat\sigma_{KH}$, with an estimated heterogeneity $\hat\tau_z^2$ (on the z-scale), the CI is given by $\big(\psi(\ell \mid \hat\tau_z^2), \psi(u \mid \hat\tau_z^2)\big)$. If the true distribution of $\hat{z}$ is well approximated by a normal distribution and $\hat\tau_z^2$ is a good estimate of the heterogeneity variance (on the z-scale), $\psi$ should improve the CIs as compared to simple back-transformation with tanh (Hafdahl, 2009). Following this argument, we also suggest using $\psi$ instead of tanh. We calculate the integral with Simpson's rule (Süli and Mayers, 2003), which is a method for the numerical approximation of definite integrals. 150 subintervals over $\hat{z} \pm […] \cdot \hat\tau_{SJ}$ were used, following Hafdahl (2009). Note that the HOVz CI is implemented in its original formulation, using tanh.

We have suggested several new CIs for the mean correlation $\varrho$, all based on the z-transformation, applicable in both fixed- and random-effects models. In order to investigate their properties (especially coverage of $\varrho$), we perform extensive Monte Carlo simulations. We focus on comparing the coverage of our newly suggested CIs with existing methods.
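The integral z-to-r transformation (18) used to back-transform the new CIs can be approximated with the composite Simpson's rule, as sketched below. The function names are hypothetical, and the integration range of `half_width_sd` standard deviations on either side of the centre is an illustrative assumption of this sketch (the default of 5 is not taken from the text); the number of subintervals defaults to the 150 mentioned above.

```python
import math


def normal_pdf(t, mean, sd):
    """Density of N(mean, sd^2) at t."""
    return math.exp(-0.5 * ((t - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def integral_z_to_r(mu_z, tau2_z, n_sub=150, half_width_sd=5.0):
    """Approximate psi(mu_z | tau2_z) = E[tanh(Z)], Z ~ N(mu_z, tau2_z).

    Uses the composite Simpson's rule with n_sub subintervals over
    mu_z +/- half_width_sd * sqrt(tau2_z). half_width_sd = 5.0 is an
    assumption of this sketch.
    """
    if tau2_z <= 0.0:
        return math.tanh(mu_z)  # no heterogeneity: plain back-transformation
    if n_sub % 2 != 0:
        raise ValueError("Simpson's rule needs an even number of subintervals")
    sd = math.sqrt(tau2_z)
    lower = mu_z - half_width_sd * sd
    h = 2.0 * half_width_sd * sd / n_sub
    total = 0.0
    for j in range(n_sub + 1):
        t = lower + j * h
        if j == 0 or j == n_sub:
            coef = 1.0      # end points
        elif j % 2 == 1:
            coef = 4.0      # odd interior points
        else:
            coef = 2.0      # even interior points
        total += coef * math.tanh(t) * normal_pdf(t, mu_z, sd)
    return total * h / 3.0


# Hypothetical z-scale bounds and heterogeneity estimate.
lower_z, upper_z, tau2_hat = 0.25, 0.55, 0.04
print(integral_z_to_r(lower_z, tau2_hat), integral_z_to_r(upper_z, tau2_hat))
print(math.tanh(lower_z), math.tanh(upper_z))  # plain tanh for comparison
```

For positive bounds the integral transformation gives values slightly closer to zero than tanh, which reflects the Jensen-type bias discussed above.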
The Pearson correlation coefficient is constrained to the interval $[-1, 1]$. Therefore, the random-effects model $\mu_i = \mu + u_i + \varepsilon_i$, assuming a normal distribution for the random effect $u_i \sim N(0, \tau^2)$ and error term $\varepsilon_i \sim N(0, \sigma_i^2)$, needs to be adjusted, since values outside of $[-1, 1]$ could result when sampling without any modification.

Model 1:

As a first option for generating the (true) study-level correlations, we consider a truncated normal distribution $\varrho_i \sim N(\varrho, \tau^2)$: Sampling of $\varrho_i$ is repeated until a sample lies within the interval $[-0.999, 0.999]$. The truncation introduces a bias with respect to $\varrho$: For a random variable $X$ stemming from a truncated normal distribution with mean $\mu$ and variance $\sigma^2$ with lower bound $a$ and upper bound $b$, it holds that (Johnson et al., 1994)

$$E(X) = \mu + \sigma\,\frac{\phi(\Delta_1) - \phi(\Delta_2)}{\delta},$$

where $\Delta_1 = (a - \mu)/\sigma$, $\Delta_2 = (b - \mu)/\sigma$ and $\delta = \Phi(\Delta_2) - \Phi(\Delta_1)$. Here $\phi(\cdot)$ is the probability density function of the standard normal distribution and $\Phi(\cdot)$ its cumulative distribution function. Figure 19 in the supplement shows the bias in our setting with $a = -0.999$ and $b = 0.999$. The bias is given by $\sigma(\phi(\Delta_1) - \phi(\Delta_2))/\delta$. In addition to generating a biased effect, the truncation also leads to a reduction of the overall variance, which is smaller than $\tau^2$.

Model 2:

We therefore studied a second model, in which we generate the (true) study level effects $\varrho_i$ from transformed beta distributions: $Y_i = 2(X_i - 0.5)$ with $X_i \sim \mathrm{Beta}(\alpha, \beta)$ for studies $i = 1, \dots, K$. The idea is to choose the respective shape parameters $\alpha, \beta$ such that the following equalities hold:

$$E(Y_i) = 2 \cdot \left(\frac{\alpha}{\alpha + \beta} - 0.5\right) \overset{!}{=} \varrho, \qquad \mathrm{Var}(Y_i) = \frac{4\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)} \overset{!}{=} \tau^2.$$

The solution to the system of equations above is:

$$\alpha = \frac{(1 - \varrho)(1 + \varrho) - \tau^2}{\tau^2} \cdot \left(\frac{1 + \varrho}{2}\right), \qquad \beta = \left(\frac{1 - \varrho}{1 + \varrho}\right)\alpha.$$

In this second simulation scenario we also truncate the sampling distribution of the correlation coefficients to $[-0.999, 0.999]$. For large $\tau$ values, the above solution for $\alpha$ (and thus $\beta$) may become negative, which is undefined for parameters of a beta distribution. However, this was not a concern for the parameters considered in our simulation study and only occurs in more extreme scenarios.

Parameter choices.

In order to get a broad overview of the performance of all methods, we simulated various configurations of population correlation coefficient, heterogeneity, sample size and number of studies. Here we chose eight correlations $\varrho \in \{0, […]\}$ and heterogeneity $\tau \in \{0, 0.16, […]\}$. Moreover, we considered small to large numbers of studies $K \in \{5, 10, 20, 40\}$ with different study sizes: For $K = 5$, we considered $\vec{n} = (15, […], 27)$ as vector of 'small' study sizes and $4 \cdot \vec{n}$ for larger study sizes, corresponding to an average study size ($\bar{n}$) of 20 and 80 subjects, respectively. For all other choices of $K$ we proceeded similarly, stacking copies of $\vec{n}$ behind each other, e.g., the sample size vectors $(\vec{n}, \vec{n})$ and $4 \cdot (\vec{n}, \vec{n})$ for $K = 10$. Additionally, we considered two special scenarios: the case of few and heterogeneous studies, with study size vector $(23, […])$, and a scenario with study size vector $(\vec{n}^*, \vec{n}^*)$ with $\vec{n}^* = (210, […])$, corresponding to $K = 20$ studies with an average of 300 study subjects.

Thus, in total we simulated $8\,(\varrho) \times 3\,(\tau) \times […]\,(K, \text{study size vector}) \times […]$ settings. For each setting we performed $N = 10{,}000$ simulation runs, where for the WBS CI each run was based upon $B = 1{,}000$ bootstrap replications. The primary focus was on comparing empirical coverage with nominal coverage being $1 - \alpha = 0.95$. For 10,000 iterations, the Monte Carlo standard error of the simulated coverage will be approximately $\sqrt{0.95 \times 0.05 / 10000} \approx 0.0022$.

For ease of presentation, we aggregated the multiple simulation settings with regard to number and size of studies. The graphics therefore display the mean observed coverage for each confidence interval type and true main effect $\varrho$. Results are separated by heterogeneity $\tau$ and simulation design. The latter refers to the truncated normal-distribution approach and the transformed beta-distribution approach respectively. More detailed simulation results for all considered settings are given in the supplement.

We first discuss the results based on the truncated normal distribution (Model 1). In the case of no heterogeneity (fixed-effect model), Figure 2 shows that the new methods control the nominal coverage of 95% well. Only the first wild bootstrap (WBS1) CI exhibits a liberal behaviour, yielding empirical coverage of approximately 93[…]%. […] The value of $\varrho$ did not affect any of the methods.

[Plot: empirical coverage (y-axis, 0.75 to 1.00) against $\varrho$ (x-axis, 0.0 to 1.0) for $\tau = 0$; CI types HC3, HC4, HOVz, HS, KH, WBS1, WBS2, WBS3.]

Figure 2: Mean Coverage for truncated normal distribution model with $\tau = 0$, aggregated across all number of studies and study size settings

In the truncated-normal setup with moderate heterogeneity of $\tau = 0.16$ in Figure 3, several things change: First, there is a strong drop-off in coverage for larger correlations $\varrho \geq 0.8$. For HS this drop-off occurs earlier, for $\varrho \geq$ […]. For $\varrho \leq 0.7$, HS is even more liberal than for $\tau = 0$, with coverage around 87[…] […] $\varrho \leq 0.7$. For all new methods a slight decrease in coverage can be observed for increasing values of $\varrho$ from 0 to 0.7. Moreover, there is a slight uptick at $\varrho =$ […]. […] The $HC_3$, $HC_4$ and KH CIs show the best control of nominal coverage in this setting.

[Plot: empirical coverage (y-axis, 0.75 to 1.00) against $\varrho$ (x-axis, 0.0 to 1.0) for $\tau = 0.16$; CI types HC3, HC4, HOVz, HS, KH, WBS1, WBS2, WBS3.]

Figure 3: Mean Coverage for truncated normal distribution model with τ = 0.16, aggregated across all numbers of studies and study size settings

We now consider Model 2 with a transformed beta distribution model. In the fixed-effect case (τ = 0) the two models are equivalent, so we obtain the same coverage as in Figure 2. For moderate heterogeneity (τ = 0.16, cf. Figure 4), our newly proposed methods clearly outperform HOVz and HS, with a good control of nominal coverage. Only for ϱ = 0.[…] ≈ […]−88% for ϱ ≤ […] ϱ = 0.9. For ϱ > […]

Figure 4: Mean Coverage for transformed beta distribution model with τ = 0.16, aggregated across all numbers of studies and study size settings

For ease of presentation, the results for the case of extreme heterogeneity with τ = 0.4 […] K, coverage is approximately correct for ϱ ≤ […] ϱ ≤ […] n ∈ {[…]}, respectively. For an increasing number of studies K, HOVz remains largely unchanged, whereas coverage of the new methods gets progressively worse (i.e. the drop-off in coverage occurs earlier for an increasing number of studies). For K = 40 the new CIs only have correct coverage for ϱ ≤ 0.3. In the case of the beta distribution model with τ = 0.4 […] ϱ ≤ […] K. HOVz only has correct coverage for simultaneously ϱ ≤ […] K. For K = 5 HS has coverage of ≤ […]. However, for increasing number of studies (whether large or small), HS appears to converge towards nominal coverage. In particular, for K = 40 and ϱ > […]

We simulated the expected confidence interval lengths for all methods discussed in this paper. The detailed results are provided in Figures 11 – 16 in the supplement. The results again depend on both the assumed model and the amount of heterogeneity τ.

Generally we observe that the confidence intervals become increasingly narrow for increasing values of ρ and increasingly wide for larger values of τ. For the truncated normal distribution model and τ = 0, HS (on average) yields the shortest confidence intervals and HOVz the widest, with the other CIs lying in between with quite similar lengths. Only for K = 5 are the CIs based on the wild bootstrap quite wide, indicating that potentially more studies are required to reliably use the wild bootstrap based approaches. For τ = 0.16 HS again yields the shortest CIs in all scenarios. For small K, the WBS approaches yield the widest CIs, and for more studies HOVz is the widest when ρ is small, but becomes nearly as narrow as HS when ρ is close to 1. The lengths of the other CIs are nearly identical for K = 40, whereas for fewer studies there are considerable differences. This relative evaluation also holds for τ = 0.4. […] τ = 0, the results are equivalent to the truncated normal distribution model. For τ = 0.16 and K = 5 the widths of the new CIs decrease with increasing ϱ until ρ = 0.7. Interestingly, the widths of these CIs then increase again for ρ > 0.7, which could not be observed in the truncated normal model. This effect becomes much less pronounced for increasing number of studies K. HS is always more narrow than the new CIs, and for K ≥ 20 HOVz is the widest at ρ = 0 but even more narrow than HS for ρ ≥ 0.8. For τ = 0.4 […] ρ and HOVz is most narrow for ρ > […].

We summarize our findings by providing recommendations to practitioners wishing to choose between the considered methods. The recommendations will depend on the assumed model and how much heterogeneity is present in the data. We believe the beta distribution model is better suited for random-effects meta-analyses of correlations. Reminder: HOVz employs the inverse Fisher transformation, whereas our newly proposed confidence intervals employ the integral z-to-r transformation suggested by Hafdahl (2009).

• τ = 0 (Fixed-Effect Model): HS and HOVz are not recommendable. We recommend using KH, HC3 or HC4.
• τ = 0.16. Truncated normal model: HS and HOVz are not recommendable and we recommend using KH, HC3 or HC4. For |ρ| > […] K = 40, HOVz may be preferable. Beta distribution model: HS and HOVz are not recommendable. All new confidence intervals exhibit satisfactory coverage. For small K, WBS approaches yield wider confidence intervals, therefore preferably use KH, HC3 or HC4.
• τ = 0.4. Truncated normal model: HS is not recommendable. For K = 5 and |ρ| ≤ […] K ≥ 10 and |ρ| ≤ […] |ρ| > […] Beta distribution model: HOVz is not recommendable. For |ρ| ≤ […] K ≥ 40 and |ρ| > […] K ≤ 20 and |ρ| > […]

Between 25 and 50% of patients fail to take their medication as prescribed by their caretaker (Molloy et al., 2013). Some studies have shown that medication adherence tends to be better in patients who score higher in conscientiousness (from the five-factor model of personality). Table 2 contains data on 16 studies, which investigated the correlation between conscientiousness and medication adherence. These studies were first analyzed in the form of a meta-analysis in Molloy et al. (2013). The columns of Table 2 contain information on the authors of the respective study, the year of publication, the sample size of study i (n_i), the observed correlation in study i, the number of variables controlled for (controls), study design, the type of adherence measure (a measure), the type of conscientiousness measure (c measure), the mean age of study participants (mean age) and the methodological quality (as scored by the authors on a scale from one to four, with higher scores indicating higher quality).

Regarding the measurement of conscientiousness: Where NEO (Neuroticism-Extraversion-Openness) is indicated as c measure, the personality trait of conscientiousness was measured by one of the various types of NEO personality inventories (PI; Costa Jr and McCrae, 1985, 2008).

Table 2: Data from 16 studies investigating the correlation between conscientiousness and medication adherence

Study i  authors               year  n_i  r_i    controls  design           a measure    c measure  mean age  quality
1        Axelsson et al.       2009  109   0.19  none      cross-sectional  self-report  other      22.00     1
2        Axelsson et al.       2011  749   0.16  none      cross-sectional  self-report  NEO        53.59     1
3        Bruce et al.          2010   55   0.34  none      prospective      other        NEO        43.36     2
4        Christensen et al.    1999  107   0.32  none      cross-sectional  self-report  other      41.70     1
5        Christensen & Smith   1995   72   0.27  none      prospective      other        NEO        46.39     2
6        Cohen et al.          2004   65   0.00  none      prospective      other        NEO        41.20     2
7        Dobbels et al.        2005  174   0.17  none      cross-sectional  self-report  NEO        52.30     1
8        Ediger et al.         2007  326   0.05  multiple  prospective      self-report  NEO        41.00     3
9        Insel et al.          2006   58   0.26  none      prospective      other        other      77.00     2
10       Jerant et al.         2011  771   0.01  multiple  prospective      other        NEO        78.60     3
11       Moran et al.          1997   56  -0.09  multiple  prospective      other        NEO        57.20     2
12       O'Cleirigh et al.     2007   91   0.37  none      prospective      self-report  NEO        37.90     2
13       Penedo et al.         2003  116   0.00  none      cross-sectional  self-report  NEO        39.20     1
14       Quine et al.          2012  537   0.15  none      prospective      self-report  other      69.00     2
15       Stilley et al.        2004  158   0.24  none      prospective      other        NEO        46.20     3
16       Wiebe & Christensen   1997   65   0.04  none      prospective      other        NEO        56.00     1

We performed both a fixed- and random-effects meta-analysis, using all considered methods. For the random-effects model we used the SJ estimator to estimate the between-study heterogeneity variance τ². Combining all available studies yielded r_FE = 0.[…], r_RE = 0.154 and τ̂_SJ = 0.[…]. […] r_FE = 0.168 and r_RE = 0.170 resulted and slightly lower values for the prospective studies (r_FE = 0.[…], r_RE = 0.[…]). […] τ̂_SJ = 0.007 (cross-sectional) and τ̂_SJ = 0.016 (prospective), respectively. In Table 3 we provide values of all CIs discussed in this paper.

Table 3: Random-effects model confidence intervals for all studies and subgroups separated by study design, original data from Molloy et al. (2013)

          Study design
Approach  All Designs     cross-sectional  prospective
HOVz      [0.081, 0.221]  [0.067, 0.266]   [0.050, 0.240]
HS        [0.073, 0.174]  [0.100, 0.220]   [0.035, 0.166]
KH        [0.080, 0.218]  [0.037, 0.291]   [0.043, 0.239]
WBS1      [0.086, 0.213]  [0.063, 0.267]   [0.051, 0.232]
WBS2      [0.079, 0.219]  [0.053, 0.276]   [0.043, 0.239]
WBS3      [0.084, 0.215]  [0.058, 0.272]   [0.048, 0.234]
HC3       [0.081, 0.218]  [0.041, 0.288]   [0.041, 0.241]
HC4       [0.083, 0.216]  [0.054, 0.276]   [0.045, 0.237]

In the case of all studies (K = 16), all methods yield quite similar CIs except for HS. Additional simulations for this situation (K = 16, τ = 0.012, n_i as in Table 3) are given in the supplement and show a coverage of around 80% for HS, while HOVz reaches around 94% and all other methods exhibit a fairly accurate coverage of around 95%. Thus, the sacrifice for the narrow HS CIs is poor coverage. Additional analyses of other datasets are given in the supplement.

We introduced several new methods to construct confidence intervals of the main effect in random-effects meta-analyses of correlations, based on the Fisher-z transformation. We compared these to the standard HOVz and Hunter-Schmidt confidence intervals and, following the suggestion by Hafdahl (2009), utilized an integral z-to-r transformation instead of the inverse Fisher transformation. We performed an extensive Monte Carlo simulation study, in order to assess the coverage and mean interval length of all CIs.
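To make the contrast between the two back-transformations concrete, the following sketch compares the inverse Fisher transformation tanh(z) with an integral z-to-r transformation. It assumes the integral transformation is the expectation of tanh(Z) for Z ~ N(μ_z, τ²), which is how the proposal of Hafdahl (2009) is commonly stated; the function names, the trapezoidal quadrature and the example values are illustrative assumptions, not the authors' implementation.

```python
import math

def inverse_fisher(z: float) -> float:
    """Inverse Fisher transformation: plain tanh of the pooled z value."""
    return math.tanh(z)

def integral_z_to_r(mu_z: float, tau2: float,
                    n_grid: int = 2001, width: float = 8.0) -> float:
    """Approximate E[tanh(Z)] for Z ~ N(mu_z, tau2) with the trapezoidal rule.

    The grid spans mu_z +/- width standard deviations (an assumed cut-off).
    """
    if tau2 <= 0.0:
        return math.tanh(mu_z)  # no heterogeneity: both transformations agree
    tau = math.sqrt(tau2)
    lo = mu_z - width * tau
    h = 2.0 * width * tau / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        t = lo + i * h
        density = math.exp(-0.5 * ((t - mu_z) / tau) ** 2) / (tau * math.sqrt(2.0 * math.pi))
        weight = 0.5 if i in (0, n_grid - 1) else 1.0
        total += weight * math.tanh(t) * density
    return total * h

mu_z, tau2 = 1.0, 0.16  # hypothetical pooled z and heterogeneity on the z scale
print(f"inverse Fisher : {inverse_fisher(mu_z):.4f}")
print(f"integral z-to-r: {integral_z_to_r(mu_z, tau2):.4f}")
```

For a positive pooled z and positive heterogeneity the integral version is pulled towards zero relative to tanh(z), and the two coincide when τ² = 0.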
In addition to the truncated normal distribution model considered by Hafdahl and Williams (2009) and Field (2005) we also investigated a transformed beta distribution model, which exhibits less bias in the generation of the study level effects.

The results of our simulations show that for low and moderate heterogeneity and correlations of |ϱ| ≤ 0.7, our newly proposed confidence intervals improved coverage considerably over the classical HOVz and Hunter-Schmidt approaches. However, for extreme heterogeneity and |ϱ| > […] r, given by $r^* = r + \frac{r(1 - r^2)}{2(n - 1)}$, as the (negative) bias of r is usually approximated by $B_r = -\frac{\varrho(1 - \varrho^2)}{2(n - 1)}$ (Hotelling, 1953; Schulze, 2004). However, this bias correction actually made coverage worse in the studied settings.

References

Aldao, A., Nolen-Hoeksema, S., and Schweizer, S. (2010). Emotion-regulation strategies across psychopathology: A meta-analytic review.
Clinical Psychology Review, 30(2):217–237.

Barrick, M. R. and Mount, M. K. (1991). The big five personality dimensions and job performance: a meta-analysis. Personnel Psychology, 44(1):1–26.

Chalkidou, A., Landau, D., Odell, E., Cornelius, V., O'Doherty, M., and Marsden, P. (2012). Correlation between Ki-67 immunohistochemistry and 18F-fluorothymidine uptake in patients with cancer: A systematic review and meta-analysis. European Journal of Cancer, 48(18):3499–3513.

Cheung, M. W.-L. (2015). Meta-analysis: A structural equation modeling approach. John Wiley & Sons, Hoboken, NJ.

Costa Jr, P. T. and McCrae, R. R. (1985). The NEO personality inventory. Psychological Assessment Resources, Odessa, FL.

Costa Jr, P. T. and McCrae, R. R. (2008). The Revised NEO Personality Inventory (NEO-PI-R). Sage Publications, Inc.

Cribari-Neto, F., Souza, T. C., and Vasconcellos, K. L. (2007). Inference under heteroskedasticity and leveraged data. Communications in Statistics - Theory and Methods, 36(10):1877–1888.

Cribari-Neto, F. and Zarkos, S. G. (2004). Leverage-adjusted heteroskedastic bootstrap methods. Journal of Statistical Computation and Simulation, 74(3):215–232.

Field, A. P. (2001). Meta-analysis of correlation coefficients: a Monte Carlo comparison of fixed- and random-effects methods. Psychological Methods, 6(2):161–180.

Field, A. P. (2005). Is the meta-analysis of correlation coefficients accurate when population correlations vary? Psychological Methods, 10(4):444–467.

Fisher, R. A. (1915). Frequency distribution of the values of the correlation coefficient in samples from an indefinitely large population. Biometrika, 10(4):507–521.

Hafdahl, A. R. (2009). Improved Fisher z estimators for univariate random-effects meta-analysis of correlations. British Journal of Mathematical and Statistical Psychology, 62(2):233–261.

Hafdahl, A. R. and Williams, M. A. (2009). Meta-analysis of correlations revisited: Attempted replication and extension of Field's (2001) simulation studies. Psychological Methods, 14(1):24–42.

Hartung, J. (1999). An alternative method for meta-analysis. Biometrical Journal: Journal of Mathematical Methods in Biosciences, 41(8):901–916.

Hartung, J. and Knapp, G. (2001). A refined method for the meta-analysis of controlled clinical trials with binary outcome. Statistics in Medicine, 20(24):3875–3889.

Hedges, L. and Vevea, J. (1998). Fixed- and random-effects models in meta-analysis. Psychological Methods, 3(4):486–504.

Hedges, L. V. and Olkin, I. (1985). Statistical methods for meta-analysis. Academic Press, San Diego, CA, USA.

Hotelling, H. (1953). New light on the correlation coefficient and its transforms. Journal of the Royal Statistical Society. Series B (Methodological), 15(2):193–232.

Hunter, J. E. and Schmidt, F. L. (1994). Estimation of sampling error variance in the meta-analysis of correlations: Use of average correlation in the homogeneous case. Journal of Applied Psychology, 79(2):171.

Hunter, J. E. and Schmidt, F. L. (2004). Methods of meta-analysis: Correcting error and bias in research findings. Sage.

IntHout, J., Ioannidis, J. P., and Borm, G. F. (2014). The Hartung-Knapp-Sidik-Jonkman method for random effects meta-analysis is straightforward and considerably outperforms the standard DerSimonian-Laird method. BMC Medical Research Methodology, 14(1):25.

Jak, S. (2015). Meta-analytic structural equation modelling. Springer, New York, NY.

Johnson, N. L., Kotz, S., and Balakrishnan, N. (1994). Continuous univariate distributions. Wiley, New York.

Langan, D., Higgins, J. P., Jackson, D., Bowden, J., Veroniki, A. A., Kontopantelis, E., Viechtbauer, W., and Simmonds, M. (2019). A comparison of heterogeneity variance estimators in simulated random-effects meta-analyses. Research Synthesis Methods, 10(1):83–98.

Lehmann, E. L. (2004). Elements of large-sample theory. Springer Science & Business Media.

Molloy, G., O'Carroll, R., and Ferguson, E. (2013). Conscientiousness and medication adherence: a meta-analysis. Annals of Behavioral Medicine, 47(1):92–101.

Morris, T. P., White, I. R., and Crowther, M. J. (2019). Using simulation studies to evaluate statistical methods. Statistics in Medicine, 38(11):2074–2102.

Omelka, M. and Pauly, M. (2012). Testing equality of correlation coefficients in two populations via permutation methods. Journal of Statistical Planning and Inference, 142(6):1396–1406.

Open Science Collaboration (2015). Estimating the reproducibility of psychological science. Science, 349(6251).

Osburn, H. and Callender, J. (1992). A note on the sampling variance of the mean uncorrected correlation in meta-analysis and validity generalization. Journal of Applied Psychology, 77(2):115–122.

Rosopa, P. J., Schaffer, M. M., and Schroeder, A. N. (2013). Managing heteroscedasticity in general linear models. Psychological Methods, 18(3):335–351.

Santos, S., Almeida, I., Oliveiros, B., and Castelo-Branco, M. (2016). The role of the amygdala in facial trustworthiness processing: A systematic review and meta-analyses of fMRI studies. PloS One, 11(11):e0167276.

Schulze, R. (2004). Meta-analysis: A comparison of approaches. Hogrefe Publishing.

Schwarzer, G., Carpenter, J. R., and Rücker, G. (2015). Meta-analysis with R. Springer, Cham.

Sidik, K. and Jonkman, J. N. (2005). Simple heterogeneity variance estimation for meta-analysis. Journal of the Royal Statistical Society: Series C (Applied Statistics), 54(2):367–384.

Sidik, K. and Jonkman, J. N. (2006). Robust variance estimation for random effects meta-analysis. Computational Statistics & Data Analysis, 50(12):3681–3701.

Sirin, S. R. (2005). Socioeconomic status and academic achievement: A meta-analytic review of research. Review of Educational Research, 75(3):417–453.

Süli, E. and Mayers, D. F. (2003). An introduction to numerical analysis. Cambridge University Press, Cambridge.

Veroniki, A. A., Jackson, D., Viechtbauer, W., Bender, R., Bowden, J., Knapp, G., Kuss, O., Higgins, J., Langan, D., and Salanti, G. (2016). Methods to estimate the between-study variance and its uncertainty in meta-analysis. Research Synthesis Methods, 7(1):55–79.

Viechtbauer, W., López-López, J. A., Sánchez-Meca, J., and Marín-Martínez, F. (2015). A comparison of procedures to test for moderators in mixed-effects meta-regression models. Psychological Methods, 20(3):360–374.

Welz, T. and Pauly, M. (2020). A simulation study to compare robust tests for linear mixed-effects meta-regression. Research Synthesis Methods, 11(3):331–342.

White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 48(4):817–838.

Wu, C.-F. J. (1986). Jackknife, bootstrap and other resampling methods in regression analysis. Annals of Statistics, 14(4):1261–1295.

Acknowledgments

The authors gratefully acknowledge the computing time provided on the Linux HPC cluster at Technical University Dortmund (LiDO3), partially funded in the course of the Large-Scale Equipment Initiative by the German Research Foundation (DFG) as project 271512359. Furthermore, we thank Marléne Baumeister and Lena Schmid for many helpful discussions, and Philip Buczak for finding interesting data sets. This work was supported by the German Research Foundation project (Grant no. PA-2409 7-1).

Data Availability Statement

The R-scripts used for our simulations and data analyses will be made publicly available on figshare (pending publication). The dataset from Molloy et al. (2013) can be found in the metafor package in R and the datasets considered for re-analysis are from Chalkidou et al. (2012) and Santos et al. (2016), respectively.

Supplement

A Complete Results of Simulation Study

We present the complete simulation results regarding coverage and interval lengths for both models under the settings K ∈ {[…]} and for mean study sizes $\bar{n}$ ∈ {20, 80}. Additionally we considered the RMSE of the variance estimates of $\bar{z}$ for the confidence intervals based on the Fisher-z transformation.

[Figures 5 to 16: one panel per number of studies K and per mean study size (n = 20, n = 80), with Rho on the horizontal axis and one curve per CI type: HC3, HC4, HOVz, HS, KH, WBS1, WBS2, WBS3.]

Figure 5: Mean Coverage for truncated normal distribution model with τ = 0

Figure 6: Mean Coverage for truncated normal distribution model with τ = 0.16

Figure 7: Mean Coverage for truncated normal distribution model with τ = 0.4

Figure 8: Mean Coverage for transformed beta distribution model with τ = 0

Figure 9: Mean Coverage for transformed beta distribution model with τ = 0.16

Figure 10: Mean Coverage for transformed beta distribution model with τ = 0.4

Figure 11: Mean CI length for truncated normal distribution model with τ = 0

Figure 12: Mean CI length for truncated normal distribution model with τ = 0.16

Figure 13: Mean CI length for truncated normal distribution model with τ = 0.4

Figure 14: Mean CI length for transformed beta distribution model with τ = 0

Figure 15: Mean CI length for transformed beta distribution model with τ = 0.16

Figure 16: Mean CI length for transformed beta distribution model with τ = 0.4

B Simulations based on the dataset from Section 4

We also added a simulation setting that is specific to the dataset from Molloy et al. (2013) discussed in Section 4. This means the number of studies, "true" heterogeneity and study effects in the simulation were chosen according to the estimates from the original dataset. There were K = 16 studies, with τ̂ = 0.012 and a range of study sizes between 55 and 771. The results are displayed in Table 4. Our newly proposed confidence intervals have good control of the nominal coverage of 95% both for the truncated normal and beta distribution simulation designs. HOVz was slightly liberal with approximately 94% coverage. HS performed worst out of the considered approaches, with only around 80% coverage.

distribution  HOVz   KH     WBS1   WBS2   WBS3   HC3    HC4    HS
normal        0.938  0.954  0.946  0.948  0.947  0.954  0.948  0.798
beta          0.940  0.953  0.947  0.946  0.947  0.954  0.949  0.797

Table 4: Empirical coverage in simulation setting based on data from Molloy et al. (2013) with K = 16, τ = 0.012 and study sizes between 55 and 771

C Additional Information

Figure 17: Mean Coverage for truncated normal distribution model with τ = 0.4, aggregated across all numbers of studies and study size settings

Figure 18: Mean Coverage for transformed beta distribution model with τ = 0.4, aggregated across all numbers of studies and study size settings

Comment regarding Table 1: The standardized log-normal distribution simulated in Table 1 was generated in the following manner: $Y_i = \frac{X_i - \exp(0.5)}{\sqrt{\exp(2) - \exp(1)}}$, where $X_i \overset{iid}{\sim} LN(0, 1)$, so that the $Y_i$ are iid and follow a standardized log-normal distribution with mean 0 and variance 1.

Figure 19: Bias of truncated normal distribution on [-0.999, 0.999] for various means µ and standard deviations σ

D Reanalysis of other Meta-analyses

In order to gain additional insights into the consequences of implementing our newly proposed methods in practice, we reanalyzed previous meta-analyses of correlations. To this end we considered two datasets from Chalkidou et al. (2012) and Santos et al. (2016).

Santos et al. (2016) investigated the role of the amygdala in facial trustworthiness through meta-analysis of fMRI studies. They performed a meta-analysis of 12 studies, investigating the correlation between amygdala response to trustworthy vs. untrustworthy facial signals under fMRI. The data is presented in Table 5 †.

Study  1     2     3     4     5     6     7      8     9     10    11    12
r_i    .654  .072  .998  .892  .313  .069  -.971  .989  .989  .473  .594  .999
n_i    24    16    12    14    15    15    6      12    11    32    14    12

Table 5: Reported correlations and sample sizes of 12 studies on amygdala response to facial signals of trustworthiness under fMRI in Santos et al. (2016).

This is clearly one of the challenging scenarios with extreme correlations and high heterogeneity. For the random-effects meta-analysis Santos et al. (2016) reported a total estimated effect of 0.851 with a 95% confidence interval of [[…], […]]. […] |ϱ| values. Also, as in the simulations, HS yields the most (probably overly) narrow interval.

Chalkidou et al. (2012) examined the correlation between Ki-67 immunohistochemistry and 18F-fluorothymidine uptake in patients with cancer. The data comes from a total of 9 studies, containing data from both biopsies and surgeries, and is presented in Table 6.

Study  1    2    3    4    5    6    7    8    9
r_i    .21  .79  .82  .80  .04  .92  .84  .77  .57
n_i    43   12   9    10   20   20   21   6    22
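As a rough illustration of how such data enter a Fisher-z based analysis, the sketch below pools the correlations listed above with inverse-variance weights n_i − 3 and back-transforms with the inverse Fisher transformation, in the spirit of the HOVz interval. Two points are assumptions made only for this example: the heterogeneity variance is estimated with the DerSimonian-Laird moment estimator rather than the SJ estimator used in the paper, and τ² is defined on the z scale. The function names are hypothetical, and the output is not meant to reproduce any interval reported by the authors.

```python
import math

# Correlations and sample sizes as listed in the table above (Chalkidou et al., 2012).
r = [0.21, 0.79, 0.82, 0.80, 0.04, 0.92, 0.84, 0.77, 0.57]
n = [43, 12, 9, 10, 20, 20, 21, 6, 22]

def dersimonian_laird_tau2(r, n):
    """Moment estimator of the between-study variance on the Fisher-z scale."""
    z = [math.atanh(ri) for ri in r]
    w = [ni - 3 for ni in n]  # inverse of the sampling variance 1 / (n_i - 3)
    z_bar = sum(wi * zi for wi, zi in zip(w, z)) / sum(w)
    q = sum(wi * (zi - z_bar) ** 2 for wi, zi in zip(w, z))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    return max(0.0, (q - (len(r) - 1)) / c)

def pooled_fisher_z(r, n, tau2=0.0, z_crit=1.959964):
    """Pooled correlation and normal-theory 95% CI, back-transformed with tanh."""
    z = [math.atanh(ri) for ri in r]
    w = [1.0 / (1.0 / (ni - 3) + tau2) for ni in n]
    z_bar = sum(wi * zi for wi, zi in zip(w, z)) / sum(w)
    se = math.sqrt(1.0 / sum(w))
    ci = (math.tanh(z_bar - z_crit * se), math.tanh(z_bar + z_crit * se))
    return math.tanh(z_bar), ci

tau2_dl = dersimonian_laird_tau2(r, n)
r_fe, ci_fe = pooled_fisher_z(r, n)                 # fixed-effect model
r_re, ci_re = pooled_fisher_z(r, n, tau2=tau2_dl)   # random-effects model
print(f"tau^2 (DL, z scale): {tau2_dl:.3f}")
print(f"fixed effect  : {r_fe:.3f}  CI [{ci_fe[0]:.3f}, {ci_fe[1]:.3f}]")
print(f"random effects: {r_re:.3f}  CI [{ci_re[0]:.3f}, {ci_re[1]:.3f}]")
```

The random-effects interval is visibly wider than the fixed-effect one for these data, which reflects the substantial between-study variation in the listed correlations.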