A Bias Correction Method in Meta-analysis of Randomized Clinical Trials with no Adjustments for Zero-inflated Outcomes

Abstract

Many clinical endpoint measures, such as the number of standard drinks consumed per week or the number of days that patients stayed in the hospital, are count data with excessive zeros. However, the zero-inflated nature of such outcomes is often ignored in analyses, which leads to biased estimates and, consequently, a biased estimate of the overall intervention effect in a meta-analysis. The current study proposes a novel statistical approach, the Zero-inflation Bias Correction (ZIBC) method, that can account for the bias introduced when using the Poisson regression model despite a high rate of zeros in the outcome distribution for randomized clinical trials. This correction method utilizes summary information from individual studies to correct intervention effect estimates as if they were appropriately estimated in zero-inflated Poisson regression models. Simulation studies and real data analyses show that the ZIBC method has good performance in correcting zero-inflation bias in many situations. This method provides a methodological solution in improving the accuracy of meta-analysis results, which is important to evidence-based medicine.


Zhengyang Zhou, Ph.D., University of North Texas Health Science Center, Fort Worth, TX
Minge Xie, Ph.D., Rutgers University, Piscataway, NJ
David Huh, Ph.D., University of Washington, Seattle, WA
Eun-Young Mun, Ph.D., University of North Texas Health Science Center, Fort Worth, TX


Correspondence should be sent to: Zhengyang Zhou, Ph.D. (Email: [email protected]) and Eun-Young Mun, Ph.D. (Email: [email protected])

arXiv [stat.AP], Oct

Keywords: aggregate data, meta-analysis, randomized clinical trial, zero-inflated outcome

Meta-analysis is an established statistical approach for combining data from multiple studies to provide large-scale evidence across many disciplines, including medical, educational, and policy research (Schmid et al. 2020). The majority of published meta-analyses have relied on aggregate data (AD), which are study-level summary statistics available from published or unpublished reports (Sutton & Higgins 2008, Lyman & Kuderer 2005, Chen et al. 2020). However, AD meta-analysis is susceptible to estimation bias, because a biased result from a study with model misspecification (e.g., a biased effect size) will be carried over into the meta-analysis if the study is included. For AD meta-analysis, it is challenging to correct biased estimates from original studies without refitting raw participant-level data using a better-suited statistical model (Liu & Chen 2018). In this paper, we aim to correct one such estimation bias, namely the bias that arises when a conventional count model is applied to a zero-inflated count outcome, when only AD are available for meta-analysis.

Count outcomes are prevalent in clinical research, including the number of seizures for each patient in epilepsy trials (e.g., Garcia et al. (2004)), the number of relapses in multiple sclerosis trials (e.g., Silcocks et al. (2010)), and the number of standard alcohol drinks in alcohol intervention trials (e.g., Huh et al. (2019)). Some studies, by nature, have high proportions of zero outcome values. For example, in alcohol reduction and prevention clinical trials for college students, many participants may be abstainers who do not drink, resulting in a large proportion of zero standard drinks, above and beyond the frequency that would be predicted by conventional count models, such as Poisson.
Therefore, estimation results would be biased if the Poisson regression model was used in studies with zero-inflated outcomes, which would further bias the pooled result in a meta-analysis. We henceforth refer to this bias as zero-inflation bias.

A zero-inflated Poisson (ZIP) model is more appropriate for count data with many zeros, since it assumes the outcome follows a mixture of a point mass at zero and a Poisson distribution (Lambert 1992). From a clinical perspective, the two components of the ZIP model correspond to two distinct subpopulations: (a) participants who do not engage in the behavior, and (b) participants who may or may not engage in the behavior at a particular assessment. In some clinical situations, clinicians may focus on the latter, as they are the primary target of the intervention and stand to benefit more from it (see Section 2.1 for two examples). In this paper, we focus on the Poisson portion of the ZIP model, where we aim to correct the zero-inflation bias related to the Poisson mean parameter. Note that in other trial evaluation situations, the overall mean of the outcome, which accommodates structural zeros, may be more important. In such situations, modeling the marginalized mean of the ZIP model may be appropriate (Long et al. 2014).

In this article, we focus on mitigating the impact of zero-inflation bias in meta-analysis and propose a novel statistical method, called the Zero-inflation Bias Correction (ZIBC) method. This method corrects the biased intervention effect size estimate that can result from the conventional Poisson regression model, the "go-to" method for modeling count outcomes. We aim to correct zero-inflation bias and produce a bias-corrected effect size estimate equivalent to the estimate from the ZIP regression model. This bias correction is achieved by comparing the estimating equations under the ZIP and Poisson models and using summary statistics of the intervention and control subgroups.
We will refer to the Poisson and ZIP regression models as the conventional and true methods, respectively, in the current paper.

Without correction for zero-inflation bias, the conventional method tends to overestimate the intervention effect and may produce false positive results. For example, in the second real data example (Section 4.2), the original study used a Poisson model to evaluate the effect of a modified toothbrushing program on preventing dental caries and concluded that the effect was statistically significant (Frazão 2011). However, more than 60% of the dental caries outcome values were zeros, and after applying the ZIBC method to adjust for the zero-inflation, the effect was no longer statistically significant. If such studies were included in a meta-analysis, the overall result might be overestimated. Thus, it is important to consider zero-inflation bias and correct it in the application of meta-analysis.

The paper proceeds as follows. In Section 2, we describe the formulation of the standard Poisson and ZIP regression models for a single study. We then introduce the ZIBC method for correcting zero-inflation bias as well as how to apply it in an AD meta-analysis. In Section 3, we conduct simulation studies to evaluate the performance of the ZIBC method in bias correction. In Section 4, we apply the ZIBC method to two real data examples. In Section 5, we conclude the article and provide discussions.

In this section, we describe the ZIBC method and how it corrects zero-inflation bias in an AD meta-analysis. We first focus on the case of a single randomized clinical trial, where we set up notation for the true and conventional methods (Section 2.1). We then describe zero-inflation bias (Section 2.2) and provide the ZIBC method that can correct it (Section 2.3). Next, for each clinical trial that originally used the conventional method for zero-inflated outcomes, we implement the ZIBC method to obtain the bias-corrected intervention effect estimate and combine the corrected estimates in a standard meta-analysis for the overall bias-corrected intervention effect (Section 2.4).

For a randomized clinical trial with two arms, we assume a count outcome with an excessive rate of zeros that follows a ZIP regression model. Suppose the study sample size is $n$. For the $i$-th subject, $i = 1, 2, \ldots, n$, we assume that the outcome $y_i$ is distributed as

$$y_i \sim \begin{cases} 0 & \text{with probability } \pi_i, \\ \text{Poisson}(\mu_i) & \text{with probability } 1 - \pi_i, \end{cases} \tag{1}$$

where $\pi_i$ is the structural zero rate and $\mu_i$ is the mean parameter of the Poisson portion for subject $i$.

In the context of clinical trials, the structural zeros correspond to participants who do not engage in the outcome behavior (e.g., alcohol abstainers who do not drink across situations and time), whereas the Poisson portion corresponds to those who may or may not engage in the behavior at a given time or situation (e.g., participants who may or may not drink during the past month at 1-month follow-up). The present paper focuses on the Poisson portion, characterizing the intervention effect on the latter group, which is of interest in many clinical trials. For example, in alcohol prevention and reduction trials among college students, researchers may be most interested in students who may drink if given an opportunity (e.g., Section 4.1). Another example is clinical trials for preventing dental caries among children, where the outcome of interest is the number of caries developed during a certain period (e.g., Section 4.2). In such trials, some children may be very unlikely to develop dental caries (e.g., due to good oral hygiene habits or protective genetic factors), while others have a higher chance of developing caries. Therefore, targeting the latter group of children, who can be characterized through the Poisson portion, would produce higher cost-effectiveness and utility for dental caries prevention strategies.

The Poisson portion can be modeled as follows.
Suppose $p - 1$ covariates are included in the model, one of which is the intervention indicator $1\{A_i = T\}$, where $A_i$ denotes a participant's assignment to either the intervention ($T$) or control ($C$) arm, and $x_{i,p-2} = (x_{i1}, x_{i2}, \ldots, x_{i,p-2})^t$ denotes the remaining $p - 2$ covariates. Then

$$\log(\mu_i) = x_i^t \beta = \beta_0 + \beta_1 1\{A_i = T\} + x_{i,p-2}^t \eta, \tag{2}$$

where $x_i = (1, 1\{A_i = T\}, x_{i,p-2}^t)^t$ and $\beta = (\beta_0, \beta_1, \eta^t)^t = (\beta_0, \beta_1, \beta_2, \ldots, \beta_{p-1})^t$ are the regression coefficients. Note that $\beta_1$ measures the intervention effect on the mean parameter in the Poisson portion, which is the parameter we aim to recover. We denote $\beta^0 = (\beta_0^0, \beta_1^0, \eta^{0,t})^t$ as the true regression parameters.

From Equations (1) and (2), the estimating equations under the true method are given by

$$S_{\text{ZIP}}(\beta) \triangleq \frac{1}{n} \sum_{y_i = 0} \left[ \frac{\pi_i}{\pi_i + (1 - \pi_i) \exp\{-\exp(x_i^t \beta)\}} \right] \exp(x_i^t \beta)\, x_i + \frac{1}{n} \sum_{i=1}^{n} \{-\exp(x_i^t \beta) + y_i\}\, x_i.$$

By solving $S_{\text{ZIP}}(\beta) = 0$, we can obtain the maximum likelihood estimates (MLE), $\hat{\beta}_{\text{MLE}}$. As $\beta^0$ are the true parameters for the ZIP model (1), we also have $E[S_{\text{ZIP}}(\beta^0)] = 0$ and $\hat{\beta}_{\text{MLE}} \to \beta^0$ as $n \to \infty$ by standard likelihood inference. Note that $\pi_i$ can be modeled separately in a logistic model. However, we do not attempt to model $\pi_i$ because it is not the interest of the current study.

For the same clinical trial design described in the previous subsection, many researchers (cf., Frazão (2011), Murphy et al. (2000), etc.) have used the conventional (CV) Poisson model to analyze the zero-inflated count outcome $y_i$ with

$$f_{\text{CV}}(y_i \mid \mu_i) = \frac{e^{-\mu_i} \mu_i^{y_i}}{y_i!},$$

where $\log(\mu_i) = x_i^t \beta$. Under the conventional method, we derive the following estimating equations

$$S_{\text{CV}}(\beta) \triangleq \frac{1}{n} \sum_{i=1}^{n} \{-\exp(x_i^t \beta) + y_i\}\, x_i \tag{3}$$

and denote $\hat{\beta}_{\text{CV}}$ as the solution of $S_{\text{CV}}(\beta) = 0$. Then $\hat{\beta}_{\text{CV}}$ are the parameter estimates in the conventional method, which are usually reported in each individual trial. Define $\beta^*$ as the solution of $E[S_{\text{CV}}(\beta)] = 0$.
By the standard asymptotic theory of M-estimation (cf., Serfling (2009)), we can show that $\hat{\beta}_{\text{CV}} \to \beta^*$ as $n \to \infty$. Since the estimating equations do not account for zero-inflation, there is a discrepancy between $\beta^*$ and the true parameter values $\beta^0$, so the intervention effect estimate from the conventional method, $\hat{\beta}_{1,\text{CV}}$, is biased. In the current study, we focus on the MLE of the true intervention effect, $\hat{\beta}_{1,\text{MLE}}$, which can be recovered by modifying $\hat{\beta}_{1,\text{CV}}$.

In this section, we formally describe zero-inflation bias as the difference between the parameters of the true method (i.e., $\beta^0$) and those of the conventional method (i.e., $\beta^*$). Denote $\delta$ as the zero-inflation bias for all parameters; then $\delta = \beta^0 - \beta^*$, and $\hat{\beta}_{\text{MLE}} \approx \hat{\beta}_{\text{CV}} + \delta$. Since $\hat{\beta}_{1,\text{MLE}}$ is of primary interest, we focus on the corresponding zero-inflation bias for the intervention effect, $\delta_1$, and the formula $\hat{\beta}_{1,\text{MLE}} \approx \hat{\beta}_{1,\text{CV}} + \delta_1$.

We can characterize $\delta$ by taking a close look at the equations $E[S_{\text{CV}}(\beta^*)] = 0$. Because $E(y_i) = (1 - \pi_i) \exp(x_i^t \beta^0)$ under the ZIP model, Equation (3) can be recast as

$$0 = E[S_{\text{CV}}(\beta^*)] = \frac{1}{n} \sum_{i=1}^{n} \left[ (1 - \pi_i) \exp\{x_i^t (\beta^0 - \beta^*)\} - 1 \right] \exp(x_i^t \beta^*)\, x_i = \frac{1}{n} \sum_{i=1}^{n} \{(1 - \pi_i) \exp(x_i^t \delta) - 1\} \exp(x_i^t \beta^*)\, x_i \triangleq B(\delta), \tag{4}$$

which shows that the zero-inflation bias $\delta$ is the solution of $B(\delta) = 0$. However, $x_i$ and $\pi_i$ require participant-level information, which is unavailable in AD meta-analysis. Hence, Equation (4) cannot be solved directly. Alternatively, we can approximate $B(\delta)$ by substituting $x_i$ and $\pi_i$ with study-level summary information, and then solve for the approximated $B(\delta)$. We describe the approximation in detail in the following section.

2.3 Approximate bias $\delta$: The ZIBC method

In this section, we describe the ZIBC method to approximate $\delta$ using Equation (4).
First, we can simplify $B(\delta)$ by

$$B(\delta) = \frac{1}{n} \sum_{i=1}^{n} \{(1 - \pi_i) \exp(x_i^t \delta) - 1\} \exp(x_i^t \beta^*)\, x_i \approx \frac{1}{n} \{(1 - \bar{\pi}) \exp(\bar{x}^t \delta) - 1\} \sum_{i=1}^{n} \exp(x_i^t \beta^*)\, x_i \triangleq B(\delta)_{\text{approx}}, \tag{5}$$

where $\bar{\pi} = \frac{1}{n} \sum_{i=1}^{n} \pi_i$ is the average structural zero rate and $\bar{x}$ are the average values of the covariates in the sample. Thus, part of the participant-level information (i.e., $x_i$ and $\pi_i$) is substituted with study-level summary statistics (i.e., $\bar{x}$ and $\bar{\pi}$) to approximate $B(\delta)$. Rewrite $\bar{x} = (1, \bar{z}^t)^t$, where $\bar{z} = (\sum_{i=1}^{n} 1\{A_i = T\}/n, \bar{x}_{p-2}^t)^t$, and $\delta = (\delta_0, \delta_{p-1}^t)^t$. Replacing $\sum_{i=1}^{n} \exp(x_i^t \beta^*)\, x_i$ with its sample counterpart $\sum_{i=1}^{n} y_i x_i$ (the two coincide at $\hat{\beta}_{\text{CV}}$ because $S_{\text{CV}}(\hat{\beta}_{\text{CV}}) = 0$), Equation (5) becomes

$$B(\delta)_{\text{approx}} = \frac{1}{n} \{(1 - \bar{\pi}) \exp(\delta_0 + \bar{z}^t \delta_{p-1}) - 1\} \sum_{i=1}^{n} y_i x_i,$$

and a solution of $B(\delta)_{\text{approx}} = 0$ is

$$\hat{\delta}_{\text{approx}} = \begin{pmatrix} \hat{\delta}_{0,\text{approx}} \\ \hat{\delta}_{p-1,\text{approx}} \end{pmatrix} = \begin{pmatrix} -\log(1 - \bar{\pi}) \\ 0 \end{pmatrix}. \tag{6}$$

Thus, the MLE of the true intercept can be recovered by

$$\hat{\beta}_{0,\text{MLE}} \approx \hat{\beta}_{0,\text{CV}} + \hat{\delta}_{0,\text{approx}} = \hat{\beta}_{0,\text{CV}} - \log(1 - \bar{\pi}). \tag{7}$$

However, $\hat{\beta}_{1,\text{MLE}}$ cannot be obtained directly, as $\hat{\delta}_{1,\text{approx}} = 0$ in Equation (6). To get around this limitation, we can estimate $\hat{\beta}_{1,\text{MLE}}$ by estimating the MLE of the intercept separately for the control and intervention groups, based on Equation (7). The specific steps are as follows:

1. Consider the sample as being comprised of two separate and independent groups: Intervention and Control.
2. For each group, derive a bias-corrected intercept from the conventional method using Equation (7).
3. Merge the corrected intercepts of the two groups from step 2 to obtain the corrected intervention effect estimate.

The details are given as follows. Denote $C = \{i \mid A_i = C, i = 1, 2, \ldots, n\}$ and $T = \{i \mid A_i = T, i = 1, 2, \ldots, n\}$ as the index sets for the control and intervention groups, respectively. We further denote $|C| = n_C$ and $|T| = n_T$. We first consider the control group. Since $1\{A_i = T\} = 0$ for $i \in C$, Equation (2) becomes $\log(\mu_i) = \beta_0 + x_{i,p-2}^t \eta$.
Denote $\hat{\beta}_{C,\text{MLE}}$ and $\hat{\beta}_{C,\text{CV}}$ as the parameter estimates under the true and conventional methods, respectively. Based on Equation (7), we have

$$\hat{\beta}_{0,C,\text{MLE}} \approx \hat{\beta}_{0,C,\text{CV}} - \log(1 - \bar{\pi}_C), \tag{8}$$

where $\bar{\pi}_C = \frac{1}{n_C} \sum_{i \in C} \pi_i$ is the average structural zero rate in the control group.

We then consider the intervention group. Since $1\{A_i = T\} = 1$ for $i \in T$, Equation (2) becomes $\log(\mu_i) = \beta_0 + \beta_1 + x_{i,p-2}^t \eta$. Note that the intercept becomes $(\beta_0 + \beta_1)$, which includes the intervention effect. Under similar arguments and notation, we then have

$$\widehat{(\beta_0 + \beta_1)}_{T,\text{MLE}} \approx \widehat{(\beta_0 + \beta_1)}_{T,\text{CV}} - \log(1 - \bar{\pi}_T), \tag{9}$$

where $\hat{\beta}_{T,\text{CV}}$ is the parameter estimate from the conventional method and $\bar{\pi}_T = \frac{1}{n_T} \sum_{i \in T} \pi_i$ is the average structural zero rate in the intervention group. We introduce the following Lemma 1 to estimate $\hat{\beta}_{1,\text{MLE}}$ by transforming Equations (8) and (9). The proof is given in Appendix A.

Lemma 1.

In a study given by Equations (1) and (2), denote the observed covariates excluding the intervention assignment as $x_{i,p-2} = (x_{i1}, x_{i2}, \ldots, x_{i,p-2})^t$ for $i = 1, \ldots, n$. If $\bar{x}_{C,p-2} = \bar{x}_{T,p-2}$, where $\bar{x}_{C,p-2} = \frac{1}{n_C} \sum_{i \in C} x_{i,p-2}$ and $\bar{x}_{T,p-2} = \frac{1}{n_T} \sum_{i \in T} x_{i,p-2}$, then we have

$$\hat{\beta}_{1,\text{MLE}} \approx \hat{\beta}_{1,\text{CV}} - \log(1 - \bar{\pi}_T) + \log(1 - \bar{\pi}_C). \tag{10}$$

Lemma 1 gives the correction formula, Equation (10), of the proposed ZIBC method. The assumption $\bar{x}_{C,p-2} = \bar{x}_{T,p-2}$ requires that the "average" subject in the control group has the same covariate values as the "average" subject in the intervention group. In a randomized controlled trial, subjects are randomized to either a control or an intervention group, so the covariates should follow similar distributions across the groups. In addition, the participants in the control and intervention groups are expected to be equivalent not only in all measured covariates but also in unmeasured ones. Hence, the assumption of Lemma 1 can reasonably hold in this case. Note that the correction depends on the relative difference between the structural zero rates of the two groups. If $\bar{\pi}_C < \bar{\pi}_T$, then $\hat{\beta}_{1,\text{CV}}$ is likely to be smaller than $\hat{\beta}_{1,\text{MLE}}$, and vice versa, suggesting that the conventional method tends to overestimate the intervention effect in a typical clinical trial.

The group-level structural zero rates $\bar{\pi}_C$ and $\bar{\pi}_T$ can be estimated using the following algorithm. Taking the control group as an example, for $i \in C$ we have

$$\begin{cases} E[\bar{y}] = \frac{1}{n_C} \sum_{i \in C} E(y_i) = \frac{1}{n_C} \sum_{i \in C} (1 - \pi_i) \mu_i \approx \bar{y}_{\text{obs},C}, \\ E\left[\sum_{i \in C} 1\{y_i = 0\}\right] = \sum_{i \in C} P(y_i = 0) = \sum_{i \in C} \{\pi_i + (1 - \pi_i) e^{-\mu_i}\} \approx n_{0,\text{obs},C}, \end{cases} \tag{11}$$

where $n_C$, $\bar{y}_{\text{obs},C}$, and $n_{0,\text{obs},C}$ are the sample size, observed outcome average, and observed number of zero outcomes, respectively, for the control group.
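The two per-participant quantities that Equation (11) averages, $E(y_i) = (1 - \pi_i)\mu_i$ and $P(y_i = 0) = \pi_i + (1 - \pi_i)e^{-\mu_i}$, can be checked numerically against the ZIP probability mass function in Equation (1). The following is a minimal Python sketch, not part of the original analysis; the function names and the values $\pi = 0.3$, $\mu = 2$ are hypothetical and chosen only for illustration.

```python
import math


def zip_pmf(k, pi, mu):
    """P(Y = k) under Equation (1): point mass at zero mixed with Poisson(mu)."""
    poisson = math.exp(-mu) * mu ** k / math.factorial(k)
    return (pi if k == 0 else 0.0) + (1.0 - pi) * poisson


def zip_mean(pi, mu):
    """Closed-form mean used in the first line of Equation (11)."""
    return (1.0 - pi) * mu


def zip_zero_prob(pi, mu):
    """Closed-form zero probability used in the second line of Equation (11)."""
    return pi + (1.0 - pi) * math.exp(-mu)


# Hypothetical participant: structural zero rate 0.3, Poisson mean 2.
pi, mu = 0.3, 2.0

# Brute-force moments from the pmf (the tail beyond k = 100 is negligible here).
total_prob = sum(zip_pmf(k, pi, mu) for k in range(100))
mean_by_sum = sum(k * zip_pmf(k, pi, mu) for k in range(100))

print(mean_by_sum, zip_mean(pi, mu))           # both approximately 1.4
print(zip_pmf(0, pi, mu), zip_zero_prob(pi, mu))
```

The agreement between the brute-force sums and the closed forms is what allows the observed mean and the observed number of zeros in each arm to serve as the two moment conditions.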
To estimate $\bar{\pi}_C$, we approximate Equation (11) by substituting $\pi_i$ with $\bar{\pi}_C$, and $\mu_i$ with $\bar{\mu}_C = \frac{1}{n_C} \sum_{i \in C} \mu_i$, resulting in

$$\begin{cases} (1 - \bar{\pi}_C)\, \bar{\mu}_C \approx \bar{y}_{\text{obs},C}, \\ \bar{\pi}_C + (1 - \bar{\pi}_C)\, e^{-\bar{\mu}_C} \approx n_{0,\text{obs},C}/n_C. \end{cases} \tag{12}$$

Here, $n_{0,\text{obs},C}/n_C$ is the proportion of zero outcome values in the control group. By solving Equation (12), we can get an approximation of $\bar{\pi}_C$. Similarly, we can get $\bar{\pi}_T$ using the same process.

The data required for the ZIBC method are (a) $\hat{\beta}_{1,\text{CV}}$, (b) $\bar{y}_{\text{obs},C}$, $\bar{y}_{\text{obs},T}$, and (c) $n_{0,\text{obs},C}/n_C$, $n_{0,\text{obs},T}/n_T$. In a typical trial, (a) and (b) are directly reported or can be obtained, while (c) is less frequently reported but may be obtained via author queries to the investigators of the original studies.

2.4 Implementation in meta-analysis

Suppose an AD meta-analysis contains $K$ studies that used the conventional method to model zero-inflated outcomes. For study $s$, $s \in \{1, 2, \ldots, K\}$, we can implement the ZIBC method to obtain the bias-corrected intervention effect $\hat{\beta}_{1s,\text{corrected}}$. For simplicity, we use the reported standard errors $\widehat{SE}_{1s,\text{CV}}$ from the conventional method. With the new set of intervention effects and standard errors, standard AD meta-analysis can be applied to combine results across studies and obtain the corrected overall intervention effect estimate.

We conducted simulation studies to examine the performance of the ZIBC method. Specifically, we compare the relative performance of the following three methods:

1. ZIP regression model (i.e., the true method), the "gold standard" method, which is not feasible in AD meta-analysis,
2. Poisson regression model (i.e., the conventional method), the method with zero-inflation bias when the outcome is zero-inflated, and
3. ZIBC method, the method to correct zero-inflation bias from the conventional method and recover the intervention effect as if it came from the true method.

In the simulation study, we consider $K = 10$ randomized clinical trials aimed at evaluating the effect of an intervention on reducing alcohol consumption, where the outcome is the number of alcoholic drinks. For each trial, we consider a balanced random assignment of participants to either the intervention or control group. We also incorporate an additional covariate that follows the standard normal distribution. The simulation was motivated by Project INTEGRATE, a large-scale meta-analysis project examining the effectiveness of brief motivational interventions on reducing alcohol consumption among young adults (Mun et al. 2015). High proportions of zero alcoholic drinks (i.e., non-drinking) were observed in most trials included in the study.

The settings of the simulation are based on our observation of the motivating data. Specifically, the sample sizes for individual trials are set at 200 and 400 for studies 1-5 and 6-10, respectively. For study $s \in \{1, 2, \ldots, K\}$ with sample size $n_s$, the outcome of the $i$-th subject ($i \in \{1, 2, \ldots, n_s\}$) is simulated by a true ZIP regression model: $y_{si} \sim \text{Poisson}(\mu_{si})$ with probability $1 - \pi_{si}$, and 0 otherwise. The structural zero rate $\pi_{si}$ and the Poisson mean parameter $\mu_{si}$ are simulated by $\text{logit}(\pi_{si}) = \gamma_0 + \gamma_1 1\{A_{si} = T\} + \gamma_2 \text{Cov}_{si}$ and $\log(\mu_{si}) = \beta_0 + \beta_1 1\{A_{si} = T\} + \beta_2 \text{Cov}_{si}$, with a continuous covariate $\text{Cov}_{si} \sim N(0, 1)$ and intervention group assignment $1\{A_{si} = T\} \sim \text{Bernoulli}(0.5)$. Three settings of the true values were used: $(\beta_0, \beta_1, \beta_2)$ = (1.[…], −.[…], .[…]), (1.[…], −.[…], .[…]), and (0.[…], −.[…], .[…]). As the intervention effect ($\beta_1$) varies from −.[…] to −.2, the intercept ($\beta_0$) also varies accordingly to fix the maximum possible $\log(\mu_{si})$ at the same level of 0.95.

To evaluate the impact of different degrees of zero-inflation on the bias and performance of the methods, we varied the overall proportion of zero alcoholic drinks at 0.25, 0.30 and 0.35 among trials. Then $\gamma_0$, $\gamma_1$ and $\gamma_2$ can be calculated to yield the aforementioned zero rates. In the simulation, we fixed $\gamma_1 = 0.5$, indicating that participants in the intervention group have a higher probability of not drinking, compared to the control group. For example, more participants who previously drank may quit drinking after the intervention, compared with their control counterparts. To achieve identifiability in estimating the parameters, we added one additional constraint: $\gamma_0 = \gamma_2$. We also tested different constraints in the simulation study, which yielded similar results. This suggests that the simulation results reported in the current study are robust regardless of the choice of constraints (results not shown but available upon request).

In one replication of the simulation, data from 10 clinical trials were generated. For each study, both the true and conventional methods were estimated first, then the ZIBC method was applied to modify the intervention effect estimate from the conventional method. Finally, for each of the three methods, we applied random-effects meta-analysis using the metafor R package (Viechtbauer 2010), and generated forest plots to compare performance between the methods.

Figure 1 shows a forest plot from a typical replication during simulation when the true intervention effect $\beta_1$ = −.2 and the overall proportion of zeros was .35. Based on the results, we have the following four observations. First, the conventional method produced biased estimates of intervention effects for individual studies as well as the overall result after meta-analysis. Specifically, it overestimated the magnitude of the overall intervention effect (−.36 vs. −.[…]). Second, […] $\beta_1$ = −.2, for each study, as well as the overall effect across studies. Third, the ZIBC method corrected zero-inflation bias in the right direction for each study. Finally, after meta-analysis, the corrected overall estimate from the ZIBC method was very close to the true parameter value of −.2. In sum, this typical simulation replication illustrates that the ZIBC method reasonably corrects the biased intervention effect estimates from the conventional method.

Figure 1 graphically illustrates the good performance of the ZIBC method in a single simulation replication. To examine the performance numerically across replications, we compared the intervention effect estimates from the three methods with the true intervention effect $\beta_1$ by calculating the coverage indicator (1 if the 95% confidence interval covers $\beta_1$ and 0 otherwise) and the difference from $\beta_1$ at each replication. After 1000 replications, we calculated the proportion of replications whose 95% confidence intervals captured $\beta_1$ (coverage rate), and the mean squared error (MSE) between the effect estimate and $\beta_1$. We used both of these indices to compare performance across the methods.

Figure 2 presents the results for different simulation settings. First, the true method had the highest coverage rates, which were close to 0.95, and also had MSE values close to 0. Second, the conventional method resulted in biased intervention effect estimates, as indicated by low coverage rates and high MSE values. Note that as zero rates got higher, zero-inflation bias became greater, leading to progressively lower coverage rates and higher MSE values. Third, the ZIBC method had acceptable coverage rates close to 0.9 and low MSE values that were close to 0. Furthermore, the performance of the ZIBC method was consistent across different zero rates. Based on the comparative results shown in Figures 1 and 2, we conclude that the ZIBC method provides reasonable correction for zero-inflation bias of the intervention effect from the conventional method in AD meta-analysis.

In this section, we apply the ZIBC method to two real data examples. In Section 4.1, we demonstrate the performance of the proposed method using individual participant data (IPD) from Project INTEGRATE. In Section 4.2, we illustrate the application of the ZIBC method to a randomized controlled trial on preventing dental caries in the field of oral health.

Project INTEGRATE is a large-scale IPD meta-analysis study examining the overall efficacy and comparative effectiveness of brief alcohol interventions for young adults (Mun et al. 2015). A recent IPD meta-analysis of 6,713 participants from 17 randomized controlled trials examined the effect of intervention on the total number of drinks consumed in a typical week, a count variable with a high percentage of zeros (Huh et al. 2015). Across all studies, an average of 30% of individuals reported zero drinking, with the highest proportion of zero drinking being 66% in one study.

We applied the ZIBC method to the tutorial data in Huh et al. (2019) to evaluate its performance. Clinical trials included in the current study (a) randomly allocated participants to an intervention or control group, (b) had a follow-up within 6 months from baseline, and (c) had at least one zero outcome in a study. Ten of the 17 studies met the criteria (studies 2, 7 (7.1 and 7.2), 9, 11, 14, 15, 16, 18 and 21). For more details of the studies, please refer to Mun et al. (2015) and Huh et al. (2015, 2019). The outcome was the average number of drinks on a typical drinking day in the most recent follow-up assessment within 6 months, with a fixed assessment time for each study. We included the intervention group assignment as the only covariate.

For data analysis, we followed the same steps as in the simulation and present the comparative results in a forest plot (Figure 3). For most studies, the conventional method produced biased intervention effect estimates compared with the true method. Note that the conventional method overestimated the true effect in some studies and underestimated it in others. This is because the average structural zero rates in the intervention groups (see $\bar{\pi}_T$ in Figure 3) were higher than those in the corresponding control groups (see $\bar{\pi}_C$ in Figure 3) in studies 9, 14, and 16, so that the conventional method overestimated the intervention effects in these studies.
In contrast, in other studies where the average proportion of zeros was higher in the control group than in the intervention group, the direction of zero-inflation bias reversed. The opposite directions of zero-inflation bias may be partly attributable to small intervention effects across the studies. In such data situations, small variations can influence the direction of an effect.

The data example demonstrates that the ZIBC method corrects zero-inflation bias regardless of the direction of the bias in the meta-analysis. In conclusion, the ZIBC method showed good performance in correcting zero-inflation bias for individual studies as well as for combining such data in meta-analysis.

4.2 Analysis 2: A dental caries prevention clinical trial

Frazão (2011) conducted a randomized controlled trial to evaluate whether the bucco-lingual technique could increase the effectiveness of a toothbrushing program on preventing dental caries (i.e., cavities) among five-year-old children. This study was a two-arm trial that randomized participants to either a conventional toothbrushing program (Control) or a modified toothbrushing program (Intervention). The outcome of interest was the number of enamel and dentin caries at 18-month follow-up, which exhibited considerable zero-inflation, with rates up to 67%. The conventional Poisson regression model was used to evaluate the intervention effect in the original study. The analysis was stratified by gender due to baseline imbalance in covariates. Since a high proportion of participants did not develop any dental caries, zero-inflation bias would be expected in the intervention effect estimates from the conventional method.

We apply the proposed ZIBC method here in order to correct for zero-inflation bias. First, we extracted the required information from the original study (see Table 1).
Specifically, the uncorrected effects (i.e., $\hat{\beta}_{1,\mathrm{CV}}$ and $\widehat{SE}_{1,\mathrm{CV}}$) were calculated from incidence density ratios (IDR) and 95% confidence intervals in the original Table 3 from Frazão (2011), and the arm-level outcome averages and the proportion of zeros (i.e., $\bar{y}_{\mathrm{obs},C}$, $\bar{y}_{\mathrm{obs},T}$, $n_{0,\mathrm{obs},C}/n_C$ and $n_{0,\mathrm{obs},T}/n_T$) were obtained directly from the original Figure 2 by using the software WebPlotDigitizer version 4.2 (Rohatgi 2019). We then estimated the arm-level average structural zero rates $\bar{\pi}_C$ and $\bar{\pi}_T$ by solving Equation (12), which were 49% and 32%, respectively, for girls and 27% and 45%, respectively, for boys. Finally, we obtained the corrected intervention effect estimates $\hat{\beta}_{1,\mathrm{MLE}}$ by plugging the values of $\hat{\beta}_{1,\mathrm{CV}}$, $\bar{\pi}_C$ and $\bar{\pi}_T$ into Equation (10). Using the original standard errors $\widehat{SE}_{1,\mathrm{CV}}$, we obtained the modified p-values based on the Wald test.

The original and ZIBC method-corrected results are summarized in Table 2. For girls, in the original analysis, although the intervention effect was not statistically significant, girls in the modified toothbrushing program tended to develop more caries, with an IDR of 1.34, representing an even worse intervention effect. After applying the ZIBC method to adjust for the zero-inflation, the IDR was corrected to 1.01, indicating a null intervention effect and in line with the expectation that the interventions would not be harmful. For boys, the original analysis showed a significant intervention effect, with an IDR of 0.48 and a P-value of 0.02 (Table 2). After the correction, the IDR was 0.63 with a P-value of 0.13 (Table 2). It is worth noting that originally the conventional Poisson regression model yielded a statistically significant intervention effect, but after the adjustment, it became insignificant. This example illustrates that without the correction for zero-inflation bias, the Poisson regression model overestimated the intervention effect and produced a false positive result.
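The three steps of this analysis can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that Equation (12) amounts to the ZIP moment identities (observed zero rate $= \bar{\pi} + (1-\bar{\pi})e^{-\mu}$ with $\mu = \bar{y}_{\mathrm{obs}}/(1-\bar{\pi})$), and that Equation (10) has the form $\hat{\beta}_{1,\mathrm{CV}} + \log\{(1-\bar{\pi}_C)/(1-\bar{\pi}_T)\}$ implied by (A.5). The function names `structural_zero_rate` and `zibc_correct` are hypothetical.

```python
import math


def structural_zero_rate(mean_obs, zero_rate_obs, tol=1e-12):
    """Solve zero_rate = pi + (1 - pi) * exp(-mean / (1 - pi)) for pi by bisection.

    Assumed form of the arm-level estimating equation (ZIP moment identities).
    """
    def excess(pi):
        return pi + (1.0 - pi) * math.exp(-mean_obs / (1.0 - pi)) - zero_rate_obs

    if excess(0.0) >= 0.0:
        return 0.0  # a plain Poisson already explains the observed zeros
    lo, hi = 0.0, zero_rate_obs  # pi cannot exceed the observed zero rate
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def zibc_correct(beta_cv, se_cv, pi_c, pi_t):
    """Return (corrected log rate ratio, IDR, two-sided Wald p-value).

    The standard error is borrowed unchanged from the conventional analysis.
    """
    beta = beta_cv + math.log((1.0 - pi_c) / (1.0 - pi_t))
    z = beta / se_cv
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return beta, math.exp(beta), p_value


# Inputs as published in Table 1 (rounded, partly digitized from a figure).
table1 = {
    "girls": {"beta_cv": 0.29, "se_cv": 0.28,
              "mean_c": 0.83, "mean_t": 1.06, "zero_c": 0.59, "zero_t": 0.47},
    "boys": {"beta_cv": -0.73, "se_cv": 0.30,
             "mean_c": 1.04, "mean_t": 0.49, "zero_c": 0.45, "zero_t": 0.67},
}
# Structural zero rates (control, intervention) as reported in the text.
reported_pi = {"girls": (0.49, 0.32), "boys": (0.27, 0.45)}

results = {}
for group, row in table1.items():
    solved = (structural_zero_rate(row["mean_c"], row["zero_c"]),
              structural_zero_rate(row["mean_t"], row["zero_t"]))
    results[group] = zibc_correct(row["beta_cv"], row["se_cv"], *reported_pi[group])
    beta, idr, p = results[group]
    print(f"{group}: solved pi = ({solved[0]:.2f}, {solved[1]:.2f}), "
          f"corrected beta = {beta:.2f}, IDR = {idr:.2f}, p = {p:.2f}")
```

Because the published inputs are rounded and the arm-level summaries were digitized from a figure, the solved rates and the corrected values should be expected to agree with the reported ones only approximately.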

This paper focuses on identifying and addressing bias that can arise in AD meta-analyses of zero-inflated count outcomes, which are prevalent in health-related research and many other areas. When an outcome variable with an excessive rate of zeros is modeled without accounting for zero inflation, zero-inflation bias will result. Further, when those studies are combined in a meta-analysis, the bias will be carried over into overall estimates across studies, which can lead to invalid inferences. For example, our simulation study found that for a two-arm randomized controlled trial design, the confidence interval of the intervention effect estimate from the conventional method included the true value less than 40% and 10% of the time when the zero rate was 25% and 35%, respectively. When the IPD is available for a study, zero-inflation bias can be corrected by re-analyzing the data with an appropriate zero-inflated model, such as the ZIP regression model. However, it is usually not feasible to acquire IPD from original studies. Thus, AD meta-analysis is more commonly used with biased estimates from zero-inflated outcomes. The current study proposes a method to correct biased estimates in such situations.

As demonstrated through simulation and the real data example from Project INTEGRATE, the ZIBC method can provide a methodological solution for obtaining more accurate estimates of intervention effects in an AD meta-analysis. The ZIBC method specifically works well when one can use the information of the “average” subject in the sample to approximate the study result, as we substituted some IPD required in the estimating equations $B(\delta)$ (i.e., $x_i$ and $\pi_i$) with their study-level average values (i.e., $\bar{x}$ and $\bar{\pi}$), to relax the IPD requirement. This modification is in line with the idea of the Mean Value Theorem for Integrals. The adjustment of the biased intervention effect estimate is obtained by combining the results of participants in the control and intervention groups.
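The way arm-level summaries combine into a correction can be seen in a small simulation. The sketch below is a simplified stand-in for the simulation study, not a reproduction of it. It assumes a two-arm trial with no covariates, in which case the Poisson-regression slope is the log ratio of the arm means. All design values are hypothetical except $\beta_1 = -0.2$, which echoes Figure 1. It also assumes that the structural zero rate can be recovered from the ZIP moment identities and that the correction takes the form implied by (A.5).

```python
import math
import random


def poisson_draw(rng, mu):
    """Draw one Poisson(mu) variate with Knuth's multiplication method."""
    limit, count, prod = math.exp(-mu), 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


def simulate_zip_arm(rng, n, mu, pi):
    """Simulate n zero-inflated Poisson outcomes with structural zero rate pi."""
    return [0 if rng.random() < pi else poisson_draw(rng, mu) for _ in range(n)]


def structural_zero_rate(mean_obs, zero_rate_obs, tol=1e-12):
    """Bisection solve of zero_rate = pi + (1 - pi) * exp(-mean / (1 - pi))."""
    def excess(pi):
        return pi + (1.0 - pi) * math.exp(-mean_obs / (1.0 - pi)) - zero_rate_obs

    if excess(0.0) >= 0.0:
        return 0.0
    lo, hi = 0.0, zero_rate_obs
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def arm_summary(y):
    """Return the arm-level mean and the proportion of zeros."""
    return sum(y) / len(y), sum(1 for v in y if v == 0) / len(y)


# Hypothetical design values; only BETA1 echoes a value shown in Figure 1.
BETA0, BETA1 = math.log(3.0), -0.2
PI_C, PI_T = 0.25, 0.35
N_PER_ARM = 20000

rng = random.Random(2024)
y_c = simulate_zip_arm(rng, N_PER_ARM, math.exp(BETA0), PI_C)
y_t = simulate_zip_arm(rng, N_PER_ARM, math.exp(BETA0 + BETA1), PI_T)

mean_c, zero_c = arm_summary(y_c)
mean_t, zero_t = arm_summary(y_t)

# Conventional estimate: Poisson slope with one binary covariate.
beta1_cv = math.log(mean_t / mean_c)

# Correction built only from arm-level summaries (no individual data reused).
pi_c_hat = structural_zero_rate(mean_c, zero_c)
pi_t_hat = structural_zero_rate(mean_t, zero_t)
beta1_zibc = beta1_cv + math.log((1.0 - pi_c_hat) / (1.0 - pi_t_hat))

print(f"true {BETA1:.3f}  conventional {beta1_cv:.3f}  corrected {beta1_zibc:.3f}")
```

In this setup the conventional estimate is pulled away from the true value by roughly $\log\{(1-\pi_T)/(1-\pi_C)\}$, while the corrected estimate should land close to $\beta_1$, up to sampling noise.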
The statistical property of the ZIBC method is justified by Lemma 1, which is based on the assumption that the characteristics (or covariates) of “average” subjects in two groups are similar, which should hold in randomized controlled trials due to random assignment to groups. In other situations where the requirement is not met, such as case-control or cross-sectional studies, the ZIBC method should be used with caution. In addition, by imposing linear predictors in the true ZIP regression model (i.e., Equation (2)), we also implicitly assume no intervention-by-covariate interactions on the outcome, which should hold in most trials.

The second data example illustrates the application of the proposed ZIBC method to data from a randomized controlled trial, where all required information could be obtained from the original study report. Using the conventional Poisson regression model, the study reported a statistically significant intervention effect in reducing dental caries. However, after applying the ZIBC method, the effect was reduced in magnitude and became statistically insignificant. Without adjusting for excessive zeros, the Poisson regression model tends to overestimate the intervention effect, which can lead to false positive results.

After correcting zero-inflation bias for each individual trial, the modified intervention effects are combined in AD meta-analysis to obtain a more accurate overall result. Note that the ZIBC method only targets the mean intervention effect estimate, and researchers can borrow standard errors from the original study using the conventional Poisson regression model to conduct AD meta-analysis. Since the conventional method tends to underestimate the standard errors when the outcome is zero-inflated (Böhning et al. 1999, Dobbie & Welsh 2001), this practice may slightly overestimate the statistical significance, and other variance correction methods for the Poisson distribution may be applied.
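Combining corrected estimates with borrowed standard errors can be sketched as follows. The sketch uses inverse-variance (fixed-effect) pooling, which is an assumption made here for brevity; a random-effects model may be more appropriate in practice. Purely to have concrete numbers, it treats the girls and boys strata of Table 2 as if they were two separate trials. The function name `pool_fixed_effect` is hypothetical.

```python
import math


def pool_fixed_effect(estimates, standard_errors):
    """Inverse-variance (fixed-effect) pooling of study-level log rate ratios."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided Wald p-value
    return pooled, pooled_se, p_value


# ZIBC-corrected estimates (Table 2) with the original, uncorrected
# standard errors borrowed from the conventional analysis (Table 1).
corrected = [0.01, -0.46]
borrowed_se = [0.28, 0.30]

pooled, pooled_se, p_value = pool_fixed_effect(corrected, borrowed_se)
print(f"pooled log rate ratio = {pooled:.3f} (SE {pooled_se:.3f}), p = {p_value:.2f}")
```

Since the borrowed standard errors are likely to be somewhat too small, the pooled p-value from such a sketch should be read as mildly optimistic.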
Although the magnitude of this overestimation of statistical significance is expected to be small, this requires further investigation. Furthermore, the use of other variance correction methods for the Poisson distribution may be explored in future studies.

The ZIBC method requires minimal new information for its correction. Most of the information required by the ZIBC method can be obtained from study reports, such as intervention effect estimates from the conventional Poisson method and group- or arm-level outcome averages. It also requires the group-level outcome zero rates, which are less commonly provided in study reports but can be obtained through inquiries with original investigators, or through an educated guess when prior information or expert knowledge is available. Therefore, this article provides meta-analysts with a feasible method to correct zero-inflation bias in intervention effect estimates when only AD are available.

The ZIBC method we describe can be extended in the future in several ways. First, although we illustrate the ZIBC method in the context of a two-arm trial design, it can be applied to multi-arm trials by comparing each intervention group with control and correcting the biased intervention effect per pair. Second, aside from the ZIBC method, alternative strategies may be investigated for their feasibility and validity when adjusting the estimating equations for zero-inflation bias (Equation (11)). One potential strategy is to generate pseudo IPD based on AD of outcome and each covariate, and then solve for $\delta$ using the pseudo data, which is similar to the idea of Approximate Bayesian Computation (see, e.g., Marin et al. (2012), Beaumont et al. (2002)).
Finally, the proposed method is designed to recover biased intervention effect estimates from the conventional Poisson model when the ZIP regression model should have been used; however, it can be extended to other statistical models with appropriate adjustments, such as a negative binomial regression model and a two-sample t-test, which can be thought of as a Wald test in a simple linear regression with intervention group membership as the lone covariate.

Acknowledgements

Conflict of Interest: None declared.

Funding

This work was supported by National Institutes of Health grants (R01 AA019511) and National Science Foundation grants (DMS1737857, 1812048, 2015373 and 2027855).

References

Beaumont, M. A., Zhang, W. & Balding, D. J. (2002), ‘Approximate Bayesian computation in population genetics’, Genetics (4), 2025–2035.

Böhning, D., Dietz, E., Schlattmann, P., Mendonca, L. & Kirchner, U. (1999), ‘The zero-inflated Poisson model and the decayed, missing and filled teeth index in dental epidemiology’, Journal of the Royal Statistical Society: Series A (Statistics in Society) (2), 195–209.

Chen, D.-G., Liu, D., Min, X. & Zhang, H. (2020), ‘Relative efficiency of using summary versus individual data in random-effects meta-analysis’, Biometrics.

Dobbie, M. J. & Welsh, A. H. (2001), ‘Theory & methods: Modelling correlated zero-inflated count data’, Australian & New Zealand Journal of Statistics (4), 431–444.

Frazão, P. (2011), ‘Effectiveness of the bucco-lingual technique within a school-based supervised toothbrushing program on preventing caries: a randomized controlled trial’, BMC Oral Health (1), 11.

Garcia, H. H., Pretell, E. J., Gilman, R. H., Martinez, S. M., Moulton, L. H., Del Brutto, O. H., Herrera, G., Evans, C. A. & Gonzalez, A. E. (2004), ‘A trial of antiparasitic treatment to reduce the rate of seizures due to cerebral cysticercosis’, New England Journal of Medicine (3), 249–258.

Huh, D., Mun, E.-Y., Larimer, M. E., White, H. R., Ray, A. E., Rhew, I. C., Kim, S.-Y., Jiao, Y. & Atkins, D. C. (2015), ‘Brief motivational interventions for college student drinking may not be as powerful as we think: An individual participant-level data meta-analysis’, Alcoholism: Clinical and Experimental Research (5), 919–931.

Huh, D., Mun, E.-Y., Walters, S. T., Zhou, Z. & Atkins, D. C. (2019), ‘A tutorial on individual participant data meta-analysis using Bayesian multilevel modeling to estimate alcohol intervention effects across heterogeneous studies’, Addictive Behaviors, 162–170.

Lambert, D. (1992), ‘Zero-inflated Poisson regression, with an application to defects in manufacturing’, Technometrics (1), 1–14.

Liu, Y. & Chen, Y. (2018), Avenues for further research, in ‘Diagnostic Meta-Analysis’, Springer, pp. 305–315.

Long, D. L., Preisser, J. S., Herring, A. H. & Golin, C. E. (2014), ‘A marginalized zero-inflated Poisson regression model with overall exposure effects’, Statistics in Medicine (29), 5151–5165.

Lyman, G. H. & Kuderer, N. M. (2005), ‘The strengths and limitations of meta-analyses based on aggregate data’, BMC Medical Research Methodology (1), 14.

Marin, J.-M., Pudlo, P., Robert, C. P. & Ryder, R. J. (2012), ‘Approximate Bayesian computational methods’, Statistics and Computing (6), 1167–1180.

Mun, E.-Y., de la Torre, J., Atkins, D. C., White, H. R., Ray, A. E., Kim, S.-Y., Jiao, Y., Clarke, N., Huo, Y., Larimer, M. E. & Huh, D. (2015), ‘Project INTEGRATE: An integrative study of brief alcohol interventions for college students’, Psychology of Addictive Behaviors (1), 34–48.

Murphy, S. W., Foley, R. N., Barrett, B. J., Kent, G. M., Morgan, J., Barré, P., Campbell, P., Fine, A., Goldstein, M. B., Handa, S. P. et al. (2000), ‘Comparative hospitalization of hemodialysis and peritoneal dialysis patients in Canada’, Kidney International (6), 2557–2563.

Rohatgi, A. (2019), ‘WebPlotDigitizer version: 4.2’, https://automeris.io/WebPlotDigitizer.

Schmid, C. H., Stijnen, T. & White, I. (2020), Handbook of Meta-Analysis, CRC Press.

Serfling, R. J. (2009), Approximation theorems of mathematical statistics, Vol. 162, John Wiley & Sons.

Silcocks, P., Whitham, D. & Whitehouse, W. P. (2010), ‘P3mc: A double blind parallel group randomised placebo controlled trial of propranolol and pizotifen in preventing migraine in children’, Trials (1), 71.

Sutton, A. J. & Higgins, J. P. (2008), ‘Recent developments in meta-analysis’, Statistics in Medicine (5), 625–650.

Viechtbauer, W. (2010), ‘Conducting meta-analyses in R with the metafor package’, Journal of Statistical Software (3), 1–48.

Appendix A. Proof of Lemma 1

Proof.

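Consider three “average” subjects in the control group, the intervention group, and the overall sample (denoted as {average, C}, {average, T}, and {average}). Their covariate vectors, excluding the intervention assignment, equal the corresponding means: $x_{\mathrm{average},C} = \bar{x}_C$, $x_{\mathrm{average},T} = \bar{x}_T$, and $x_{\mathrm{average}} = \bar{x}$, respectively. Without loss of generality, assuming that observed covariates excluding the intervention assignment are grand mean centered before data analysis, we have $x_{\mathrm{average}} = \bar{x} = \mathbf{0}$. Since $\bar{x}_C = \bar{x}_T$, we also have $x_{\mathrm{average},C} = x_{\mathrm{average},T} = x_{\mathrm{average}} = \mathbf{0}$. Therefore, we have

$$\log(\mu_{\mathrm{average},C}) = \beta_{0,C}, \qquad \log(\mu_{\mathrm{average},T}) = \beta_{0,T} + \beta_{1,T}, \qquad \log(\mu_{\mathrm{average}}) = \beta_0 + \beta_1 I\{A_{\mathrm{average}} = T\}$$

under the true method. If an average subject in the overall sample belongs to the control group, then

$$\log(\mu_{\mathrm{average}}) = \log(\mu_{\mathrm{average},C}) \Rightarrow \beta_{0,C} = \beta_0 \Rightarrow \hat{\beta}_{0,C,\mathrm{MLE}} \approx \hat{\beta}_{0,\mathrm{MLE}}. \tag{A.1}$$

Similarly, if an average subject belongs to the intervention group, then

$$\log(\mu_{\mathrm{average}}) = \log(\mu_{\mathrm{average},T}) \Rightarrow \beta_{0,T} + \beta_{1,T} = \beta_0 + \beta_1 \Rightarrow \widehat{(\beta_0 + \beta_1)}_{T,\mathrm{MLE}} \approx \hat{\beta}_{0,\mathrm{MLE}} + \hat{\beta}_{1,\mathrm{MLE}}. \tag{A.2}$$

Under similar arguments, for the conventional method, we have

$$\hat{\beta}_{0,C,\mathrm{CV}} \approx \hat{\beta}_{0,\mathrm{CV}} \tag{A.3}$$

and

$$\widehat{(\beta_0 + \beta_1)}_{T,\mathrm{CV}} \approx \hat{\beta}_{0,\mathrm{CV}} + \hat{\beta}_{1,\mathrm{CV}}. \tag{A.4}$$

Plugging Equations (A.1) and (A.3) into Equation (8), and Equations (A.2) and (A.4) into Equation (9), we have

$$\hat{\beta}_{0,\mathrm{MLE}} \approx \hat{\beta}_{0,\mathrm{CV}} - \log(1 - \bar{\pi}_C), \qquad \hat{\beta}_{0,\mathrm{MLE}} + \hat{\beta}_{1,\mathrm{MLE}} \approx \hat{\beta}_{0,\mathrm{CV}} + \hat{\beta}_{1,\mathrm{CV}} - \log(1 - \bar{\pi}_T), \tag{A.5}$$

which directly gives Equation (10).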
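The last step can be written out explicitly. The display below takes $\beta_0$ as the intercept and $\beta_1$ as the intervention effect, and subtracts the first relation in (A.5) from the second. It is a sketch of what Equation (10) presumably states, since that equation is not reproduced in this appendix.

```latex
\begin{align*}
\hat{\beta}_{1,\mathrm{MLE}}
  &= \bigl(\hat{\beta}_{0,\mathrm{MLE}} + \hat{\beta}_{1,\mathrm{MLE}}\bigr)
     - \hat{\beta}_{0,\mathrm{MLE}} \\
  &\approx \bigl[\hat{\beta}_{0,\mathrm{CV}} + \hat{\beta}_{1,\mathrm{CV}}
     - \log(1 - \bar{\pi}_T)\bigr]
     - \bigl[\hat{\beta}_{0,\mathrm{CV}} - \log(1 - \bar{\pi}_C)\bigr] \\
  &= \hat{\beta}_{1,\mathrm{CV}} + \log\frac{1 - \bar{\pi}_C}{1 - \bar{\pi}_T}.
\end{align*}
```

Under this form the correction vanishes when both arms have the same structural zero rate, and it is positive when $\bar{\pi}_T > \bar{\pi}_C$, that is, when the conventional estimate has been pulled downward by the extra zeros in the intervention arm.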
Table 1: Information extracted from Frazão (2011)

| Summary information | Data source | Girls | Boys |
|---|---|---|---|
| $\hat{\beta}_{1,\mathrm{CV}}$ | Table 3 | 0.29 | −0.73 |
| $\widehat{SE}_{1,\mathrm{CV}}$ | Table 3 | 0.28 | 0.30 |
| $\bar{y}_{\mathrm{obs},C}$ | Figure 2 (with WebPlotDigitizer) | 0.83 | 1.04 |
| $\bar{y}_{\mathrm{obs},T}$ | Figure 2 (with WebPlotDigitizer) | 1.06 | 0.49 |
| $n_{0,\mathrm{obs},C}/n_C$ | Figure 2 (with WebPlotDigitizer) | 59% | 45% |
| $n_{0,\mathrm{obs},T}/n_T$ | Figure 2 (with WebPlotDigitizer) | 47% | 67% |

Table 2: Original and ZIBC method-corrected intervention effect estimates, incidence density ratios (IDRs) and P-values, for girls and boys, respectively

| Group | Analysis | Estimate | IDR | P-value |
|---|---|---|---|---|
| Girls | Original | 0.29 | 1.34 | 0.29 |
| Girls | Corrected | 0.01 | 1.01 | 0.97 |
| Boys | Original | −0.73 | 0.48 | 0.02 |
| Boys | Corrected | −0.46 | 0.63 | 0.13 |

[Forest plot: log rate ratio for the overall estimate and studies 1 to 10 under the True, CV and ZIBC methods; treatment true value: −0.2]

Figure 1: A typical forest plot for the true, ZIBC and conventional methods when $\beta_1 = -0.2$

[Plot panels: coverage rate and MSE against zero rate, for $\beta_1 = -0.5$, $-0.35$ and $-0.2$; legend: True Method, ZIBC Method, CV Method]

Figure 2: Coverage rates and MSE values of the true (blue dashed line), ZIBC (red dotted line) and conventional (black solid line) methods from 1000 replications

[Forest plot, Project INTEGRATE: log rate ratio for the overall estimate and studies 2, 7.1, 7.2, 9, 11, 14, 15, 16, 18 and 21 under the True, CV and ZIBC methods, with per-study annotations p C and p T]