A Bias Correction Method in Meta-analysis of Randomized Clinical Trials with no Adjustments for Zero-inflated Outcomes
Zhengyang Zhou, Ph.D., University of North Texas Health Science Center, Fort Worth, TX; Minge Xie, Ph.D., Rutgers University, Piscataway, NJ; David Huh, Ph.D., University of Washington, Seattle, WA; Eun-Young Mun, Ph.D., University of North Texas Health Science Center, Fort Worth, TX
Summary
Many clinical endpoint measures, such as the number of standard drinks consumed per week or the number of days that patients stay in the hospital, are count data with excessive zeros. However, the zero-inflated nature of such outcomes is often ignored in analyses. This leads to biased estimates and, consequently, a biased estimate of the overall intervention effect in a meta-analysis. The current study proposes a novel statistical approach, the Zero-inflation Bias Correction (ZIBC) method. It accounts for the bias introduced when the Poisson regression model is used in randomized clinical trials despite a high rate of zeros in the outcome distribution. This correction method uses summary information from individual studies to correct intervention effect estimates as if they had been estimated with appropriate zero-inflated Poisson regression models. Simulation studies and real data analyses show that the ZIBC method performs well in correcting zero-inflation bias in many situations. The method offers a practical way to improve the accuracy of meta-analysis results, which is important for evidence-based medicine.
Correspondence should be sent to: Zhengyang Zhou, Ph.D., and Eun-Young Mun, Ph.D.
Keywords: aggregate data, meta-analysis, randomized clinical trial, zero-inflated outcome
Meta-analysis is an established statistical approach for combining data from multiple studies to provide large-scale evidence across many disciplines, including medical, educational, and policy research (Schmid et al. 2020). The majority of published meta-analyses have relied on aggregate data (AD), which are study-level summary statistics available from published or unpublished reports (Sutton & Higgins 2008, Lyman & Kuderer 2005, Chen et al. 2020). However, AD meta-analysis is susceptible to estimation bias. If a study with a misspecified model (and hence, e.g., a biased effect size) is included, its bias is carried over into the meta-analysis. In AD meta-analysis, it is challenging to correct biased estimates from original studies without refitting the raw participant-level data with a better-suited statistical model (Liu & Chen 2018). In this paper, we aim to correct one such estimation bias when only AD are available for meta-analysis: the bias that arises when a conventional count model is fitted to a zero-inflated count outcome.
Count outcomes are prevalent in clinical research. Examples include the number of seizures per patient in epilepsy trials (e.g., Garcia et al. (2004)), the number of relapses in multiple sclerosis trials (e.g., Silcocks et al. (2010)), and the number of standard alcoholic drinks in alcohol intervention trials (e.g., Huh et al. (2019)). Some studies, by nature, have high proportions of zero outcome values. For example, in alcohol reduction and prevention clinical trials for college students, many participants may be abstainers who do not drink. This results in a large proportion of zero standard drinks, above and beyond the frequency predicted by conventional count models such as the Poisson.
Therefore, estimation results would be biased if the Poisson regression model were used in studies with zero-inflated outcomes, which would further bias the pooled result in a meta-analysis. Henceforth, we refer to this bias as zero-inflation bias.
A zero-inflated Poisson (ZIP) model is more appropriate for count data with many zeros, since it assumes that the outcome follows a mixture of a point mass at zero and a Poisson distribution (Lambert 1992). From a clinical perspective, the two components of the ZIP model correspond to two distinct subpopulations: (a) participants who do not engage in the behavior, and (b) participants who may or may not engage in the behavior at a particular assessment. In some clinical situations, clinicians may focus on the latter, as they are the primary target of the intervention and stand to benefit more from it (see Section 2.1 for two examples). In this paper, we focus on the Poisson portion of the ZIP model and aim to correct the zero-inflation bias in the Poisson mean parameter. Note that in other trial evaluation situations, the overall mean of the outcome, which accommodates structural zeros, may be more important. In such situations, modeling the marginalized mean of the ZIP model may be appropriate (Long et al. 2014).
In this article, we focus on mitigating the impact of zero-inflation bias in meta-analysis and propose a novel statistical method, called the Zero-inflation Bias Correction (ZIBC) method. This method corrects the biased intervention effect estimate that can result from the conventional Poisson regression model, the “go-to” method for modeling count outcomes. We aim to correct zero-inflation bias and produce a bias-corrected effect size estimate equivalent to the estimate from the ZIP regression model. This bias correction is achieved by comparing the estimating equations under the ZIP and Poisson models and using summary statistics of the intervention and control subgroups.
Throughout the current paper, we refer to the Poisson and ZIP regression models as the conventional and true methods, respectively.
Without correction for zero-inflation bias, the conventional method tends to overestimate the intervention effect and may produce false positive results. For example, in the second real data example (Section 4.2), the original study used a Poisson model to evaluate the effect of a modified toothbrushing program on preventing dental caries and reported a statistically significant effect (Frazão 2011). However, more than 60% of the dental caries outcomes were zero, and after the ZIBC method was applied to adjust for the zero inflation, the effect was no longer statistically significant. If such studies were included in a meta-analysis, the overall result might be overestimated. Thus, it is important to account for zero-inflation bias and correct it when conducting meta-analyses.
The paper proceeds as follows. In Section 2, we describe the formulation of the standard Poisson and ZIP regression models for a single study. We then introduce the ZIBC method for correcting zero-inflation bias and show how to apply it in an AD meta-analysis. In Section 3, we conduct simulation studies to evaluate the performance of the ZIBC method in bias correction. In Section 4, we apply the ZIBC method to two real data examples. In Section 5, we conclude the article with a discussion.
In this section, we describe the ZIBC method and how it corrects zero-inflation bias in an AD meta-analysis. We first focus on the case of a single randomized clinical trial, for which we set up notation for the true and conventional methods (Section 2.1). We then describe zero-inflation bias (Section 2.2) and present the ZIBC method that corrects it (Section 2.3). Finally, for each clinical trial that originally used the conventional method for zero-inflated outcomes, we apply the ZIBC method to obtain the bias-corrected intervention effect estimate. We then combine the corrected estimates in a standard meta-analysis to obtain the overall bias-corrected intervention effect (Section 2.4).
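The last step of this pipeline (Section 2.4) is a standard random-effects AD meta-analysis of the per-study corrected effects and their standard errors. The analyses in this paper use the metafor R package. Purely as an illustration, the Python sketch below pools effects with the DerSimonian-Laird moment estimator of the between-study variance τ². This is an assumed stand-in, not the paper's implementation. metafor's `rma()` uses REML by default, so its estimates can differ slightly. The function name and the example numbers are hypothetical.

```python
import math


def dersimonian_laird(effects, ses):
    """Random-effects pooling of log-scale study effects with DerSimonian-Laird tau^2.

    Returns (pooled effect, its standard error, tau^2). Hypothetical helper.
    """
    k = len(effects)
    w = [1.0 / se ** 2 for se in ses]                      # inverse-variance weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 else 0.0  # truncated at zero
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]         # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return pooled, math.sqrt(1.0 / sum(w_re)), tau2


# Usage with made-up corrected log effects and conventional standard errors.
pooled, se, tau2 = dersimonian_laird([-0.25, -0.10, -0.30], [0.12, 0.15, 0.10])
print(f"pooled = {pooled:.3f}, 95% CI = ({pooled - 1.96 * se:.3f}, {pooled + 1.96 * se:.3f})")
```

Following Section 2.4, the corrected effects would be paired with the conventional standard errors. Only the point estimates change.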
For a randomized clinical trial with two arms, we assume a count outcome with an excessive rate of zeros that follows a ZIP regression model. Suppose the study sample size is $n$. For the $i$-th subject, $i = 1, 2, \ldots, n$, we assume that the outcome $y_i$ is distributed as
$$y_i \sim \begin{cases} 0 & \text{with probability } \pi_i, \\ \mathrm{Poisson}(\mu_i) & \text{with probability } 1 - \pi_i, \end{cases} \qquad (1)$$
where $\pi_i$ is the structural zero rate and $\mu_i$ is the mean parameter of the Poisson portion for subject $i$.
In the context of clinical trials, the structural zeros correspond to participants who do not engage in the behavior (e.g., alcohol abstainers who do not drink across situations and time). The Poisson portion corresponds to those who may or may not engage in the behavior at a given time or situation (e.g., participants who may or may not have drunk during the past month at a 1-month follow-up). The present paper focuses on the Poisson portion, which characterizes the intervention effect on the latter group and is of interest in many clinical trials. For example, in alcohol prevention and reduction trials among college students, researchers may be most interested in students who may drink if given an opportunity (e.g., Section 4.1). Another example is clinical trials for preventing dental caries among children, where the outcome of interest is the number of caries that develop during a certain period (e.g., Section 4.2). In such trials, some children may be very unlikely to develop dental caries (e.g., due to good oral hygiene habits or protective genetic factors), while others have higher chances of developing caries. Targeting the latter group of children, which can be characterized through the Poisson portion, would therefore improve the cost-effectiveness and utility of dental caries prevention strategies.
The Poisson portion can be modeled as follows.
Suppose that, in addition to the intercept, $p - 1$ covariates are observed. The first is the intervention indicator $1\{A_i = T\}$, where $A_i$ denotes a participant's assignment to either the intervention ($T$) or control ($C$) arm. The remaining $p - 2$ covariates are $x_{i,p-2} = (x_{i2}, x_{i3}, \ldots, x_{i,p-1})^t$. With a log link,
$$\log(\mu_i) = x_i^t \beta = \beta_0 + \beta_1 1\{A_i = T\} + x_{i,p-2}^t \eta, \qquad (2)$$
where $x_i = (1, 1\{A_i = T\}, x_{i,p-2}^t)^t$ and $\beta = (\beta_0, \beta_1, \eta^t)^t = (\beta_0, \beta_1, \beta_2, \ldots, \beta_{p-1})^t$ is the vector of regression coefficients. Note that $\beta_1$ measures the intervention effect on the mean parameter of the Poisson portion, which is the parameter we aim to recover. We denote by $\beta^0 = (\beta_0^0, \beta_1^0, \eta^{0,t})^t$ the true regression parameters.
From Equations (1) and (2), the estimating equations under the true method are given by
$$S_{\mathrm{ZIP}}(\beta) \triangleq \frac{1}{n} \sum_{y_i = 0} \left[ \frac{\pi_i}{\pi_i + (1 - \pi_i)\exp\{-\exp(x_i^t \beta)\}} \right] \exp(x_i^t \beta)\, x_i + \frac{1}{n} \sum_{i=1}^{n} \{ -\exp(x_i^t \beta) + y_i \} x_i.$$
By solving $S_{\mathrm{ZIP}}(\beta) = 0$, we obtain the maximum likelihood estimate (MLE) $\hat\beta_{\mathrm{MLE}}$. Since $\beta^0$ are the true parameters of the ZIP model (1), we also have $E[S_{\mathrm{ZIP}}(\beta^0)] = 0$ and $\hat\beta_{\mathrm{MLE}} \to \beta^0$ as $n \to \infty$ by standard likelihood theory. Note that $\pi_i$ can be modeled separately in a logistic model. However, we do not attempt to model $\pi_i$ because it is not of interest in the current study.
For the same clinical trial design described in the previous subsection, many researchers (cf. Frazão (2011), Murphy et al. (2000)) have used the conventional (CV) Poisson model to analyze a zero-inflated count outcome $y_i$, with $f_{\mathrm{CV}}(y_i \mid \mu_i) = e^{-\mu_i} \mu_i^{y_i} / y_i!$, where $\log(\mu_i) = x_i^t \beta$. Under the conventional method, we derive the estimating equations
$$S_{\mathrm{CV}}(\beta) \triangleq \frac{1}{n} \sum_{i=1}^{n} \{ -\exp(x_i^t \beta) + y_i \} x_i \qquad (3)$$
and denote by $\hat\beta_{\mathrm{CV}}$ the solution of $S_{\mathrm{CV}}(\beta) = 0$. Then $\hat\beta_{\mathrm{CV}}$ are the parameter estimates of the conventional method, which are usually reported in each individual trial. Define $\beta^*$ as the solution of $E[S_{\mathrm{CV}}(\beta)] = 0$.
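To make the two sets of estimating equations concrete, the following pure-Python sketch evaluates $S_{\mathrm{CV}}(\beta)$ from Equation (3) and $S_{\mathrm{ZIP}}(\beta)$. The inputs are design rows $x_i$, outcomes $y_i$, and structural zero rates $\pi_i$. For illustration, the sketch assumes the $\pi_i$ are known; in practice they are not. The ZIP equations are the Poisson equations plus a term summed over the zero outcomes, and that term vanishes when every $\pi_i = 0$. Function names and data are hypothetical.

```python
import math


def linear_predictor(beta, x):
    return sum(b * xj for b, xj in zip(beta, x))


def s_cv(beta, X, y):
    """Poisson estimating equations S_CV(beta), Equation (3)."""
    n = len(y)
    out = [0.0] * len(beta)
    for x, yi in zip(X, y):
        mu = math.exp(linear_predictor(beta, x))
        for j, xj in enumerate(x):
            out[j] += (yi - mu) * xj / n
    return out


def s_zip(beta, X, y, pi):
    """ZIP estimating equations S_ZIP(beta), with the rates pi_i treated as known."""
    n = len(y)
    out = s_cv(beta, X, y)                 # second sum: identical to the Poisson part
    for x, yi, p in zip(X, y, pi):
        if yi == 0:                        # first sum runs over zero outcomes only
            mu = math.exp(linear_predictor(beta, x))
            weight = p / (p + (1.0 - p) * math.exp(-mu))
            for j, xj in enumerate(x):
                out[j] += weight * mu * xj / n
    return out


# Rows are x_i = (1, 1{A_i = T}); the outcomes are made up.
X = [[1, 0], [1, 1], [1, 0], [1, 1]]
y = [0, 2, 3, 0]
print(s_cv([0.5, -0.2], X, y))
print(s_zip([0.5, -0.2], X, y, [0.3] * 4))
```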
By the standard asymptotic theory of M-estimation (cf. Serfling (2009)), we can show that $\hat\beta_{\mathrm{CV}} \to \beta^*$ as $n \to \infty$. Since the estimating equations (3) do not account for zero inflation, $\beta^*$ differs from the true parameter values $\beta^0$. Consequently, the intervention effect estimate from the conventional method, $\hat\beta_{1,\mathrm{CV}}$, is biased. In the current study, we focus on the MLE of the true intervention effect, $\hat\beta_{1,\mathrm{MLE}}$, which can be recovered by modifying $\hat\beta_{1,\mathrm{CV}}$.
In this section, we formally define zero-inflation bias as the difference between the parameters of the true method (i.e., $\beta^0$) and those of the conventional method (i.e., $\beta^*$). Denote by $\delta$ the zero-inflation bias for all parameters. Then $\delta = \beta^0 - \beta^*$, and $\hat\beta_{\mathrm{MLE}} \approx \hat\beta_{\mathrm{CV}} + \delta$. Since $\hat\beta_{1,\mathrm{MLE}}$ is of primary interest, we focus on the corresponding zero-inflation bias for the intervention effect, $\delta_1$, and the relation $\hat\beta_{1,\mathrm{MLE}} \approx \hat\beta_{1,\mathrm{CV}} + \delta_1$.
We can characterize $\delta$ by taking a closer look at the equations $E[S_{\mathrm{CV}}(\beta^*)] = 0$. Because $E(y_i) = (1 - \pi_i)\exp(x_i^t \beta^0)$ under model (1), Equation (3) can be recast as
$$0 = E[S_{\mathrm{CV}}(\beta^*)] = \frac{1}{n} \sum_{i=1}^{n} \left[ (1 - \pi_i) \exp\{x_i^t (\beta^0 - \beta^*)\} - 1 \right] \exp(x_i^t \beta^*)\, x_i = \frac{1}{n} \sum_{i=1}^{n} \{ (1 - \pi_i) \exp(x_i^t \delta) - 1 \} \exp(x_i^t \beta^*)\, x_i \triangleq B(\delta), \qquad (4)$$
which shows that the zero-inflation bias $\delta$ is the solution of $B(\delta) = 0$. However, $x_i$ and $\pi_i$ are participant-level quantities, which are unavailable in AD meta-analysis. Hence, Equation (4) cannot be solved directly. Instead, we can approximate $B(\delta)$ by substituting study-level summary information for $x_i$ and $\pi_i$, and then solve the approximated equation. We describe the approximation in detail in the following section.
2.3 Approximate bias δ: The ZIBC method
In this section, we describe the ZIBC method, which approximates $\delta$ using Equation (4).
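As a sanity check on Equation (4), consider the intercept-only case. Here $x_i = 1$ for every subject and $\delta = \delta_0$ is a scalar. Writing $\bar\pi = \frac{1}{n}\sum_{i=1}^{n} \pi_i$, the bias equation has a closed-form solution. In this special case, replacing the individual $\pi_i$ by their average therefore loses nothing:

$$B(\delta_0) = \frac{1}{n}\sum_{i=1}^{n}\{(1 - \pi_i)e^{\delta_0} - 1\}e^{\beta_0^*} = e^{\beta_0^*}\{(1 - \bar\pi)e^{\delta_0} - 1\} = 0 \;\Longrightarrow\; \delta_0 = -\log(1 - \bar\pi).$$

For example, $\bar\pi = 0.3$ gives $\delta_0 = -\log(0.7) \approx 0.357$. The conventional intercept would thus need to be shifted up by about 0.36 on the log scale.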
First, we can simplify $B(\delta)$ as
$$B(\delta) = \frac{1}{n} \sum_{i=1}^{n} \{ (1 - \pi_i) \exp(x_i^t \delta) - 1 \} \exp(x_i^t \beta^*)\, x_i \approx \frac{1}{n} \{ (1 - \bar\pi) \exp(\bar{x}^t \delta) - 1 \} \sum_{i=1}^{n} \exp(x_i^t \beta^*)\, x_i \triangleq B(\delta)_{\mathrm{approx}}, \qquad (5)$$
where $\bar\pi = \frac{1}{n}\sum_{i=1}^{n} \pi_i$ is the average structural zero rate and $\bar{x}$ is the vector of average covariate values in the sample. Thus, part of the participant-level information (i.e., $x_i$ and $\pi_i$) is replaced by study-level summary statistics (i.e., $\bar{x}$ and $\bar\pi$) to approximate $B(\delta)$. Write $\bar{x} = (1, \bar{z}^t)^t$, where $\bar{z} = (\sum_{i=1}^{n} 1\{A_i = T\}/n, \bar{x}_{p-2}^t)^t$, and $\delta = (\delta_0, \delta_{p-1}^t)^t$. Equation (3) at $\hat\beta_{\mathrm{CV}}$ gives $\sum_{i} \exp(x_i^t \hat\beta_{\mathrm{CV}})\, x_i = \sum_{i} y_i x_i$, and we use this with $\hat\beta_{\mathrm{CV}}$ in place of $\beta^*$. Equation (5) then becomes $B(\delta)_{\mathrm{approx}} = \frac{1}{n}\{(1 - \bar\pi)\exp(\delta_0 + \bar{z}^t \delta_{p-1}) - 1\} \sum_{i=1}^{n} y_i x_i$. A solution of $B(\delta)_{\mathrm{approx}} = 0$ is
$$\hat\delta_{\mathrm{approx}} = \begin{pmatrix} \hat\delta_{0,\mathrm{approx}} \\ \hat\delta_{p-1,\mathrm{approx}} \end{pmatrix} = \begin{pmatrix} -\log(1 - \bar\pi) \\ 0 \end{pmatrix}. \qquad (6)$$
Thus, the MLE of the true intercept can be recovered by
$$\hat\beta_{0,\mathrm{MLE}} \approx \hat\beta_{0,\mathrm{CV}} + \hat\delta_{0,\mathrm{approx}} = \hat\beta_{0,\mathrm{CV}} - \log(1 - \bar\pi). \qquad (7)$$
However, $\hat\beta_{1,\mathrm{MLE}}$ cannot be obtained directly, because $\hat\delta_{1,\mathrm{approx}} = 0$ in Equation (6). To get around this limitation, we estimate $\hat\beta_{1,\mathrm{MLE}}$ by estimating the MLE of the intercept separately for the control and intervention groups, based on Equation (7). The specific steps are as follows:
1. Consider the sample as being composed of two separate and independent groups: Intervention and Control.
2. For each group, derive a bias-corrected intercept from the conventional method using Equation (7).
3. Merge the corrected intercepts of the two groups from step 2 to obtain the corrected intervention effect estimate.
The details are given below. Denote by $C = \{i \mid A_i = C, i = 1, 2, \ldots, n\}$ and $T = \{i \mid A_i = T, i = 1, 2, \ldots, n\}$ the index sets for the control and intervention groups, respectively, with $|C| = n_C$ and $|T| = n_T$. We first consider the control group. Since $1\{A_i = T\} = 0$ for $i \in C$, Equation (2) becomes $\log(\mu_i) = \beta_0 + x_{i,p-2}^t \eta$.
Denote by $\hat\beta_{C,\mathrm{MLE}}$ and $\hat\beta_{C,\mathrm{CV}}$ the parameter estimates in the control group under the true and conventional methods, respectively. Based on Equation (7), we have
$$\hat\beta_{0,C,\mathrm{MLE}} \approx \hat\beta_{0,C,\mathrm{CV}} - \log(1 - \bar\pi_C), \qquad (8)$$
where $\bar\pi_C = \frac{1}{n_C}\sum_{i \in C} \pi_i$ is the average structural zero rate in the control group.
We then consider the intervention group. Since $1\{A_i = T\} = 1$ for $i \in T$, Equation (2) becomes $\log(\mu_i) = \beta_0 + \beta_1 + x_{i,p-2}^t \eta$. Note that the intercept becomes $(\beta_0 + \beta_1)$, which includes the intervention effect. Under similar arguments and notation, we have
$$\widehat{(\beta_0 + \beta_1)}_{T,\mathrm{MLE}} \approx \widehat{(\beta_0 + \beta_1)}_{T,\mathrm{CV}} - \log(1 - \bar\pi_T), \qquad (9)$$
where $\hat\beta_{T,\mathrm{CV}}$ is the parameter estimate from the conventional method in the intervention group, and $\bar\pi_T = \frac{1}{n_T}\sum_{i \in T} \pi_i$ is the average structural zero rate in the intervention group. We introduce the following Lemma 1 to estimate $\hat\beta_{1,\mathrm{MLE}}$ by combining Equations (8) and (9). The proof is given in Appendix A.
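Steps 1-3, carried out through Equations (8) and (9), shift each arm's conventional intercept by $-\log(1 - \bar\pi)$ and then take the difference. The sketch below assumes the arm-specific conventional intercepts and average structural zero rates are already available. The function names and example numbers are hypothetical.

```python
import math


def corrected_intercept(intercept_cv, pi_bar):
    """Equation (7): shift a conventional intercept by -log(1 - pi_bar)."""
    if not 0.0 <= pi_bar < 1.0:
        raise ValueError("pi_bar must lie in [0, 1)")
    return intercept_cv - math.log(1.0 - pi_bar)


def corrected_effect(b0_control_cv, b0b1_treated_cv, pi_bar_c, pi_bar_t):
    """Steps 2-3: correct each arm's intercept, then take the difference."""
    b0_control = corrected_intercept(b0_control_cv, pi_bar_c)      # Equation (8)
    b0b1_treated = corrected_intercept(b0b1_treated_cv, pi_bar_t)  # Equation (9)
    return b0b1_treated - b0_control                                # corrected beta_1


# Conventional intercepts 1.0 (control) and 0.6 (treated) imply beta_1,CV = -0.4.
print(round(corrected_effect(1.0, 0.6, pi_bar_c=0.20, pi_bar_t=0.35), 4))  # -0.1924
```

Both arms use the same shift formula, so only the difference between the two $\log(1 - \bar\pi)$ terms changes the intervention effect.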
Lemma 1.
In a study given by Equations (1) and (2), denote the observed covariates excluding the intervention assignment by $x_{i,p-2} = (x_{i2}, x_{i3}, \ldots, x_{i,p-1})^t$ for $i = 1, \ldots, n$. If $\bar{x}_{C,p-2} = \bar{x}_{T,p-2}$, where $\bar{x}_{C,p-2} = \frac{1}{n_C}\sum_{i \in C} x_{i,p-2}$ and $\bar{x}_{T,p-2} = \frac{1}{n_T}\sum_{i \in T} x_{i,p-2}$, then
$$\hat\beta_{1,\mathrm{MLE}} \approx \hat\beta_{1,\mathrm{CV}} - \log(1 - \bar\pi_T) + \log(1 - \bar\pi_C). \qquad (10)$$
Lemma 1 gives the correction formula of the proposed ZIBC method, Equation (10). The assumption $\bar{x}_{C,p-2} = \bar{x}_{T,p-2}$ requires that the “average” subject in the control group have the same covariate values as the “average” subject in the intervention group. In a randomized controlled trial, subjects are randomized to either the control or the intervention group, so the covariates should follow similar distributions across the groups. In addition, participants in the control and intervention groups are expected to be equivalent not only in all measured covariates but also in unmeasured ones. Hence, the assumption of Lemma 1 can reasonably be expected to hold in this setting. Note that the correction depends on the relative difference between the structural zero rates of the two groups. If $\bar\pi_C < \bar\pi_T$, the correction term $-\log(1 - \bar\pi_T) + \log(1 - \bar\pi_C)$ is positive, so $\hat\beta_{1,\mathrm{CV}}$ is likely to be smaller than $\hat\beta_{1,\mathrm{MLE}}$, and vice versa. A typical trial has a beneficial intervention with negative $\beta_1$ that also raises the structural zero rate. In such a trial, the conventional method therefore tends to overestimate the magnitude of the intervention effect.
The group-level structural zero rates $\bar\pi_C$ and $\bar\pi_T$ can be estimated as follows. Taking the control group ($i \in C$) as an example, we have
$$E[\bar{y}] = \frac{1}{n_C}\sum_{i \in C} E(y_i) = \frac{1}{n_C}\sum_{i \in C} (1 - \pi_i)\mu_i \approx \bar{y}_{\mathrm{obs},C}, \qquad E\Big[\sum_{i \in C} 1\{y_i = 0\}\Big] = \sum_{i \in C} P(y_i = 0) = \sum_{i \in C}\{\pi_i + (1 - \pi_i)e^{-\mu_i}\} \approx n_{0,\mathrm{obs},C}, \qquad (11)$$
where $n_C$, $\bar{y}_{\mathrm{obs},C}$, and $n_{0,\mathrm{obs},C}$ are the sample size, observed outcome average, and observed number of zero outcomes, respectively, for the control group.
To estimate $\bar\pi_C$, we approximate Equation (11) by substituting $\bar\pi_C$ for $\pi_i$ and $\bar\mu_C = \frac{1}{n_C}\sum_{i \in C}\mu_i$ for $\mu_i$, which gives
$$(1 - \bar\pi_C)\bar\mu_C \approx \bar{y}_{\mathrm{obs},C}, \qquad \bar\pi_C + (1 - \bar\pi_C)e^{-\bar\mu_C} \approx n_{0,\mathrm{obs},C}/n_C. \qquad (12)$$
Here, $n_{0,\mathrm{obs},C}/n_C$ is the proportion of zero outcome values in the control group. Solving Equation (12) gives an approximation of $\bar\pi_C$, and $\bar\pi_T$ is obtained by the same process.
The data required for the ZIBC method are (a) $\hat\beta_{1,\mathrm{CV}}$, (b) $\bar{y}_{\mathrm{obs},C}$ and $\bar{y}_{\mathrm{obs},T}$, and (c) $n_{0,\mathrm{obs},C}/n_C$ and $n_{0,\mathrm{obs},T}/n_T$. In a typical trial, (a) and (b) are reported directly or can be derived. Item (c) is less frequently reported but may be obtained by querying the investigators of the original studies.
2.4 Implementation in meta-analysis
Suppose an AD meta-analysis contains $K$ studies that used the conventional method to model zero-inflated outcomes. For study $s$, $s \in \{1, 2, \ldots, K\}$, we apply the ZIBC method to obtain the bias-corrected intervention effect $\hat\beta_{s,\mathrm{corrected}}$. For simplicity, we use the standard errors $\widehat{\mathrm{SE}}_{s,\mathrm{CV}}$ reported for the conventional method. With the new set of intervention effects and standard errors, standard AD meta-analysis can be applied to combine results across studies and obtain the corrected overall intervention effect estimate.
We conducted simulation studies to examine the performance of the ZIBC method. Specifically, we compare the relative performance of the following three methods:
1. the ZIP regression model (i.e., the true method), the “gold standard” method, which is not feasible in AD meta-analysis;
2. the Poisson regression model (i.e., the conventional method), which suffers from zero-inflation bias when the outcome is zero-inflated; and
3. the ZIBC method, which corrects zero-inflation bias in the conventional method and recovers the intervention effect as if it came from the true method.
In the simulation study, we consider $K = 10$ randomized clinical trials evaluating the effect of an intervention on reducing alcohol consumption, where the outcome is the number of alcoholic drinks. For each trial, we consider a balanced random assignment of participants to either the intervention or control group. We also incorporate an additional covariate that follows the standard normal distribution. The simulation was motivated by Project INTEGRATE, a large-scale meta-analysis project examining the effectiveness of brief motivational interventions for reducing alcohol consumption among young adults (Mun et al. 2015). High proportions of zero alcoholic drinks (i.e., non-drinking) were observed in most trials included in that project.
The settings of the simulation are based on our observations of the motivating data. Specifically, the sample sizes of individual trials are set at 200 for studies 1-5 and 400 for studies 6-10. For study $s \in \{1, 2, \ldots, K\}$ with sample size $n_s$, the outcome of the $i$-th subject ($i \in \{1, 2, \ldots, n_s\}$) is simulated from a true ZIP regression model: $y_{si} \sim \mathrm{Poisson}(\mu_{si})$ with probability $1 - \pi_{si}$, and $y_{si} = 0$ otherwise. The structural zero rate $\pi_{si}$ and the Poisson mean parameter $\mu_{si}$ are generated by $\mathrm{logit}(\pi_{si}) = \gamma_0 + \gamma_1 1\{A_{si} = T\} + \gamma_2 \mathrm{Cov}_{si}$ and $\log(\mu_{si}) = \beta_0 + \beta_1 1\{A_{si} = T\} + \beta_2 \mathrm{Cov}_{si}$, with a continuous covariate Cov si ∼ N (0 ,
1) and intervention groupassignment { A si = T } ∼ Bernoulli(0 . β , β , β ) = (1 . , − . , . β , β , β ) = (1 . , − . , . β , β , β ) = (0 . , − . , . β ) varies from − . − .
2, the intercept ($\beta_0$) also varies accordingly to fix the maximum possible $\log(\mu_{si})$ at the same level of 0.95.
To evaluate the impact of different degrees of zero inflation on the bias and performance of the methods, we set the overall proportion of zero alcoholic drinks in the trials to 0.25, 0.30, or 0.35. The coefficients $\gamma_0$, $\gamma_1$, and $\gamma_2$ can then be calculated to yield these zero rates. In the simulation, we fixed $\gamma_1 = 0.5$, indicating that participants in the intervention group have a higher probability of not drinking than those in the control group. For example, more participants who previously drank may quit drinking after the intervention, compared with their control counterparts. To achieve identifiability in estimating the parameters, we added one additional constraint equating two of the $\gamma$ coefficients. We also tested different constraints, which yielded similar results. This suggests that the simulation results reported in the current study are robust to the choice of constraint (results not shown but available upon request).
In one replication of the simulation, data from 10 clinical trials were generated. For each study, both the true and conventional methods were fitted first, and then the ZIBC method was applied to modify the intervention effect estimate from the conventional method. Finally, for each of the three methods, we applied random-effects meta-analysis using the metafor R package (Viechtbauer 2010) and generated forest plots to compare performance across the methods.
Figure 1 shows a forest plot from a typical simulation replication in which the true intervention effect was $\beta_1 = -0.2$ and the overall zero rate was 0.35. Based on the results, we make the following four observations. First, the conventional method produced biased estimates of intervention effects for individual studies as well as for the overall result after meta-analysis. Specifically, it overestimated the magnitude of the overall intervention effect ( − .
36 vs. − . β = − .
2, for each study, as well as the overall effect across studies. Third, the ZIBC method corrected zero-inflation bias in the right direction for each study. Finally, after meta-analysis, the corrected overall estimate from the ZIBC method was very close to the true parameter value of −0.2. In sum, this typical simulation replication illustrates that the ZIBC method reasonably corrects the biased intervention effect estimates from the conventional method.
Figure 1 graphically illustrates the good performance of the ZIBC method in a single simulation replication. To examine performance numerically across replications, we compared the intervention effect estimates from the three methods with the true intervention effect $\beta_1$. At each replication, we computed a coverage indicator (1 if the 95% confidence interval covers $\beta_1$ and 0 otherwise) and the difference from $\beta_1$. After 1000 replications, we calculated the proportion of replications whose 95% confidence intervals captured $\beta_1$ (the coverage rate). We also calculated the mean squared error (MSE) between the effect estimate and $\beta_1$. We used both indices to compare performance across the methods.
Figure 2 presents the results for different simulation settings. First, the true method had the highest coverage rates, which were close to 0.95, and MSE values close to 0. Second, the conventional method produced biased intervention effect estimates, as indicated by low coverage rates and high MSE values. Note that as zero rates increased, zero-inflation bias became greater, leading to progressively lower coverage rates and higher MSE values. Third, the ZIBC method had acceptable coverage rates close to 0.9 and low MSE values close to 0. Furthermore, the performance of the ZIBC method was consistent across different zero rates. Based on the comparative results shown in Figures 1 and 2, we conclude that the ZIBC method reasonably corrects the zero-inflation bias in the conventional method's intervention effect in AD meta-analysis.
In this section, we apply the ZIBC method to two real data examples. In Section 4.1, we demonstrate the performance of the proposed method using individual participant data (IPD) from Project INTEGRATE.
In Section 4.2, we illustrate the application of the ZIBCmethod to a randomized controlled trial on preventing dental caries in the field of oral health.
Project INTEGRATE is a large-scale IPD meta-analysis study examining the overall efficacy and comparative effectiveness of brief alcohol interventions for young adults (Mun et al. 2015). A recent IPD meta-analysis of 6,713 participants from 17 randomized controlled trials examined the effect of intervention on the total number of drinks consumed in a typical week, a count variable with a high percentage of zeros (Huh et al. 2015). Across all studies, an average of 30% of individuals reported no drinking, with the highest proportion of non-drinking being 66% in one study.
We applied the ZIBC method to the tutorial data in Huh et al. (2019) to evaluate its performance. Clinical trials included in the current analysis met three criteria. They (a) randomly allocated participants to an intervention or control group, (b) had a follow-up within 6 months of baseline, and (c) had at least one zero outcome. Ten of the 17 studies met these criteria (studies 2, 7 (7.1 and 7.2), 9, 11, 14, 15, 16, 18, and 21). For more details of the studies, please refer to Mun et al. (2015) and Huh et al. (2015, 2019). The outcome was the average number of drinks on a typical drinking day at the most recent follow-up assessment within 6 months, with a fixed assessment time for each study. We included the intervention group assignment as the only covariate.
For data analysis, we followed the same steps as in the simulation and present the comparative results in a forest plot (Figure 3). For most studies, the conventional method produced biased intervention effect estimates compared with the true method. Note that the conventional method overestimated the true effect in some studies and underestimated it in others. In studies 9, 14, and 16, the average structural zero rates in the intervention groups (see $\bar\pi_T$ in Figure 3) were higher than those in the corresponding control groups (see $\bar\pi_C$ in Figure 3). As a result, the conventional method overestimated the intervention effects in these studies.
In contrast, the direction of zero-inflation bias was reversed in the other studies, where the average proportion of zeros was higher in the control group than in the intervention group. The opposite directions of zero-inflation bias may be partly attributable to small intervention effects across the studies. In such data situations, small variations can influence the direction of an effect.
This data example demonstrates that the ZIBC method corrects zero-inflation bias in the meta-analysis regardless of the direction of the bias. In conclusion, the ZIBC method showed good performance in correcting zero-inflation bias for individual studies as well as when combining such studies in meta-analysis.
4.2 Analysis 2: A dental caries prevention clinical trial
Frazão (2011) conducted a randomized controlled trial to evaluate whether the bucco-lingual technique could increase the effectiveness of a toothbrushing program in preventing dental caries (i.e., cavities) among five-year-old children. This two-arm trial randomized participants to either a conventional toothbrushing program (Control) or a modified toothbrushing program (Intervention). The outcome of interest was the number of enamel and dentin caries at the 18-month follow-up. This outcome exhibited considerable zero inflation, with zero rates up to 67%. The original study used the conventional Poisson regression model to evaluate the intervention effect. The analysis was stratified by gender because of baseline imbalance in covariates. Since a high proportion of participants did not develop any dental caries, zero-inflation bias would be expected in the intervention effect estimates from the conventional method.
We apply the proposed ZIBC method here to correct for zero-inflation bias. First, we extracted the required information from the original study (see Table 1).
Specifically, the uncorrected effects (i.e., $\hat\beta_{1,\mathrm{CV}}$ and $\widehat{\mathrm{SE}}_{1,\mathrm{CV}}$) were calculated from the incidence density ratios (IDRs) and 95% confidence intervals in Table 3 of Frazão (2011). The arm-level outcome averages and proportions of zeros (i.e., $\bar{y}_{\mathrm{obs},C}$, $\bar{y}_{\mathrm{obs},T}$, $n_{0,\mathrm{obs},C}/n_C$, and $n_{0,\mathrm{obs},T}/n_T$) were read from Figure 2 of the original study using WebPlotDigitizer version 4.2 (Rohatgi 2019). We then estimated the arm-level average structural zero rates $\bar\pi_C$ and $\bar\pi_T$ by solving Equation (12). For girls, these rates were 49% and 32%, respectively; for boys, they were 27% and 45%, respectively. Finally, we obtained the corrected intervention effect estimates $\hat\beta_{1,\mathrm{MLE}}$ by plugging $\hat\beta_{1,\mathrm{CV}}$, $\bar\pi_C$, and $\bar\pi_T$ into Equation (10). Using the original standard errors $\widehat{\mathrm{SE}}_{1,\mathrm{CV}}$, we obtained modified p-values based on the Wald test.
The original and ZIBC-corrected results are summarized in Table 2. For girls, the original analysis found no statistically significant effect. However, girls in the modified toothbrushing program tended to develop more caries, with an IDR of 1.34, which would indicate a harmful intervention effect. After the ZIBC method was applied to adjust for zero inflation, the IDR was corrected to 1.01. This indicates a null intervention effect, in line with the expectation that the intervention would not be harmful. For boys, the original analysis showed a significant intervention effect with an IDR of − .
74 and P-value of 0 . − . .
13. It is worth noting that the conventional Poisson regression model originally yielded a statistically significant intervention effect, which was no longer significant after the adjustment. This example illustrates that, without correction for zero-inflation bias, the Poisson regression model overestimated the intervention effect and produced a false positive result.
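The calculation in this example chains Equation (12) and Equation (10). Eliminating $\bar\pi$ with the first line of Equation (12) leaves a single equation in $\bar\mu$: $(1 - e^{-\bar\mu})/\bar\mu = (1 - n_0/n)/\bar{y}$. Its left side decreases from 1 toward 0, so bisection finds the unique root whenever $0 < (1 - n_0/n)/\bar{y} < 1$. The implied $\bar\pi = 1 - \bar{y}/\bar\mu$ is nonnegative only if the arm has at least as many zeros as a Poisson distribution with mean $\bar{y}$ would predict. The Python sketch below clips $\bar\pi$ at 0 otherwise; this is our choice, not the paper's. Function names are hypothetical. The digitized arm-level inputs from Table 1 are not reproduced in the text. The usage lines therefore start from the girls' rounded rates reported above ($\bar\pi_C = 0.49$, $\bar\pi_T = 0.32$) and the original IDR of 1.34.

```python
import math


def solve_pi_bar(y_bar, p_zero, rel_tol=1e-12):
    """Solve Equation (12) for (pi_bar, mu_bar) from an arm's mean and zero proportion."""
    target = (1.0 - p_zero) / y_bar
    if not 0.0 < target < 1.0:
        raise ValueError("need 0 < (1 - p_zero) / y_bar < 1")

    def g(mu):                                  # (1 - exp(-mu)) / mu, decreasing in mu
        return -math.expm1(-mu) / mu

    lo, hi = 1e-12, 1.0
    while g(hi) > target:                       # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > rel_tol * hi:               # bisection
        mid = 0.5 * (lo + hi)
        if g(mid) > target:
            lo = mid
        else:
            hi = mid
    mu_bar = 0.5 * (lo + hi)
    pi_bar = max(0.0, 1.0 - y_bar / mu_bar)     # clipped at 0: our assumption
    return pi_bar, mu_bar


def zibc_effect(beta1_cv, pi_bar_c, pi_bar_t):
    """Equation (10): corrected log-scale intervention effect."""
    return beta1_cv - math.log(1.0 - pi_bar_t) + math.log(1.0 - pi_bar_c)


# Round trip: an arm with pi = 0.3 and mu = 2 has mean 1.4 and zero share 0.3 + 0.7 e^{-2}.
print(solve_pi_bar(1.4, 0.3 + 0.7 * math.exp(-2.0)))  # approximately (0.3, 2.0)

# Girls in Analysis 2: original IDR 1.34 with rounded rates pi_C = 0.49, pi_T = 0.32.
beta1 = zibc_effect(math.log(1.34), pi_bar_c=0.49, pi_bar_t=0.32)
print(round(math.exp(beta1), 3))  # 1.005
```

The result, about 1.005, is close to the corrected IDR of 1.01 reported above. The small gap presumably reflects rounding of the zero rates to whole percentages.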
This paper focuses on identifying and addressing bias that can arise in AD meta-analyses of zero-inflated count outcomes, which are prevalent in health-related research and many other areas. When an outcome variable with an excessive rate of zeros is modeled without accounting for zero inflation, zero-inflation bias results. Further, when such studies are combined in a meta-analysis, the bias is carried over into the overall estimates across studies, which can lead to invalid inferences. For example, our simulation study examined a two-arm randomized controlled trial design. The confidence interval of the conventional method's intervention effect estimate included the true value less than 40% of the time at a zero rate of 25%, and less than 10% of the time at a zero rate of 35%. When IPD are available for a study, zero-inflation bias can be corrected by re-analyzing the data with an appropriate zero-inflated model, such as the ZIP regression model. However, it is usually not feasible to acquire IPD from original studies, so AD meta-analysis is more commonly conducted with biased estimates from zero-inflated outcomes. The current study proposes a method to correct biased estimates in such situations.
As demonstrated through simulation and the real data example from Project INTEGRATE, the ZIBC method provides a methodological solution for obtaining more accurate estimates of intervention effects in an AD meta-analysis. The ZIBC method works well specifically when the information of the “average” subject in the sample can be used to approximate the study result. To relax the IPD requirement, we replaced the participant-level quantities in the estimating equations $B(\delta)$ (i.e., $x_i$ and $\pi_i$) with their study-level averages (i.e., $\bar{x}$ and $\bar\pi$). This modification is in line with the idea of the Mean Value Theorem for Integrals.
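A small simulation can illustrate the point about the “average” subject. Suppose the intervention indicator is the only covariate, as in Analysis 1. Then $x_i$ and $\mu_i$ are constant within each arm, so the averaging in Equations (5) and (11)-(12) is exact and the correction should be consistent. The Python sketch below draws one large two-arm ZIP trial. Its parameter values are hypothetical and are not those of Section 3. The conventional estimate is taken as $\log(\bar{y}_T/\bar{y}_C)$, which is the Poisson MLE of $\beta_1$ when the indicator is the only covariate. The sketch then applies Equations (12) and (10). Poisson draws use Knuth's multiplication method, which is adequate for the small means used here.

```python
import math
import random


def poisson_draw(rng, lam):
    """Knuth's multiplication method for a Poisson(lam) variate."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def zip_arm(rng, n, pi, mu):
    """n outcomes: a structural zero with probability pi, otherwise Poisson(mu)."""
    return [0 if rng.random() < pi else poisson_draw(rng, mu) for _ in range(n)]


def solve_pi_bar(y_bar, p_zero):
    """Equation (12) by bisection on (1 - exp(-mu)) / mu = (1 - p_zero) / y_bar."""
    target = (1.0 - p_zero) / y_bar

    def g(mu):
        return -math.expm1(-mu) / mu

    lo, hi = 1e-12, 1.0
    while g(hi) > target:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) > target else (lo, mid)
    return max(0.0, 1.0 - y_bar / hi)


def simulate(seed=2024, n=20000, beta0=1.0, beta1=-0.2, pi_c=0.20, pi_t=0.35):
    rng = random.Random(seed)
    y_c = zip_arm(rng, n, pi_c, math.exp(beta0))
    y_t = zip_arm(rng, n, pi_t, math.exp(beta0 + beta1))
    mean_c, mean_t = sum(y_c) / n, sum(y_t) / n
    beta1_cv = math.log(mean_t / mean_c)        # conventional (Poisson) estimate
    pi_c_hat = solve_pi_bar(mean_c, y_c.count(0) / n)
    pi_t_hat = solve_pi_bar(mean_t, y_t.count(0) / n)
    beta1_zibc = beta1_cv - math.log(1 - pi_t_hat) + math.log(1 - pi_c_hat)  # Eq. (10)
    return beta1_cv, beta1_zibc


beta1_cv, beta1_zibc = simulate()
print(f"true -0.200, conventional {beta1_cv:.3f}, ZIBC {beta1_zibc:.3f}")
```

With these settings, the conventional estimate should land near $-0.2 + \log(0.65/0.80) \approx -0.41$. The ZIBC estimate should be close to $-0.2$ up to sampling error.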
The adjustment of the biased intervention effect estimate is obtained by combining the results for participants in the control and intervention groups. The statistical property of the ZIBC method is justified by Lemma 1. Lemma 1 assumes that the characteristics (or covariates) of the "average" subjects in the two groups are similar, which should hold in randomized controlled trials because of random assignment to groups. In other situations where this requirement is not met, such as case-control or cross-sectional studies, the ZIBC method should be used with caution. In addition, by imposing linear predictors in the true ZIP regression model (i.e., Equation (2)), we also implicitly assume no intervention-by-covariate interactions on the outcome, which should hold in most trials.

The second data example illustrates the application of the proposed ZIBC method to data from a randomized controlled trial, where all required information could be obtained from the original study report. Using the conventional Poisson regression model, the study reported a statistically significant intervention effect in reducing dental caries. However, after applying the ZIBC method, the effect was reduced in magnitude and was no longer statistically significant. Without adjusting for excessive zeros, the Poisson regression model tends to overestimate the intervention effect, which can lead to false positive results.

After correcting zero-inflation bias for each individual trial, the modified intervention effects are combined in an AD meta-analysis to obtain a more accurate overall result. Note that the ZIBC method targets only the mean intervention effect estimate. To conduct the AD meta-analysis, researchers can borrow the standard errors from the original study's conventional Poisson regression model. Because the conventional method tends to underestimate standard errors when the outcome is zero-inflated (Böhning et al. 1999, Dobbie & Welsh 2001), this practice may slightly overstate statistical significance. Other variance correction methods for the Poisson distribution may be applied instead. Although the magnitude of this overstatement is expected to be small, it requires further investigation, and such variance correction methods may be explored in future studies.

The ZIBC method requires minimal new information for its correction. Most of the information it requires can be obtained from study reports, such as intervention effect estimates from the conventional Poisson method and group- or arm-level outcome averages. It also requires the group-level outcome zero rates. These are less commonly provided in study reports, but they can be obtained through inquiries with the original investigators or approximated by an educated guess when prior information or expert knowledge is available. Therefore, this article provides meta-analysts with a feasible method to correct zero-inflation bias in intervention effect estimates when only AD are available.

The ZIBC method we describe can be extended in the future in several ways. First, although we illustrate the ZIBC method in the context of a two-arm trial design, it can be applied to multi-arm trials by comparing each intervention group with the control group and correcting the biased intervention effect for each pair. Second, aside from the ZIBC method, alternative strategies for adjusting the estimating equations for zero-inflation bias (Equation (11)) may be investigated for their feasibility and validity. One potential strategy is to generate pseudo IPD based on the AD of the outcome and each covariate, and then solve for $\delta$ using the pseudo data. This approach is similar to the idea of Approximate Bayesian Computation (see, e.g., Marin et al. (2012), Beaumont et al. (2002)).
Finally, the proposed method is designed to recover biased intervention effect estimates from the conventional Poisson model when the ZIP regression model should have been used. However, it can be extended with appropriate adjustments to other statistical models, such as a negative binomial regression model and a two-sample $t$-test, which can be thought of as a Wald test in a simple linear regression with intervention group membership as the lone covariate.

Acknowledgements
Conflict of Interest: None declared.
Funding
This work was supported by National Institutes of Health grants (R01 AA019511) and National Science Foundation grants (DMS1737857, 1812048, 2015373 and 2027855).
References
Beaumont, M. A., Zhang, W. & Balding, D. J. (2002), 'Approximate Bayesian computation in population genetics', Genetics (4), 2025–2035.

Böhning, D., Dietz, E., Schlattmann, P., Mendonca, L. & Kirchner, U. (1999), 'The zero-inflated Poisson model and the decayed, missing and filled teeth index in dental epidemiology', Journal of the Royal Statistical Society: Series A (Statistics in Society) (2), 195–209.

Chen, D.-G., Liu, D., Min, X. & Zhang, H. (2020), 'Relative efficiency of using summary versus individual data in random-effects meta-analysis', Biometrics.

Dobbie, M. J. & Welsh, A. H. (2001), 'Theory & methods: Modelling correlated zero-inflated count data', Australian & New Zealand Journal of Statistics (4), 431–444.

Frazão, P. (2011), 'Effectiveness of the bucco-lingual technique within a school-based supervised toothbrushing program on preventing caries: a randomized controlled trial', BMC Oral Health (1), 11.

Garcia, H. H., Pretell, E. J., Gilman, R. H., Martinez, S. M., Moulton, L. H., Del Brutto, O. H., Herrera, G., Evans, C. A. & Gonzalez, A. E. (2004), 'A trial of antiparasitic treatment to reduce the rate of seizures due to cerebral cysticercosis', New England Journal of Medicine (3), 249–258.

Huh, D., Mun, E.-Y., Larimer, M. E., White, H. R., Ray, A. E., Rhew, I. C., Kim, S.-Y., Jiao, Y. & Atkins, D. C. (2015), 'Brief motivational interventions for college student drinking may not be as powerful as we think: An individual participant-level data meta-analysis', Alcoholism: Clinical and Experimental Research (5), 919–931.

Huh, D., Mun, E.-Y., Walters, S. T., Zhou, Z. & Atkins, D. C. (2019), 'A tutorial on individual participant data meta-analysis using Bayesian multilevel modeling to estimate alcohol intervention effects across heterogeneous studies', Addictive Behaviors, 162–170.

Lambert, D. (1992), 'Zero-inflated Poisson regression, with an application to defects in manufacturing', Technometrics (1), 1–14.

Liu, Y. & Chen, Y. (2018), Avenues for further research, in 'Diagnostic Meta-Analysis', Springer, pp. 305–315.

Long, D. L., Preisser, J. S., Herring, A. H. & Golin, C. E. (2014), 'A marginalized zero-inflated Poisson regression model with overall exposure effects', Statistics in Medicine (29), 5151–5165.

Lyman, G. H. & Kuderer, N. M. (2005), 'The strengths and limitations of meta-analyses based on aggregate data', BMC Medical Research Methodology (1), 14.

Marin, J.-M., Pudlo, P., Robert, C. P. & Ryder, R. J. (2012), 'Approximate Bayesian computational methods', Statistics and Computing (6), 1167–1180.

Mun, E.-Y., de la Torre, J., Atkins, D. C., White, H. R., Ray, A. E., Kim, S.-Y., Jiao, Y., Clarke, N., Huo, Y., Larimer, M. E. & Huh, D. (2015), 'Project INTEGRATE: An integrative study of brief alcohol interventions for college students', Psychology of Addictive Behaviors (1), 34–48.

Murphy, S. W., Foley, R. N., Barrett, B. J., Kent, G. M., Morgan, J., Barré, P., Campbell, P., Fine, A., Goldstein, M. B., Handa, S. P. et al. (2000), 'Comparative hospitalization of hemodialysis and peritoneal dialysis patients in Canada', Kidney International (6), 2557–2563.

Rohatgi, A. (2019), 'WebPlotDigitizer version: 4.2', https://automeris.io/WebPlotDigitizer.

Schmid, C. H., Stijnen, T. & White, I. (2020), Handbook of Meta-Analysis, CRC Press.

Serfling, R. J. (2009), Approximation Theorems of Mathematical Statistics, Vol. 162, John Wiley & Sons.

Silcocks, P., Whitham, D. & Whitehouse, W. P. (2010), 'P3MC: A double blind parallel group randomised placebo controlled trial of propranolol and pizotifen in preventing migraine in children', Trials (1), 71.

Sutton, A. J. & Higgins, J. P. (2008), 'Recent developments in meta-analysis', Statistics in Medicine (5), 625–650.

Viechtbauer, W. (2010), 'Conducting meta-analyses in R with the metafor package', Journal of Statistical Software (3), 1–48.

Appendix A. Proof of Lemma 1

Proof.
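Consider three "average" subjects in the control group, the intervention group, and the overall sample (denoted as $\{\text{average}, C\}$, $\{\text{average}, T\}$, and $\{\text{average}\}$), with $\mathbf{x}_{\text{average},C,p-1} = \bar{\mathbf{x}}_{C,p-1}$, $\mathbf{x}_{\text{average},T,p-1} = \bar{\mathbf{x}}_{T,p-1}$, and $\mathbf{x}_{\text{average},p-1} = \bar{\mathbf{x}}_{p-1}$, respectively. Without loss of generality, assume that the observed covariates excluding the intervention assignment are grand-mean centered before data analysis, so that $\mathbf{x}_{\text{average},p-1} = \bar{\mathbf{x}}_{p-1} = \mathbf{0}$. Since $\bar{\mathbf{x}}_{C,p-1} = \bar{\mathbf{x}}_{T,p-1}$, we also have $\mathbf{x}_{\text{average},C,p-1} = \mathbf{x}_{\text{average},T,p-1} = \mathbf{x}_{\text{average},p-1} = \mathbf{0}$. Therefore, under the true method,

$$\begin{aligned} \log(\mu_{\text{average},C}) &= \beta_{0,C}, \\ \log(\mu_{\text{average},T}) &= \beta_{0,T} + \beta_{1,T}, \\ \log(\mu_{\text{average}}) &= \beta_0 + \beta_1 I\{A_{\text{average}} = T\}. \end{aligned}$$

If an average subject in the overall sample belongs to the control group, then

$$\log(\mu_{\text{average}}) = \log(\mu_{\text{average},C}) \;\Rightarrow\; \beta_{0,C} = \beta_0 \;\Rightarrow\; \hat{\beta}_{0,C,\text{MLE}} \approx \hat{\beta}_{0,\text{MLE}}. \tag{A.1}$$

Similarly, if an average subject belongs to the intervention group, then

$$\log(\mu_{\text{average}}) = \log(\mu_{\text{average},T}) \;\Rightarrow\; \beta_{0,T} + \beta_{1,T} = \beta_0 + \beta_1 \;\Rightarrow\; \widehat{(\beta_0 + \beta_1)}_{T,\text{MLE}} \approx \hat{\beta}_{0,\text{MLE}} + \hat{\beta}_{1,\text{MLE}}. \tag{A.2}$$

By similar arguments, for the conventional method we have

$$\hat{\beta}_{0,C,\text{CV}} \approx \hat{\beta}_{0,\text{CV}} \tag{A.3}$$

and

$$\widehat{(\beta_0 + \beta_1)}_{T,\text{CV}} \approx \hat{\beta}_{0,\text{CV}} + \hat{\beta}_{1,\text{CV}}. \tag{A.4}$$

Plugging Equations (A.1) and (A.3) into Equation (8), and Equations (A.2) and (A.4) into Equation (9), we have

$$\begin{aligned} \hat{\beta}_{0,\text{MLE}} &\approx \hat{\beta}_{0,\text{CV}} - \log(1 - \bar{\pi}_C), \\ \hat{\beta}_{0,\text{MLE}} + \hat{\beta}_{1,\text{MLE}} &\approx \hat{\beta}_{0,\text{CV}} + \hat{\beta}_{1,\text{CV}} - \log(1 - \bar{\pi}_T), \end{aligned} \tag{A.5}$$

which directly gives Equation (10). In particular, subtracting the first relation from the second yields $\hat{\beta}_{1,\text{MLE}} \approx \hat{\beta}_{1,\text{CV}} + \log(1 - \bar{\pi}_C) - \log(1 - \bar{\pi}_T)$.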
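Relation (A.5) can be checked numerically in the simplest setting covered by Lemma 1. Consider a two-arm trial with ZIP outcomes and no covariates besides group membership. Here the conventional Poisson estimate of the log rate ratio is simply $\log(\bar{y}_T/\bar{y}_C)$. The Python sketch below assumes that the structural-zero probabilities $\pi_C$ and $\pi_T$ are known and constant within arms. The helper names (`zibc_correct`, `draw_zip`) and all parameter values are hypothetical. The sketch illustrates only the correction implied by (A.5), not the full estimating-equation procedure of the ZIBC method.

```python
import math
import random


def zibc_correct(beta1_cv, pi_c, pi_t):
    """Correction implied by (A.5): beta1_CV + log(1 - pi_C) - log(1 - pi_T)."""
    return beta1_cv + math.log(1.0 - pi_c) - math.log(1.0 - pi_t)


def draw_poisson(mu, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit, k, prod = math.exp(-mu), 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= limit:
            return k
        k += 1


def draw_zip(n, pi, mu, rng):
    """ZIP draws: a structural zero with probability pi, otherwise Poisson(mu)."""
    return [0 if rng.random() < pi else draw_poisson(mu, rng) for _ in range(n)]


# Hypothetical simulation settings (not taken from the paper)
beta0, beta1 = 0.5, -0.2   # true count-part coefficients of the ZIP model
pi_c, pi_t = 0.30, 0.40    # structural-zero probabilities by arm
n = 20000                  # participants per arm
rng = random.Random(2021)

y_c = draw_zip(n, pi_c, math.exp(beta0), rng)
y_t = draw_zip(n, pi_t, math.exp(beta0 + beta1), rng)

# With group membership as the only covariate, the Poisson MLE of the
# log rate ratio equals the log ratio of the two sample means.
beta1_cv = math.log((sum(y_t) / n) / (sum(y_c) / n))
beta1_zibc = zibc_correct(beta1_cv, pi_c, pi_t)
print(f"true {beta1:.3f}  conventional {beta1_cv:.3f}  corrected {beta1_zibc:.3f}")
```

With these settings, the conventional estimate should land near $-0.2 + \log(0.6) - \log(0.7) \approx -0.354$, and the corrected value should land near the true $-0.2$.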
Table 1: Information extracted from Frazão (2011)

Summary information | Data source | Girls | Boys
--- | --- | --- | ---
$\hat{\beta}_{1,\text{CV}}$ | Table 3 | 0.29 | −0.73
$\widehat{\text{SE}}_{1,\text{CV}}$ | Table 3 | 0.28 | 0.30
$\bar{y}_{\text{obs},C}$ | Figure 2 (with WebPlotDigitizer) | 0.83 | 1.04
$\bar{y}_{\text{obs},T}$ | Figure 2 (with WebPlotDigitizer) | 1.06 | 0.49
$n_{0,\text{obs},C}/n_C$ | Figure 2 (with WebPlotDigitizer) | 59% | 45%
$n_{0,\text{obs},T}/n_T$ | Figure 2 (with WebPlotDigitizer) | 47% | 67%

Table 2: Original and ZIBC method-corrected intervention effect estimates, incidence density ratios (IDRs) and P-values, for girls and boys, respectively

Group | Result | Estimate | IDR | P-value
--- | --- | --- | --- | ---
Girls | Original | 0.29 | 1.34 | 0.29
Girls | Corrected | 0.01 | 1.01 | 0.97
Boys | Original | −0.73 | 0.48 | 0.02
Boys | Corrected | −0.46 | 0.63 | 0.13

Figure 1: A typical forest plot for the true, ZIBC and conventional methods when $\beta_1 = -0.2$. [Panel "Treatment (true value: −0.2)"; x-axis: log rate ratio; rows: studies 1–10 and Overall, each estimated by the True, CV and ZIBC methods.]

Figure 2: Coverage rates and MSE values of the true (blue dashed line), ZIBC (red dotted line) and conventional (black solid line) methods from 1000 replications. [Panels: coverage rate and MSE plotted against zero rate for $\beta_1 = -0.5$, $-0.35$ and $-0.2$.]

[Forest plot "Project INTEGRATE": log rate ratios from the True, CV and ZIBC methods for studies 2, 7.1, 7.2, 9, 11, 14, 15, 16, 18 and 21 and Overall, with each study annotated with its $p_C$ and $p_T$ values.]
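The "Overall" rows in the forest plots come from pooling study-level estimates. Below is a minimal sketch of that step under three assumptions: fixed-effect inverse-variance pooling (the pooling model used for the figures is not restated here), corrections computed from the (A.5) relation, and standard errors borrowed from the conventional Poisson fits. The three studies and all their values are hypothetical.

```python
import math


def zibc_correct(beta1_cv, pi_c, pi_t):
    """Simplified correction from (A.5): beta1_CV + log(1 - pi_C) - log(1 - pi_T)."""
    return beta1_cv + math.log(1.0 - pi_c) - math.log(1.0 - pi_t)


def pool_fixed(estimates, ses, z=1.96):
    """Fixed-effect inverse-variance pooling -> (estimate, SE, (lower, upper))."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    est = sum(w * b for w, b in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    return est, se, (est - z * se, est + z * se)


# Hypothetical AD for three trials: conventional log rate ratios, their SEs,
# and arm-level structural-zero probabilities.
beta_cv = [-0.45, -0.30, -0.38]
se_cv = [0.08, 0.10, 0.12]
pi_c = [0.20, 0.35, 0.30]
pi_t = [0.30, 0.40, 0.45]

beta_zibc = [zibc_correct(b, pc, pt) for b, pc, pt in zip(beta_cv, pi_c, pi_t)]
cv_overall = pool_fixed(beta_cv, se_cv)
zibc_overall = pool_fixed(beta_zibc, se_cv)  # SEs borrowed from the conventional fits
print("CV  ", cv_overall)
print("ZIBC", zibc_overall)
```

The weights depend only on the borrowed standard errors. As a result, the corrected and conventional pooled intervals in this sketch have identical widths and differ only in location.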