Publication


Featured research published by Li Tang.


Epidemiology | 2011

Validation Data-Based Adjustments for Outcome Misclassification in Logistic Regression: An Illustration

Robert H. Lyles; Li Tang; Hillary M. Superak; Caroline C. King; David D. Celentano; Yungtai Lo; Jack D. Sobel

Misclassification of binary outcome variables is a known source of potentially serious bias when estimating adjusted odds ratios. Although researchers have described frequentist and Bayesian methods for dealing with the problem, these methods have seldom fully bridged the gap between statistical research and epidemiologic practice. In particular, there have been few real-world applications of readily grasped and computationally accessible methods that make direct use of internal validation data to adjust for differential outcome misclassification in logistic regression. In this paper, we illustrate likelihood-based methods for this purpose that can be implemented using standard statistical software. Using main study and internal validation data from the HIV Epidemiology Research Study, we demonstrate how misclassification rates can depend on the values of subject-specific covariates, and we illustrate the importance of accounting for this dependence. Simulation studies confirm the effectiveness of the maximum likelihood approach. We emphasize clear exposition of the likelihood function itself, to permit the reader to easily assimilate appended computer code that facilitates sensitivity analyses as well as the efficient handling of main/external and main/internal validation-study data. These methods are readily applicable under random cross-sectional sampling, and we discuss the extent to which the main/internal analysis remains appropriate under outcome-dependent (case-control) sampling.
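As a toy illustration of the likelihood-based idea (not the paper's appended code), the sketch below fits the non-differential special case in Python, assuming the sensitivity and specificity of the observed outcome are known constants; in the paper these misclassification rates are estimated from internal validation data and may depend on covariates.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

# Minimal sketch: logistic regression with a misclassified binary outcome.
# Non-differential special case with *known* sensitivity (se) and
# specificity (sp); all parameter values are invented for illustration.
rng = np.random.default_rng(0)
n, se, sp = 5000, 0.9, 0.85
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.5, 1.0])
y = rng.binomial(1, expit(X @ beta_true))            # true (latent) outcome
y_star = np.where(y == 1,
                  rng.binomial(1, se, n),            # observed when truly 1
                  rng.binomial(1, 1 - sp, n))        # observed when truly 0

def neg_loglik(beta):
    p = expit(X @ beta)                   # P(Y = 1 | x)
    p_star = se * p + (1 - sp) * (1 - p)  # P(Y* = 1 | x), adjusted
    return -np.sum(y_star * np.log(p_star)
                   + (1 - y_star) * np.log(1 - p_star))

beta_hat = minimize(neg_loglik, np.zeros(2)).x
```

A naive fit on `y_star` would attenuate the slope; maximizing the adjusted likelihood recovers `beta_true` up to sampling error.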


Statistics in Medicine | 2012

Likelihood-based methods for regression analysis with binary exposure status assessed by pooling.

Robert H. Lyles; Li Tang; Ji Lin; Zhiwei Zhang; Bhramar Mukherjee

The need for resource-intensive laboratory assays to assess exposures in many epidemiologic studies provides ample motivation to consider study designs that incorporate pooled samples. In this paper, we consider the case in which specimens are combined for the purpose of determining the presence or absence of a pool-wise exposure, in lieu of assessing the actual binary exposure status for each member of the pool. We presume a primary logistic regression model for an observed binary outcome, together with a secondary regression model for exposure. We facilitate maximum likelihood analysis by complete enumeration of the possible implications of a positive pool, and we discuss the applicability of this approach under both cross-sectional and case-control sampling. We also provide a maximum likelihood approach for longitudinal or repeated measures studies where the binary outcome and exposure are assessed on multiple occasions and within-subject pooling is conducted for exposure assessment. Simulation studies illustrate the performance of the proposed approaches along with their computational feasibility using widely available software. We apply the methods to investigate gene-disease association in a population-based case-control study of colorectal cancer.
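To make the enumeration step concrete, here is a small hypothetical sketch (the one-covariate logistic exposure model and all numbers are invented for illustration): a pool tests positive exactly when at least one member is exposed, so the pool-positivity probability can be computed either in closed form or by summing over all 2^k member-level exposure configurations with at least one exposed member, which is the complete enumeration the likelihood relies on.

```python
import itertools
import numpy as np
from scipy.special import expit

# Pool of k = 3 specimens; member-level exposure probabilities from a
# hypothetical logistic model p_i = expit(gamma * z_i).
gamma = 0.8
z = np.array([-1.0, 0.2, 1.5])
p = expit(gamma * z)                    # P(X_i = 1 | z_i)

# Closed form: P(pool +) = 1 - prod_i (1 - p_i)
p_pool_direct = 1 - np.prod(1 - p)

# Complete enumeration over the 2^k exposure configurations, keeping those
# consistent with a positive pool (at least one exposed member)
p_pool_enum = sum(
    np.prod([p[i] if x else 1 - p[i] for i, x in enumerate(config)])
    for config in itertools.product([0, 1], repeat=len(p))
    if any(config)
)
```

The two quantities agree; the enumeration form is the one that generalizes when each configuration contributes its own outcome-model term to the likelihood.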


Journal of Statistical Computation and Simulation | 2016

Comparison of different computational implementations on fitting generalized linear mixed-effects models for repeated count measures

Lu Huang; Li Tang; Bo Zhang; Zhiwei Zhang; Hui Zhang

In modelling repeated count outcomes, generalized linear mixed-effects models are commonly used to account for within-cluster correlations. However, inconsistent results are frequently generated by various statistical R packages and SAS procedures, especially in the case of moderate or strong within-cluster correlation or overdispersion. We investigated the underlying numerical approaches and statistical theories on which these packages and procedures are built. We then compared the performance of these statistical packages and procedures by simulating both Poisson-distributed and overdispersed count data. The SAS NLMIXED procedure outperformed the other procedures in all settings.


Statistical Methods in Medical Research | 2016

Causal inference with missing exposure information: Methods and applications to an obstetric study

Zhiwei Zhang; Wei Liu; Bo Zhang; Li Tang; Jun Zhang

Causal inference in observational studies is frequently challenged by the occurrence of missing data, in addition to confounding. Motivated by the Consortium on Safe Labor, a large observational study of obstetric labor practice and birth outcomes, this article focuses on the problem of missing exposure information in a causal analysis of observational data. This problem can be approached from different angles (i.e. missing covariates and causal inference), and useful methods can be obtained by drawing upon the available techniques and insights in both areas. In this article, we describe and compare a collection of methods based on different modeling assumptions, under standard assumptions for missing data (i.e. missing-at-random and positivity) and for causal inference with complete data (i.e. no unmeasured confounding and another positivity assumption). These methods involve three models: one for treatment assignment, one for the dependence of outcome on treatment and covariates, and one for the missing data mechanism. In general, consistent estimation of causal quantities requires correct specification of at least two of the three models, although there may be some flexibility as to which two models need to be correct. Such flexibility is afforded by doubly robust estimators adapted from the missing covariates literature and the literature on causal inference with complete data, and by a newly developed triply robust estimator that is consistent if any two of the three models are correct. The methods are applied to the Consortium on Safe Labor data and compared in a simulation study mimicking the Consortium on Safe Labor.
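As background for the complete-data building block the paper extends, here is a sketch of the doubly robust (AIPW) estimator of an average treatment effect. For clarity the nuisance functions (propensity score and outcome regressions) are taken as known rather than estimated; the model forms and numbers are illustrative, not from the Consortium on Safe Labor analysis.

```python
import numpy as np
from scipy.special import expit

# Doubly robust (AIPW) estimate of the average treatment effect, using the
# true nuisance functions for illustration.  In practice e(x), m0(x), m1(x)
# are fitted models, and the estimator remains consistent if either the
# propensity model or the outcome model is correctly specified.
rng = np.random.default_rng(1)
n = 100_000
x = rng.normal(size=n)
e = expit(0.5 * x)                      # propensity score P(T = 1 | x)
t = rng.binomial(1, e)
m0, m1 = x, x + 2.0                     # outcome means under control/treatment
y = np.where(t == 1, m1, m0) + rng.normal(size=n)

aipw = np.mean(m1 - m0
               + t * (y - m1) / e
               - (1 - t) * (y - m0) / (1 - e))   # true effect is 2.0
```

The augmentation terms have mean zero when the nuisance functions are correct, which is what buys the "two out of three models" robustness discussed in the abstract.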


Epidemiologic Methods | 2013

Extended Matrix and Inverse Matrix Methods Utilizing Internal Validation Data When Both Disease and Exposure Status Are Misclassified

Li Tang; Robert H. Lyles; Ye Ye; Yungtai Lo; Caroline C. King

The problem of misclassification is common in epidemiological and clinical research. In some cases, misclassification may be incurred when measuring both exposure and outcome variables. It is well known that validity of analytic results (e.g. point and confidence interval estimates for odds ratios of interest) can be forfeited when no correction effort is made. Therefore, valid and accessible methods with which to deal with these issues remain in high demand. Here, we elucidate extensions of well-studied methods in order to facilitate misclassification adjustment when a binary outcome and binary exposure variable are both subject to misclassification. By formulating generalizations of assumptions underlying well-studied “matrix” and “inverse matrix” methods into the framework of maximum likelihood, our approach allows the flexible modeling of a richer set of misclassification mechanisms when adequate internal validation data are available. The value of our extensions and a strong case for the internal validation design are demonstrated by means of simulations and analysis of bacterial vaginosis and trichomoniasis data from the HIV Epidemiology Research Study.
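The flavour of the classical correction can be seen in a tiny sketch for a single misclassified binary variable with known sensitivity and specificity (all numbers invented); the paper generalizes this kind of reclassification-matrix reasoning to joint outcome and exposure misclassification, with the rates estimated from internal validation data inside a maximum likelihood framework.

```python
import numpy as np

# Matrix-style correction in the simplest setting: one binary variable,
# known sensitivity (se) and specificity (sp).
se, sp = 0.9, 0.8
true_counts = np.array([300.0, 700.0])   # [positives, negatives]

# Reclassification matrix mapping true counts to expected observed counts
M = np.array([[se, 1 - sp],
              [1 - se, sp]])
observed = M @ true_counts               # expected observed counts

corrected = np.linalg.solve(M, observed) # inverts the matrix: recovers truth
```

With estimated (rather than known) rates and both variables misclassified, the matrix grows and the inversion is replaced by likelihood maximization, which is the extension the paper develops.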


Statistical Methods in Medical Research | 2017

A non-parametric model to address overdispersed count response in a longitudinal data setting with missingness

Hui Zhang; Hua He; Naiji Lu; Liang Zhu; Bo Zhang; Zhiwei Zhang; Li Tang

Count responses are becoming increasingly important in biostatistical analysis because of the development of new biomedical techniques such as next-generation sequencing and digital polymerase chain reaction. A common problem in modelling them with the popular Poisson model is overdispersion. Although overdispersion has been studied extensively for cross-sectional observations, addressing it for longitudinal data without parametric distributional assumptions remains challenging, especially with missing data. In this paper, we propose a method to detect overdispersion in repeated measures in a non-parametric manner by extending the Mann–Whitney–Wilcoxon rank sum test to longitudinal data. In addition, we incorporate the inverse probability weighted method to address data missingness. The proposed model is illustrated with both simulated and real study data.


Statistics in Medicine | 2015

Binary regression with differentially misclassified response and exposure variables

Li Tang; Robert H. Lyles; Caroline C. King; David D. Celentano; Yungtai Lo

Misclassification is a long-standing statistical problem in epidemiology. In many real studies, either an exposure or a response variable or both may be misclassified. As such, potential threats to the validity of the analytic results (e.g., estimates of odds ratios) that stem from misclassification are widely discussed in the literature. Much of the discussion has been restricted to the nondifferential case, in which misclassification rates for a particular variable are assumed not to depend on other variables. However, complex differential misclassification patterns are common in practice, as we illustrate here using bacterial vaginosis and trichomoniasis data from the HIV Epidemiology Research Study (HERS). Therefore, clear illustrations of valid and accessible methods that deal with complex misclassification are still in high demand. We formulate a maximum likelihood (ML) framework that allows flexible modeling of misclassification in both the response and a key binary exposure variable, while adjusting for other covariates via logistic regression. The approach emphasizes the use of internal validation data in order to evaluate the underlying misclassification mechanisms. Data-driven simulations show that the proposed ML analysis outperforms less flexible approaches that fail to appropriately account for complex misclassification patterns. The value and validity of the method are further demonstrated through a comprehensive analysis of the HERS example data.


Statistical Methods in Medical Research | 2018

Distribution-free models for latent mixed population responses in a longitudinal setting with missing data

Hui Zhang; Li Tang; Yuanyuan Kong; Tian Chen; Xueyan Liu; Zhiwei Zhang; Bo Zhang

Many biomedical and psychosocial studies involve population mixtures, which consist of multiple latent subpopulations. Because group membership cannot be observed, standard methods do not apply when differential treatment effects need to be studied across subgroups. We consider a two-group mixture in which membership of latent subgroups is determined by structural zeroes of a zero-inflated count variable and propose a new approach to model treatment differences between latent subgroups in a longitudinal setting. The approach also incorporates the inverse probability weighted method to address data missingness. As it builds on distribution-free functional response models, it requires no parametric distributional assumption and thereby provides robust inference. We illustrate the approach with both real and simulated data.


Health Services and Outcomes Research Methodology | 2018

Are marginalized two-part models superior to non-marginalized two-part models for count data with excess zeroes? Estimation of marginal effects, model misspecification, and model selection

Xueyan Liu; Bo Zhang; Li Tang; Zhiwei Zhang; Ning Zhang; J. Allison; Deo Kumar Srivastava; Hui Zhang

The marginalized two-part models, including the marginalized zero-inflated Poisson and negative binomial models, have been proposed in the literature for modelling cross-sectional healthcare utilization count data with excess zeroes and overdispersion. The motivation for these proposals was to directly capture the overall marginal effects and to avoid post-modelling effect calculations that are needed for the non-marginalized conventional two-part models. However, are marginalized two-part models superior to non-marginalized two-part models because of their structural property? Is it true that the marginalized two-part models can provide direct marginal inference? This article aims to answer these questions through a comprehensive investigation. We first summarize the existing non-marginalized and marginalized two-part models and then develop marginalized hurdle Poisson and negative binomial models for cross-sectional count data with abundant zero counts. Our interest in the investigation lies particularly in the (average) marginal effect and (average) incremental effect and the comparison of these effects. The estimators of these effects are presented, and variance estimators are derived by using delta methods and Taylor series approximations. Though the marginalized models attract attention because of the alleged convenience of direct marginal inference, we provide evidence for the impact of model misspecification of the marginalized models over the conventional models, and provide evidence for the importance of goodness-of-fit evaluation and model selection in differentiating between the marginalized and non-marginalized models. An empirical analysis of the German Socioeconomic Panel data is presented.


Journal of Statistical Computation and Simulation | 2017

Simulating comparisons of different computing algorithms fitting zero-inflated Poisson models for zero abundant counts

Xueyan Liu; Bryan Winter; Li Tang; Bo Zhang; Zhiwei Zhang; Hui Zhang

Zero-inflated Poisson models are frequently used to analyse count data with excess zeroes. However, results generated by different algorithms, implemented by various statistical packages or procedures in R and SAS, are often inconsistent, especially for small sample sizes or when the proportion of zero inflation is not large. In this study, we compared the underlying nonlinear optimization approaches and the statistical theories on which common packages and procedures are based. Then, multiple sets of simulated data of small, medium, and large sample sizes were fitted to test the performance of the algorithms in available R packages and SAS procedures, which were also compared using a real-data example. The zeroinfl function with methods CD type 1, CD type 2, and CD type 3 in the pscl package in R and the GENMOD procedure in SAS generally performed best in the simulation studies and produced consistent results for the real-data example.
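Under the hood, all of these implementations maximize the same zero-inflated Poisson likelihood with some nonlinear optimizer. A minimal hand-rolled Python version makes the objective explicit (simulated data, one covariate in the count part, intercept-only logit inflation part; this is a sketch, not code from the paper):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, gammaln

# Simulate zero-inflated Poisson counts: with probability pi the count is a
# structural zero, otherwise it is Poisson with log(mu) = b0 + b1 * x.
rng = np.random.default_rng(2)
n = 5000
x = rng.normal(size=n)
mu_true = np.exp(0.5 + 0.3 * x)
pi_true = 0.3
y = np.where(rng.random(n) < pi_true, 0, rng.poisson(mu_true))

def neg_loglik(theta):
    b0, b1, a = theta
    mu = np.exp(b0 + b1 * x)
    pi = expit(a)                                   # inflation probability
    log_pois = y * np.log(mu) - mu - gammaln(y + 1) # Poisson log-pmf
    ll = np.where(y == 0,
                  np.log(pi + (1 - pi) * np.exp(-mu)),  # zero: mixture
                  np.log(1 - pi) + log_pois)            # positive: count part
    return -ll.sum()

b0, b1, a = minimize(neg_loglik, np.zeros(3), method='BFGS').x
```

The algorithmic differences the paper studies (EM vs. coordinate descent vs. quasi-Newton, starting values, convergence criteria) are all choices about how this same objective gets optimized.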

Collaboration


Dive into Li Tang's collaboration.

Top Co-Authors

Zhiwei Zhang
Center for Devices and Radiological Health

Bo Zhang
Center for Devices and Radiological Health

Hui Zhang
St. Jude Children's Research Hospital

Caroline C. King
Centers for Disease Control and Prevention

Yungtai Lo
Albert Einstein College of Medicine

Xueyan Liu
St. Jude Children's Research Hospital

Bryan Winter
St. Jude Children's Research Hospital