Hua Yun Chen
University of Illinois at Chicago
Publication
Featured research published by Hua Yun Chen.
Journal of the American Statistical Association | 1999
Hua Yun Chen; Roderick J. A. Little
Nonparametric maximum likelihood (NPML) is used to estimate regression parameters in a proportional hazards regression model with missing covariates. The NPML estimator is shown to be consistent and asymptotically normally distributed under some conditions. EM type algorithms are applied to solve the maximization problem. Variance estimates of the regression parameters are obtained by a profile likelihood approach that uses EM-aided numerical differentiation. Simulation results indicate that the NPML estimates of the regression parameters are more efficient than the approximate partial likelihood estimates and estimates from complete-case analysis when missing covariates are missing completely at random, and that the proposed method corrects for bias when the missing covariates are missing at random.
Journal of the American Statistical Association | 2004
Hua Yun Chen
Robustness of covariate modeling for the missing-covariate problem in parametric regression is studied under the missing-at-random assumption. For a simple missing-covariate pattern, a nonparametric covariate model is proposed and is shown to yield a consistent and semiparametrically efficient estimator for the regression parameter. Total robustness is achieved in this situation. For more general missing-covariate patterns, a novel semiparametric modeling approach is proposed for the covariates. In this approach, the covariate distribution is first decomposed into the product of a series of conditional distributions according to the overall missing-data patterns, and the conditional distributions are then represented in the general odds ratio form. The general odds ratios are modeled parametrically, and the other components of the covariate distribution are modeled nonparametrically. Maximum semiparametric likelihood is used to find the parameter estimates. The proposed method yields a consistent estimator for the regression parameter when the odds ratios are modeled correctly. In general, the semiparametric covariate modeling strategy increases the robustness against covariate model misspecification when compared with the parametric modeling strategy proposed by Lipsitz and Ibrahim. The new covariate modeling approach can also be incorporated into the doubly robust procedure of Robins et al. to increase protection against misspecification of the missing-data mechanism. In addition, the proposed modeling strategy avoids the usually intractable integrations involved in the maximization of the incomplete-data likelihood with parametric covariate models. The proposed method can be applied to many regression models to handle incomplete covariates.
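For readers unfamiliar with the "general odds ratio form" mentioned in this abstract, the block below sketches one common way to write it. The notation (the density $f$, the reference point $(y_0, x_0)$, and the symbol $\eta$) is assumed here for illustration and is not taken from the abstract itself.

```latex
% Odds ratio function of a conditional density f(y | x),
% relative to an arbitrary fixed reference point (y_0, x_0):
\eta(y, x; y_0, x_0)
  = \frac{f(y \mid x)\, f(y_0 \mid x_0)}{f(y \mid x_0)\, f(y_0 \mid x)} .

% The conditional density can then be recovered from the odds ratio
% function and the baseline density f(y | x_0):
f(y \mid x)
  = \frac{\eta(y, x; y_0, x_0)\, f(y \mid x_0)}
         {\int \eta(u, x; y_0, x_0)\, f(u \mid x_0)\, du} .
```

Under this representation, a parametric form can be placed on $\eta$ while the baseline density $f(y \mid x_0)$ is left unspecified, which matches the abstract's description of modeling the odds ratios parametrically and the remaining components nonparametrically.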
Human Genetics | 2014
Ken Batai; Adam B. Murphy; Ebony Shah; Maria Ruden; Jennifer Newsome; Sara Agate; Michael A. Dixon; Hua Yun Chen; Leslie A. Deane; Courtney M.P. Hollowell; Chiledum Ahaghotu; Rick A. Kittles
Vitamin D deficiency is more common among African Americans (AAs) than among European Americans (EAs), and epidemiologic evidence links vitamin D status to many health outcomes. Two genome-wide association studies (GWAS) in European populations identified vitamin D pathway gene single-nucleotide polymorphisms (SNPs) associated with serum vitamin D [25(OH)D] levels, but a few of these SNPs have been replicated in AAs. Here, we investigated the associations of 39 SNPs in vitamin D pathway genes, including 19 GWAS-identified SNPs, with serum 25(OH)D concentrations in 652 AAs and 405 EAs. Linear and logistic regression analyses were performed adjusting for relevant environmental and biological factors. The pattern of SNP associations was distinct between AAs and EAs. In AAs, six GWAS-identified SNPs in GC, CYP2R1, and DHCR7/NADSYN1 were replicated, while nine GWAS SNPs in GC and CYP2R1 were replicated in EAs. A CYP2R1 SNP, rs12794714, exhibited the strongest signal of association in AAs. In EAs, however, a different CYP2R1 SNP, rs1993116, was the most strongly associated. Our models, which take into account genetic and environmental variables, accounted for 20 and 28% of the variance in serum vitamin D levels in AAs and EAs, respectively.
Journal of The Royal Statistical Society Series B-statistical Methodology | 2003
Hua Yun Chen
Two likelihood representations corresponding to the prospective and retrospective analyses of the case-control design are derived for general outcome-dependent samples with arbitrary discrete or continuous outcomes and possibly non-multiplicative models. Parameter identification in the general outcome-dependent design is reduced to the simple problem of parameter identification in the general odds ratio function. Both likelihoods are shown to generate the same profile likelihood for the common parameter of interest. Maximum likelihood estimators based on either likelihood are semiparametric efficient for the identifiable parameters. Copyright 2003 Royal Statistical Society.
Statistics in Medicine | 2013
Hua Yun Chen; Rick A. Kittles; Wei Zhang
In genetic association studies with densely typed genetic markers, it is often of substantial interest to examine not only the primary phenotype but also the secondary traits for their association with the genetic markers. For more efficient sample ascertainment of the primary phenotype, a case-control design or its variants, such as the extreme-value sampling design for a quantitative trait, are often adopted. The secondary trait analysis without correcting for the sample ascertainment may yield a biased association estimator. We propose a new method aiming at correcting the potential bias due to the inadequate adjustment of the sample ascertainment. The method yields explicit correction formulas that can be used to both screen the genetic markers and rapidly evaluate the sensitivity of the results to the assumed baseline case-prevalence rate in the population. Simulation studies demonstrate good performance of the proposed approach in comparison with the more computationally intensive approaches, such as the compensator approaches and the maximum prospective likelihood approach. We illustrate the application of the approach by analysis of the genetic association of prostate specific antigen in a case-control study of prostate cancer in the African American population.
Biometrics | 2011
Hua Yun Chen; Hui Xie; Yi Qian
Multiple imputation is a practically useful approach to handling incompletely observed data in statistical analysis. Parameter estimation and inference based on imputed full data have been made easy by Rubin's rule for result combination. However, creating proper imputation that accommodates flexible models for statistical analysis in practice can be very challenging. We propose an imputation framework that uses conditional semiparametric odds ratio models to impute the missing values. The proposed imputation framework is more flexible and robust than the imputation approach based on the normal model. It is a compatible framework in comparison to the approach based on fully conditionally specified models. The proposed algorithms for multiple imputation through the Markov chain Monte Carlo sampling approach can be straightforwardly carried out. Simulation studies demonstrate that the proposed approach performs better than existing, commonly used imputation approaches. The proposed approach is applied to imputing missing values in bone fracture data.
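As a reminder of how Rubin's rule combines results across imputed data sets, here is a minimal sketch for a scalar parameter. The function name `rubin_combine` and its inputs are hypothetical, chosen for illustration; the sketch covers only the standard combining step, not the imputation framework proposed in the paper.

```python
from statistics import mean, variance


def rubin_combine(estimates, variances):
    """Combine m completed-data results for a scalar parameter.

    estimates: point estimates, one per imputed data set
    variances: corresponding squared standard errors
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching lengths")

    pooled = mean(estimates)        # average of the m point estimates
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (m - 1 denominator)

    # Total variance: within + (1 + 1/m) * between
    total = within + (1 + 1 / m) * between
    return pooled, total
```

For example, `rubin_combine([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])` pools three hypothetical estimates into a single estimate with a variance that reflects both sampling and imputation uncertainty.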
Lifetime Data Analysis | 2001
Hua Yun Chen; Roderick J. A. Little
We propose a profile conditional likelihood approach to handle missing covariates in the general semiparametric transformation regression model. The method estimates the marginal survival function by the Kaplan-Meier estimator, and then estimates the parameters of the survival model and the covariate distribution from a conditional likelihood, substituting the Kaplan-Meier estimator for the marginal survival function in the conditional likelihood. This method is simpler than full maximum likelihood approaches, and yields a consistent and asymptotically normally distributed estimator of the regression parameter when censoring is independent of the covariates. The estimator demonstrates very high relative efficiency in simulations. When compared with complete-case analysis, the proposed estimator can be more efficient when the missing data are missing completely at random and can correct bias when the missing data are missing at random. The potential application of the proposed method to the generalized probit model with missing continuous covariates is also outlined.
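The first step described above, estimating the marginal survival function by Kaplan-Meier, can be sketched as follows. This is a minimal illustration of the standard product-limit estimator only; the function name `kaplan_meier` and the input layout are assumptions, and the conditional likelihood step of the paper is not shown.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times:  observed follow-up times
    events: 1 if the event was observed at that time, 0 if censored
    Returns a list of (event time, estimated survival probability).
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather all subjects whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Events and censored subjects both leave the risk set.
        at_risk -= leaving
    return curve
```

With hypothetical data `times=[1, 2, 3, 4]` and `events=[1, 0, 1, 1]`, the subject censored at time 2 shrinks the risk set without producing a step in the curve.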
Statistics in Medicine | 2009
Hua Yun Chen; Shasha Gao
We study the problem of estimation and inference on the average treatment effect in a smoking cessation trial where an outcome and some auxiliary information were measured longitudinally, and both were subject to missing values. Dynamic generalized linear mixed effects models linking the outcome, the auxiliary information, and the covariates are proposed. The maximum likelihood approach is applied to the estimation and inference on the model parameters. The average treatment effect is estimated by the G-computation approach, and the sensitivity of the treatment effect estimate to the nonignorable missing data mechanisms is investigated through the local sensitivity analysis approach. The proposed approach can handle missing data that form arbitrary missing patterns over time. We applied the proposed method to the analysis of the smoking cessation trial.
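The G-computation step mentioned above can be illustrated in its simplest, single-time-point form: predict each subject's outcome under both treatment assignments from a fitted outcome model and average the difference. The names `g_computation_ate` and `predict` are hypothetical, and the sketch assumes an already fitted model; the longitudinal mixed-effects models and sensitivity analysis of the paper are not represented.

```python
def g_computation_ate(covariates, predict):
    """Average treatment effect by standardization (G-computation).

    covariates: one covariate value (or tuple) per subject
    predict:    fitted outcome model, called as predict(treatment, x)
                and returning the expected outcome
    """
    n = len(covariates)
    if n == 0:
        raise ValueError("need at least one subject")
    # Mean predicted outcome if everyone were treated.
    mean_treated = sum(predict(1, x) for x in covariates) / n
    # Mean predicted outcome if no one were treated.
    mean_control = sum(predict(0, x) for x in covariates) / n
    return mean_treated - mean_control


# Hypothetical fitted model used only for illustration:
# expected outcome = 2 * treatment + covariate.
def toy_model(treatment, x):
    return 2.0 * treatment + x
```

Calling `g_computation_ate([0.0, 1.0, 2.0], toy_model)` averages the model's predictions over the observed covariate values under each treatment arm.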
Statistics in Medicine | 2013
Hua Yun Chen; Muredach P. Reilly; Mingyao Li
We propose a semiparametric odds ratio model that extends Umbach and Weinberg's approach to exploiting a gene-environment association model for efficiency gains in case-control designs to both discrete and continuous data. We directly model the gene-environment association in the control population to avoid estimating the intercept in the disease risk model, which is inherently difficult because of the scarcity of information on the parameter with the sampling designs. We propose a novel permutation-based approach to eliminate the high-dimensional nuisance parameters in the matched case-control design. The proposed approach reduces to the conditional logistic regression when the model for the gene-environment association is unrestricted. Simulation studies demonstrate good performance of the proposed approach. We apply the proposed approach to a study of gene-environment interaction on coronary artery disease.
Journal of the American Statistical Association | 2015
Hua Yun Chen; Daniel E. Rader; Mingyao Li
A flexible semiparametric odds ratio model has been proposed to unify and to extend both the log-linear model and the joint normal model for data with a mix of discrete and continuous variables. The semiparametric odds ratio model is particularly useful for analyzing biased sampling designs. However, statistical inference of the model has not been systematically studied when more than one nonparametric component is involved in the model. In this article, we study the maximum semiparametric likelihood approach to estimation and inference of the semiparametric odds ratio model. We show that the maximum semiparametric likelihood estimator of the odds ratio parameter is consistent and asymptotically normally distributed. We also establish statistical inference under a misspecified semiparametric odds ratio model, which is important when handling weak identifiability in conditionally specified models under biased sampling designs. We use simulation studies to demonstrate that the proposed approaches have satisfactory finite sample performance. Finally, we illustrate the proposed approach by analyzing multiple traits in a genome-wide association study of high-density lipid protein. Supplementary materials for this article are available online.