Donglin Zeng
University of North Carolina at Chapel Hill
Publications
Featured research published by Donglin Zeng.
Journal of the American Statistical Association | 2012
Yingqi Zhao; Donglin Zeng; A. John Rush; Michael R. Kosorok
There is increasing interest in discovering individualized treatment rules (ITRs) for patients who have heterogeneous responses to treatment. In particular, one aims to find an optimal ITR that is a deterministic function of patient-specific characteristics maximizing expected clinical outcome. In this article, we first show that estimating such an optimal treatment rule is equivalent to a classification problem where each subject is weighted proportional to his or her clinical outcome. We then propose an outcome weighted learning approach based on the support vector machine framework. We show that the resulting estimator of the treatment rule is consistent. We further obtain a finite sample bound for the difference between the expected outcome using the estimated ITR and that of the optimal treatment rule. The performance of the proposed approach is demonstrated via simulation studies and an analysis of chronic depression data.
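The classification equivalence stated in this abstract can be illustrated with a small sketch. It assumes a randomized trial with two treatments coded -1 and +1, a known assignment probability of 0.5, and nonnegative outcomes; the toy data and the names `ipw_value`, `weighted_misclassification`, `sign_rule`, and `treat_all` are hypothetical and not taken from the article. The sketch only checks the identity behind the equivalence (value of a rule plus its outcome-weighted misclassification error is the same constant for every rule); it does not implement the support vector machine step.

```python
def ipw_value(rule, data, propensity=0.5):
    """Inverse-probability-weighted estimate of the mean outcome under `rule`."""
    return sum(r / propensity for x, a, r in data if rule(x) == a) / len(data)


def weighted_misclassification(rule, data, propensity=0.5):
    """Outcome-weighted error: weight r/propensity on subjects with a != rule(x)."""
    return sum(r / propensity for x, a, r in data if rule(x) != a) / len(data)


# Hypothetical toy trial: (covariate x, assigned treatment a, outcome r).
data = [
    (-2.0, 1, 0.5), (-1.0, -1, 2.0), (-0.5, 1, 1.0), (0.5, -1, 0.5),
    (1.0, 1, 2.5), (2.0, -1, 1.0), (1.5, 1, 3.0), (-1.5, -1, 2.5),
]


def sign_rule(x):
    """Treat with +1 when x is positive, otherwise -1."""
    return 1 if x > 0 else -1


def treat_all(x):
    """Give everyone treatment +1."""
    return 1


# The sum value + weighted error does not depend on the rule, so maximizing
# the value is the same as minimizing the outcome-weighted error.
total_weight = sum(r / 0.5 for _, _, r in data) / len(data)
for rule in (sign_rule, treat_all):
    value = ipw_value(rule, data)
    error = weighted_misclassification(rule, data)
    print(rule.__name__, value, error, value + error == total_weight)
```

Because the constant does not depend on the rule, ranking rules by estimated value and ranking them by outcome-weighted error give the same ordering on this toy data set.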
European Journal of Epidemiology | 2007
Ricardo A. Pollitt; Jay S. Kaufman; Kathryn M. Rose; Ana V. Diez-Roux; Donglin Zeng; Gerardo Heiss
Background: Associations between childhood and adult socioeconomic status (SES) and adult levels of inflammatory markers (C-reactive protein [CRP], fibrinogen, white blood cell count [WBC], and von Willebrand factor [vWF]) were examined in the Atherosclerosis Risk in Communities (ARIC) Study cohort. Methods: A total of 12,681 white and African-American participants provided information on SES (via education and social class) and place of residence in childhood and adulthood. Residences were linked to census data for neighborhood SES information. Multiple imputation was used to impute missing data. Hierarchical and linear regression were used to estimate the effects of SES and possible mediation by adult cardiovascular disease (CVD) risk factors. Findings: Low childhood social class and education were associated with elevated levels of CRP, fibrinogen, WBC, and vWF (increments of 17%, 2%, 4% and 3% for lowest versus highest education in childhood, respectively) among whites. Findings were less consistent among African-Americans. Adult SES was more strongly associated with inflammation than childhood SES. Individual-level SES measures were more consistently associated with inflammation than neighborhood-level measures. Fibrinogen and WBC showed the most consistent associations with SES; the largest changes in inflammation by SES were observed for CRP. Covariate adjustment strongly attenuated these associations. Mediation of the SES-inflammation associations by BMI, smoking and HDL cholesterol (HDL-C) is suggested by these data. Conclusion: Low individual- and neighborhood-level SES in childhood and adulthood are associated with modest increments in adult inflammatory burden. These associations may operate through the influence of low SES on traditional CVD risk factors, especially BMI, smoking and HDL-C.
Journal of the American Statistical Association | 2006
D. Y. Lin; Donglin Zeng
A haplotype is a specific sequence of nucleotides on a single chromosome. The population associations between haplotypes and disease phenotypes provide critical information about the genetic basis of complex human diseases. Standard genotyping techniques cannot distinguish the two homologous chromosomes of an individual, so only the unphased genotype (i.e., the combination of the two homologous haplotypes) is directly observable. Statistical inference about haplotype–phenotype associations based on unphased genotype data presents an intriguing missing-data problem, especially when the sampling depends on the disease status. The objective of this article is to provide a systematic and rigorous treatment of this problem. All commonly used study designs, including cross-sectional, case-control, and cohort studies, are considered. The phenotype can be a disease indicator, a quantitative trait, or a potentially censored time-to-disease variable. The effects of haplotypes on the phenotype are formulated through flexible regression models, which can accommodate various genetic mechanisms and gene–environment interactions. Appropriate likelihoods are constructed that may involve high-dimensional parameters. The identifiability of the parameters and the consistency, asymptotic normality, and efficiency of the maximum likelihood estimators are established. Efficient and reliable numerical algorithms are developed. Simulation studies show that the likelihood-based procedures perform well in practical settings. An application to the Finland–United States Investigation of NIDDM Genetics Study is provided. Areas in need of further development are discussed.
Genetic Epidemiology | 2009
D. Y. Lin; Donglin Zeng
To identify genetic variants with modest effects on complex human diseases, a growing number of networks or consortia are created for sharing data from multiple genome‐wide association studies on the same disease or related disorders. A central question in this enterprise is whether to obtain summary results or individual participant data from relevant studies. We show theoretically and numerically that meta‐analysis of summary results is statistically as efficient as joint analysis of individual participant data (provided that both analyses are performed properly under the same modeling assumptions). We illustrate this equivalence with case‐control data from the Finland‐United States Investigation of NIDDM Genetics (FUSION) study. Collating only summary results will increase the number and representativeness of available studies, simplify data collection and analysis, reduce resource utilization, and accelerate discovery. Genet. Epidemiol. 34:60–66, 2010.
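The efficiency claim in this abstract can be checked by hand in a deliberately simple special case. The sketch below assumes a common mean with known unit variance across studies, which is far narrower than the article's setting; the function name `fixed_effect_meta` and the three toy studies are hypothetical. It combines per-study summaries with standard inverse-variance (fixed-effect) weights and compares the result with the estimate from the pooled individual-level data.

```python
import math


def fixed_effect_meta(estimates, std_errors):
    """Inverse-variance weighted (fixed-effect) combination of study estimates."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


# Hypothetical individual-level data from three studies (known sigma = 1).
studies = [[1.0, 2.0, 3.0], [2.0, 4.0], [0.0, 1.0, 2.0, 5.0]]
sigma = 1.0

# Summary results each study would share: its mean and standard error.
summary_est = [sum(s) / len(s) for s in studies]
summary_se = [sigma / math.sqrt(len(s)) for s in studies]
meta_est, meta_se = fixed_effect_meta(summary_est, summary_se)

# Joint analysis of the pooled individual participant data.
pooled = [y for s in studies for y in s]
joint_est = sum(pooled) / len(pooled)
joint_se = sigma / math.sqrt(len(pooled))

print(meta_est, joint_est)
print(meta_se, joint_se)
```

In this special case the two analyses agree exactly (up to floating-point rounding), because each study's inverse-variance weight is proportional to its sample size.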
Journal of the American Statistical Association | 2008
Brent A. Johnson; D. Y. Lin; Donglin Zeng
We propose a general strategy for variable selection in semiparametric regression models by penalizing appropriate estimating functions. Important applications include semiparametric linear regression with censored responses and semiparametric regression with missing predictors. Unlike the existing penalized maximum likelihood estimators, the proposed penalized estimating functions may not pertain to the derivatives of any objective functions and may be discrete in the regression coefficients. We establish a general asymptotic theory for penalized estimating functions and present suitable numerical algorithms to implement the proposed estimators. In addition, we develop a resampling technique to estimate the variances of the estimated regression coefficients when the asymptotic variances cannot be evaluated directly. Simulation studies demonstrate that the proposed methods perform well in variable selection and variance estimation. We illustrate our methods using data from the Paul Coverdell Stroke Registry.
Genetic Epidemiology | 2009
D. Y. Lin; Donglin Zeng
Case‐control association studies often collect extensive information on secondary phenotypes, which are quantitative or qualitative traits other than the case‐control status. Exploring secondary phenotypes can yield valuable insights into biological pathways and identify genetic variants influencing phenotypes of direct interest. All publications on secondary phenotypes have used standard statistical methods, such as least‐squares regression for quantitative traits. Because of unequal selection probabilities between cases and controls, the case‐control sample is not a random sample from the general population. As a result, standard statistical analysis of secondary phenotype data can be extremely misleading. Although one may avoid the sampling bias by analyzing cases and controls separately or by including the case‐control status as a covariate in the model, the associations between a secondary phenotype and a genetic variant in the case and control groups can be quite different from the association in the general population. In this article, we present novel statistical methods that properly reflect the case‐control sampling in the analysis of secondary phenotype data. The new methods provide unbiased estimation of genetic effects and accurate control of false‐positive rates while maximizing statistical power. We demonstrate the pitfalls of the standard methods and the advantages of the new methods both analytically and numerically. The relevant software is available at our website. Genet. Epidemiol. 2009.
American Journal of Neuroradiology | 2009
Elizabeth Bullitt; F. N. Rahman; J. K. Smith; E. Kim; Donglin Zeng; Laurence M. Katz; Bonita L. Marks
BACKGROUND AND PURPOSE: Prior studies suggest that aerobic exercise may reduce both the brain atrophy and the decline in fractional anisotropy observed with advancing age. It is reasonable to hypothesize that exercise-induced changes to the vasculature may underlie these anatomic differences. The purpose of this blinded study was to compare high-activity and low-activity healthy elderly volunteers for differences in the cerebrovasculature as calculated from vessels extracted from noninvasive MR angiograms (MRAs). MATERIALS AND METHODS: Fourteen healthy elderly subjects underwent MRA. Seven subjects reported a high level of aerobic activity (64 ± 5 years of age; 5 men, 2 women) and 7, a low activity level (68 ± 6 years of age; 5 women, 2 men). Following vessel segmentation from MRA by an individual blinded to subject activity level, quantitative measures of vessel number, radius, and tortuosity were calculated and histogram analysis of vessel number and radius was performed. RESULTS: Aerobically active subjects exhibited statistically significant reductions in vessel tortuosity and an increased number of small vessels compared with less active subjects. CONCLUSIONS: Aerobic activity in elderly subjects is associated with lower vessel tortuosity values and an increase in the number of small-caliber vessels. It is possible that an aerobic exercise program may contribute to healthy brain aging. MRA offers a noninvasive approach to visualizing the cerebral vasculature and may prove useful in future longitudinal investigations.
Journal of the American Statistical Association | 2006
Donglin Zeng; Guosheng Yin; Joseph G. Ibrahim
We propose a class of transformation models for survival data with a cure fraction. The class of transformation models is motivated by biological considerations and includes both the proportional hazards and the proportional odds cure models as two special cases. An efficient recursive algorithm is proposed to calculate the maximum likelihood estimators (MLEs). Furthermore, the MLEs for the regression coefficients are shown to be consistent and asymptotically normal, and their asymptotic variances attain the semiparametric efficiency bound. Simulation studies are conducted to examine the finite-sample properties of the proposed estimators. The method is illustrated on data from a clinical trial involving the treatment of melanoma.
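To make the idea of a single class containing both cure models concrete, here is a minimal sketch of one Box-Cox-type family with that property. The parametrization, the function name `population_survival`, and the symbols `theta`, `F_t`, and `gamma` are assumptions chosen for illustration and may differ from the class defined in the article. Here `theta` plays the role of a covariate effect such as exp(beta'Z), `F_t` is a baseline distribution function evaluated at time t, `gamma = 0` gives a proportional hazards cure model, and `gamma = 1` gives a proportional odds cure model.

```python
import math


def population_survival(theta, F_t, gamma):
    """Population survival (1 + gamma*theta*F_t)^(-1/gamma); gamma = 0 is the limit exp(-theta*F_t)."""
    if gamma == 0.0:
        return math.exp(-theta * F_t)
    return (1.0 + gamma * theta * F_t) ** (-1.0 / gamma)


# Cure fraction: survival as t -> infinity, where F_t reaches 1.
print(population_survival(2.0, 1.0, 0.0))  # proportional hazards cure model
print(population_survival(2.0, 1.0, 1.0))  # proportional odds cure model

# Under gamma = 1 the odds of failure, (1 - S) / S, equal theta * F_t,
# so doubling theta doubles the odds at every time point.
for F_t in (0.25, 0.5, 1.0):
    s1 = population_survival(1.0, F_t, 1.0)
    s2 = population_survival(2.0, F_t, 1.0)
    print(F_t, ((1 - s2) / s2) / ((1 - s1) / s1))
```

The survival function levels off at a positive value as `F_t` approaches 1, which is what produces a cure fraction in this family.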
Journal of the American Statistical Association | 2007
Donglin Zeng; D. Y. Lin
The accelerated failure time model provides a natural formulation of the effects of covariates on a potentially censored response variable. The existing semiparametric estimators are computationally intractable and statistically inefficient. In this article we propose an approximate nonparametric maximum likelihood method for the accelerated failure time model with possibly time-dependent covariates. We estimate the regression parameters by maximizing a kernel-smoothed profile likelihood function. The maximization can be achieved through conventional gradient-based search algorithms. The resulting estimators are consistent and asymptotically normal. The limiting covariance matrix attains the semiparametric efficiency bound and can be consistently estimated. We also provide a consistent estimator for the error distribution. Extensive simulation studies demonstrate that the asymptotic approximations are accurate in practical situations and the new estimators are considerably more efficient than the existing ones. Illustrations with clinical and epidemiologic studies are provided.
Journal of the American Statistical Association | 2015
Yingqi Zhao; Donglin Zeng; Eric B. Laber; Michael R. Kosorok
Dynamic treatment regimes (DTRs) are sequential decision rules for individual patients that can adapt over time to an evolving illness. The goal is to accommodate heterogeneity among patients and find the DTR which will produce the best long-term outcome if implemented. We introduce two new statistical learning methods for estimating the optimal DTR, termed backward outcome weighted learning (BOWL) and simultaneous outcome weighted learning (SOWL). These approaches convert individualized treatment selection into either a sequential or a simultaneous classification problem, and can thus be applied by modifying existing machine learning techniques. The proposed methods are based on directly maximizing over all DTRs a nonparametric estimator of the expected long-term outcome; this is fundamentally different from regression-based methods, for example, Q-learning, which indirectly attempt such maximization and rely heavily on the correctness of postulated regression models. We prove that the resulting rules are consistent, and provide finite sample bounds for the errors using the estimated rules. Simulation results suggest the proposed methods produce superior DTRs compared with Q-learning, especially in small samples. We illustrate the methods using data from a clinical trial for smoking cessation. Supplementary materials for this article are available online.
