Changshuai Wei
Michigan State University
Publication
Featured research published by Changshuai Wei.
PLOS ONE | 2014
Olga A. Vsevolozhskaya; Dmitri V. Zaykin; Mark C. Greenwood; Changshuai Wei; Qing Lu
While progress has been made in identifying common genetic variants associated with human diseases, for most common complex diseases the identified genetic variants account for only a small proportion of heritability. Challenges remain in finding additional unknown genetic variants predisposing to complex diseases. With the advance in next-generation sequencing technologies, sequencing studies have become commonplace in genetic research. The ongoing exome-sequencing and whole-genome-sequencing studies generate a massive amount of sequencing variants and allow researchers to comprehensively investigate their role in human diseases. The discovery of new disease-associated variants can be enhanced by utilizing powerful and computationally efficient statistical methods. In this paper, we propose a functional analysis of variance (FANOVA) method for testing an association of sequence variants in a genomic region with a qualitative trait. The FANOVA has a number of advantages: (1) it tests for a joint effect of gene variants, including both common and rare; (2) it fully utilizes linkage disequilibrium and genetic position information; and (3) it allows for either protective or risk-increasing causal variants. Through simulations, we show that FANOVA outperforms two popularly used methods, SKAT and a previously proposed method based on functional linear models (FLM), especially if the sample size of a study is small and/or sequence variants have low to moderate effects. We conduct an empirical study by applying the three methods (FANOVA, SKAT and FLM) to sequencing data from the Dallas Heart Study. While SKAT and FLM respectively detected ANGPTL4 and ANGPTL3 as associated with obesity, FANOVA was able to identify both genes as associated with obesity.
Genetic Epidemiology | 2012
Qing Lu; Changshuai Wei; Chengyin Ye; Ming Li; Robert C. Elston
The potential importance of the joint action of genes, whether modeled with or without a statistical interaction term, has long been recognized. However, identifying such action has been a great challenge, especially when millions of genetic markers are involved. We propose a likelihood ratio‐based Mann‐Whitney test to search for joint gene action either among candidate genes or genome‐wide. It extends the traditional univariate Mann‐Whitney test to assess the joint association of genotypes at multiple loci with disease, allowing for high‐order statistical interactions. Because only one overall significance test is conducted for the entire analysis, it avoids the issue of multiple testing. Moreover, the approach adopts a computationally efficient algorithm, making a genome‐wide search feasible in a reasonable amount of time on a high performance personal computer. We evaluated the approach using both theoretical and real data. By applying the approach to 40 type 2 diabetes (T2D) susceptibility single‐nucleotide polymorphisms (SNPs), we identified a four‐locus model strongly associated with T2D in the Wellcome Trust (WT) study (permutation P‐value < 0.001), and replicated the same finding in the Nurses’ Health Study/Health Professionals Follow‐Up Study (NHS/HPFS) (P‐value = 3.03 × 10^-11). We also conducted a genome‐wide search on 385,598 SNPs in the WT study. The analysis took approximately 55 hr on a personal computer, identifying the same first two loci, but overall a different set of four SNPs, jointly associated with T2D (P‐value = 1.29 × 10^-5). The nominal significance of this same association reached 4.01 × 10^-6 in the NHS/HPFS. Genet. Epidemiol. 00:1‐11, 2012.
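As a rough illustration of the statistic this abstract builds on, the sketch below computes the classical two-sample Mann-Whitney U for a per-subject score in cases versus controls. It is not the likelihood ratio-based procedure from the paper: the function name `mann_whitney_u` and the toy scores are assumptions made for illustration, and in the paper's setting the score would presumably be derived from multi-locus genotypes.

```python
from itertools import product


def mann_whitney_u(case_scores, control_scores):
    """Classical Mann-Whitney U for cases vs. controls.

    Counts case/control pairs in which the case has the higher score;
    ties count as one half. Returns (U, U / (n_cases * n_controls)).
    The normalized value equals the area under the ROC curve.
    """
    u = 0.0
    for case, control in product(case_scores, control_scores):
        if case > control:
            u += 1.0
        elif case == control:
            u += 0.5
    return u, u / (len(case_scores) * len(control_scores))


# Hypothetical scores, for illustration only.
cases = [0.9, 0.7, 0.6, 0.4]
controls = [0.5, 0.4, 0.3, 0.1]
u, auc = mann_whitney_u(cases, controls)
print(u, auc)
```

The pairwise loop is quadratic in sample size, which is fine for a toy example; a rank-based formulation would be the usual choice for large samples.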
Genetic Epidemiology | 2013
Changshuai Wei; Daniel J. Schaid; Qing Lu
Common complex diseases are likely influenced by the interplay of hundreds, or even thousands, of genetic variants. Converging evidence shows that genetic variants with low marginal effects (LMEs) play an important role in disease development. Despite their potential significance, discovering LME genetic variants and assessing their joint association on high‐dimensional data (e.g., genome‐wide data) remain a great challenge. To facilitate joint association analysis among a large ensemble of LME genetic variants, we proposed a computationally efficient and powerful approach, which we call Trees Assembling Mann‐Whitney (TAMW). Through simulation studies and an empirical data application, we found that TAMW outperformed multifactor dimensionality reduction (MDR) and the likelihood ratio‐based Mann‐Whitney approach (LRMW) when the underlying complex disease involves multiple LME loci and their interactions. For instance, in a simulation with 20 interacting LME loci, TAMW attained a higher power (power = 0.931) than both MDR (power = 0.599) and LRMW (power = 0.704). In an empirical study of 29 known Crohn's disease (CD) loci, TAMW also identified a stronger joint association with CD than those detected by MDR and LRMW. Finally, we applied TAMW to Wellcome Trust CD GWAS to conduct a genome‐wide analysis. The analysis of 459K single nucleotide polymorphisms was completed in 40 hrs using parallel computing, and revealed a joint association predisposing to CD (P‐value = 2.763 × 10^-19). Further analysis of the newly discovered association suggested that 13 genes, such as ATG16L1 and LACC1, may play an important role in CD pathophysiological and etiological processes.
Genetic Epidemiology | 2014
Changshuai Wei; Ming Li; Zihuai He; Olga A. Vsevolozhskaya; Daniel J. Schaid; Qing Lu
With advancements in next‐generation sequencing technology, a massive amount of sequencing data is generated, which offers a great opportunity to comprehensively investigate the role of rare variants in the genetic etiology of complex diseases. Nevertheless, the high‐dimensional sequencing data poses a great challenge for statistical analysis. The association analyses based on traditional statistical methods suffer substantial power loss because of the low frequency of genetic variants and the extremely high dimensionality of the data. We developed a Weighted U Sequencing test, referred to as WU‐SEQ, for the high‐dimensional association analysis of sequencing data. Based on a nonparametric U‐statistic, WU‐SEQ makes no assumption of the underlying disease model and phenotype distribution, and can be applied to a variety of phenotypes. Through simulation studies and an empirical study, we showed that WU‐SEQ outperformed a commonly used sequence kernel association test (SKAT) method when the underlying assumptions were violated (e.g., the phenotype followed a heavy‐tailed distribution). Even when the assumptions were satisfied, WU‐SEQ still attained comparable performance to SKAT. Finally, we applied WU‐SEQ to sequencing data from the Dallas Heart Study (DHS), and detected an association between ANGPTL4 and very low density lipoprotein cholesterol.
Genetic Epidemiology | 2014
Ming Li; Zihuai He; Min Zhang; Xiaowei Zhan; Changshuai Wei; Robert C. Elston; Qing Lu
With the advance of high‐throughput sequencing technologies, it has become feasible to investigate the influence of the entire spectrum of sequencing variations on complex human diseases. Although association studies utilizing the new sequencing technologies hold great promise to unravel novel genetic variants, especially rare genetic variants that contribute to human diseases, the statistical analysis of high‐dimensional sequencing data remains a challenge. Advanced analytical methods are in great need to facilitate high‐dimensional sequencing data analyses. In this article, we propose a generalized genetic random field (GGRF) method for association analyses of sequencing data. Like other similarity‐based methods (e.g., SIMreg and SKAT), the new method has the advantages of avoiding the need to specify thresholds for rare variants and allowing for testing multiple variants acting in different directions and magnitude of effects. The method is built on the generalized estimating equation framework and thus accommodates a variety of disease phenotypes (e.g., quantitative and binary phenotypes). Moreover, it has a nice asymptotic property, and can be applied to small‐scale sequencing data without need for small‐sample adjustment. Through simulations, we demonstrate that the proposed GGRF attains an improved or comparable power over a commonly used method, SKAT, under various disease scenarios, especially when rare variants play a significant role in disease etiology. We further illustrate GGRF with an application to a real dataset from the Dallas Heart Study. By using GGRF, we were able to detect the association of two candidate genes, ANGPTL3 and ANGPTL4, with serum triglyceride.
The Journal of Pediatrics | 2014
Changshuai Wei; Qing Lu; Sok Kean Khoo; Madeleine Lenski; Raina N. Fichorova; Alan Leviton; Nigel Paneth
We studied gene expression in 9 sets of paired newborn blood spots stored for 8-10 years in either the frozen state or the unfrozen state. Fewer genes were expressed in unfrozen spots, but the average correlation coefficient for overall gene expression comparing the frozen and unfrozen state was 0.771 (95% CI, 0.700-0.828).
Journal of Biopharmaceutical Statistics | 2010
Qing Lu; Yuehua Cui; Chengyin Ye; Changshuai Wei; Robert C. Elston
Translation studies have been initiated to assess the combined effect of genetic loci from recently accomplished genome-wide association studies and the existing risk factors for early disease prediction. We propose a bagging optimal receiver operating characteristic (ROC) curve method to facilitate this research. Through simulation and real data application, we compared the new method with the commonly used allele counting method and logistic regression, and found that the new method yields a better performance. The new method was applied to the Wellcome Trust data set to form a predictive genetic test for rheumatoid arthritis. The resulting test reached an area under the curve (AUC) value of 0.7.
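For readers unfamiliar with the allele counting baseline mentioned in this abstract, the sketch below shows one common reading of it: each subject's score is the total number of risk alleles carried across loci, and the score is evaluated by the area under the empirical ROC curve. This is not the bagging optimal ROC method proposed in the paper; the function names, the 0/1/2 genotype coding, and the toy data are all assumptions made for illustration.

```python
def allele_count_scores(genotypes):
    """Sum risk-allele counts (coded 0, 1, or 2 per locus) for each subject."""
    return [sum(person) for person in genotypes]


def roc_auc(scores, labels):
    """Trapezoidal area under the empirical ROC curve (1 = case, 0 = control)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    # Lower the decision threshold step by step; each step adds one ROC point.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical genotypes at three loci: three cases followed by three controls.
genotypes = [[2, 1, 1], [1, 1, 0], [2, 2, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
labels = [1, 1, 1, 0, 0, 0]
scores = allele_count_scores(genotypes)
auc = roc_auc(scores, labels)
print(scores, auc)
```

Allele counting weights every locus equally, which is one reason a method that searches for a better-performing combination of loci might improve on it.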
Annals of Human Genetics | 2016
Ming Li; Jingyun Li; Changshuai Wei; Qing Lu; Xinyu Tang; Stephen W. Erickson; Stewart L. MacLeod; Charlotte A. Hobbs
Congenital heart defects (CHDs) develop through a complex interplay between genetic variants, epigenetic modifications, and maternal environmental exposures. Genetic studies of CHDs have commonly tested single genetic variants for association with CHDs. Less attention has been given to complex gene‐by‐gene and gene‐by‐environment interactions. In this study, we applied a recently developed likelihood‐ratio Mann‐Whitney (LRMW) method to detect joint actions among maternal variants, fetal variants, and maternal environmental exposures, allowing for high‐order statistical interactions. All subjects are participants from the National Birth Defects Prevention Study, including 623 mother‐offspring pairs with CHD‐affected pregnancies and 875 mother‐offspring pairs with unaffected pregnancies. Each individual has 872 single nucleotide polymorphisms encoding for critical enzymes in the homocysteine, folate, and trans‐sulfuration pathways. By using the LRMW method, three variants (fetal rs625879, maternal rs2169650, and maternal rs8177441) were identified with a joint association to CHD risk (nominal P‐value = 1.13e‐07). These three variants are located within genes BHMT2, GSTP1, and GPX3, respectively. Further examination indicated that maternal SNP rs2169650 may interact with both fetal SNP rs625879 and maternal SNP rs8177441. Our findings suggest that the risk of CHD may be influenced by both the intragenerational interaction within the maternal genome and the intergenerational interaction between maternal and fetal genomes.
Journal of Maternal-fetal & Neonatal Medicine | 2013
Jaime Slaughter; Changshuai Wei; Steven J. Korzeniewski; Qing Lu; John S. Beck; Sok Kean Khoo; Ariel Brovont; Joel Maurer; Denny R. Martin; Madeleine Lenski; Nigel Paneth
Objective: To examine the correlation in genes expressed in paired umbilical cord blood (UCB) and newborn blood (NB). Method: Total mRNA and mRNA of three gene sets (inflammatory, hypoxia, and thyroidal response) were assessed using microarray in UCB and NB spotted on Guthrie cards from 7 mother/infant pairs. Results: The average gene expression correlation between paired UCB and NB samples was 0.941 when all expressed genes were considered, and 0.949 for three selected gene sets. Conclusion: The high correlation of UCB and NB gene expression suggests that either source may be useful for examining gene expression in the perinatal period.
Frontiers in Genetics | 2012
Changshuai Wei; James C. Anthony; Qing Lu
Cocaine-associated biomedical and psychosocial problems are substantial twenty-first century global burdens of disease. This burden is largely driven by a cocaine dependence process that becomes engaged with increasing occasions of cocaine product use. For this reason, the development of a risk-prediction model for cocaine dependence may be of special value. Ultimately, success in building such a risk-prediction model may help promote personalized cocaine dependence prediction, prevention, and treatment approaches not presently available. As an initial step toward this goal, we conducted a genome-environmental risk-prediction study for cocaine dependence, simultaneously considering 948,658 single nucleotide polymorphisms (SNPs), six potentially cocaine-related facets of environment, and three personal characteristics. In this study, a novel statistical approach was applied to 1045 case-control samples from the Family Study of Cocaine Dependence. The results identify 330 low- to medium-effect size SNPs (i.e., those with a single-locus p-value of less than 10^-4) that made a substantial contribution to cocaine dependence risk prediction (AUC = 0.718). Inclusion of six facets of environment and three personal characteristics yielded greater accuracy (AUC = 0.809). Of special importance was the joint effect of childhood abuse (CA) among trauma experiences and the GBE1 gene in cocaine dependence risk prediction. Genome-environmental risk-prediction models may become more promising in future risk-prediction research, once a more substantial array of environmental facets is taken into account, sometimes with model improvement when gene-by-environment product terms are included as part of these risk-prediction models.