Wenjiang J. Fu
Michigan State University
Publication
Featured research published by Wenjiang J. Fu.
Genetic Epidemiology | 2011
Ming Li; Chengyin Ye; Wenjiang J. Fu; Robert C. Elston; Qing Lu
The genetic etiology of complex human diseases has been commonly viewed as a process that involves multiple genetic variants, environmental factors, as well as their interactions. Statistical approaches, such as the multifactor dimensionality reduction (MDR) and generalized MDR (GMDR), have recently been proposed to test the joint association of multiple genetic variants with either dichotomous or continuous traits. In this study, we propose a novel Forward U‐Test to evaluate the combined effect of multiple loci on quantitative traits with consideration of gene‐gene/gene‐environment interactions. In this new approach, a U‐Statistic‐based forward algorithm is first used to select potential disease‐susceptibility loci and then a weighted U‐statistic is used to test the joint association of the selected loci with the disease. Through a simulation study, we found that the Forward U‐Test achieved greater power than GMDR. Our approach is also less computationally intensive, making it feasible for high‐dimensional gene‐gene/gene‐environment research. We illustrate our method with a real data application to nicotine dependence (ND), using three independent datasets from the Study of Addiction: Genetics and Environment. Our gene‐gene interaction analysis of 155 SNPs in 67 candidate genes identified two SNPs, rs16969968 within gene CHRNA5 and rs1122530 within gene NTRK2, jointly associated with the level of ND (P‐value = 5.31e−7). The association, which involves essential interaction, is replicated in two independent datasets with P‐values of 1.08e−5 and 0.02, respectively. Our finding suggests that joint action may exist between the two gene products. Genet. Epidemiol. 2011.
Nucleic Acids Research | 2009
Lin Wan; Kelian Sun; Qi Ding; Yuehua Cui; Ming Li; Yalu Wen; Robert C. Elston; Minping Qian; Wenjiang J. Fu
Affymetrix SNP arrays have been widely used for single-nucleotide polymorphism (SNP) genotype calling and DNA copy number variation inference. Although numerous methods have achieved high accuracy in these fields, most studies have paid little attention to the modeling of hybridization of probes to off-target allele sequences, which can affect the accuracy greatly. In this study, we address this issue and demonstrate that hybridization with mismatch nucleotides (HWMMN) occurs in all SNP probe-sets and has a critical effect on the estimation of allelic concentrations (ACs). We study sequence binding through binding free energy and then binding affinity, and develop a probe intensity composite representation (PICR) model. The PICR model allows the estimation of ACs at a given SNP through statistical regression. Furthermore, we demonstrate with cell-line data of known true copy numbers that the PICR model can achieve reasonable accuracy in copy number estimation at a single SNP locus, by using the ratio of the estimated AC of each sample to that of the reference sample, and can reveal subtle genotype structure of SNPs at abnormal loci. We also demonstrate with HapMap data that the PICR model yields accurate SNP genotype calls consistently across samples, laboratories and even across array platforms.
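The regression step described in this abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that each probe intensity in a SNP probe set is a background term plus the two allelic concentrations weighted by known binding affinities. The function names, the linear form, and the `reference_copies` scaling are hypothetical and are not taken from the published PICR implementation.

```python
import numpy as np

def estimate_allelic_concentrations(intensity, affinity_a, affinity_b):
    """Least-squares fit of
    intensity = background + conc_a * affinity_a + conc_b * affinity_b
    across the probes of one SNP probe set (assumed linear form)."""
    intensity = np.asarray(intensity, dtype=float)
    design = np.column_stack([
        np.ones_like(intensity),              # background (intercept) column
        np.asarray(affinity_a, dtype=float),  # affinity to allele A target
        np.asarray(affinity_b, dtype=float),  # affinity to allele B target
    ])
    coef, *_ = np.linalg.lstsq(design, intensity, rcond=None)
    background, conc_a, conc_b = coef
    return background, conc_a, conc_b

def copy_number_ratio(sample_conc, reference_conc, reference_copies=2.0):
    """Copy number from the ratio of a sample's estimated concentration to the
    reference sample's; scaling by reference_copies is an assumption."""
    return reference_copies * sample_conc / reference_conc
```

With affinities for six probes in arrays `a` and `b`, calling `estimate_allelic_concentrations(intensities, a, b)` returns the fitted background and the two allelic concentrations; the summed concentrations of a sample and a reference could then be passed to `copy_number_ratio`.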
BioMed Research International | 2010
Janet S Sinsheimer; Robert C. Elston; Wenjiang J. Fu
Editorial: Gene-Gene Interaction in Maternal and Perinatal Research. Hindawi Publishing Corporation, Journal of Biomedicine and Biotechnology, Volume 2010, Article ID 853612, 4 pages, doi:10.1155/2010/853612.
Janet S. Sinsheimer (1, 2), Robert C. Elston (3), and Wenjiang J. Fu (4)
1 Departments of Biomathematics and Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA 90095, USA
2 Department of Biostatistics, School of Public Health, UCLA, Los Angeles, CA 90095, USA
3 Division of Genetic and Molecular Epidemiology, Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH 44106, USA
4 The Computational Genomics Lab, Department of Epidemiology, Michigan State University, East Lansing, MI 48824, USA
Correspondence should be addressed to Wenjiang J. Fu, [email protected]
Received 23 April 2010; Accepted 27 April 2010
BMC Proceedings | 2011
Ming Li; Wenjiang J. Fu; Qing Lu
We propose a novel aggregating U-test for gene-based association analysis. The method considers both rare and common variants. It adaptively searches for potential disease-susceptibility rare variants and collapses them into a single “supervariant.” A forward U-test is then used to assess the joint association of the supervariant and other common variants with quantitative traits. Using 200 simulated replicates from the Genetic Analysis Workshop 17 mini-exome data, we compare the performance of the proposed method with that of a commonly used approach, QuTie. We find that our method has equivalent or greater power than QuTie to detect nine genes that influence the quantitative trait Q1. This new approach provides a powerful tool for detecting both common and rare variants associated with quantitative traits.
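The collapsing idea can be sketched as follows. This is a simplified illustration that assumes a fixed minor allele frequency cutoff decides which variants are rare, whereas the abstract describes an adaptive search; the function name and the `maf_threshold` parameter are hypothetical.

```python
import numpy as np

def collapse_rare_variants(genotypes, maf_threshold=0.01):
    """Collapse rare variants into one binary "supervariant".

    genotypes: (n_subjects, n_variants) array of minor-allele counts (0, 1, 2).
    Returns the supervariant (1 if a subject carries any rare minor allele),
    the genotype columns of the common variants, and the rare-variant mask.
    """
    genotypes = np.asarray(genotypes)
    maf = genotypes.mean(axis=0) / 2.0   # minor allele frequency per variant
    rare = maf < maf_threshold           # fixed cutoff (assumption, not adaptive)
    supervariant = (genotypes[:, rare].sum(axis=1) > 0).astype(int)
    common = genotypes[:, ~rare]
    return supervariant, common, rare
```

The supervariant column and the common-variant columns could then be passed together to whatever joint association test is used downstream.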
PLOS ONE | 2013
Ming Li; Yalu Wen; Qing Lu; Wenjiang J. Fu
Oligonucleotide microarrays are commonly adopted for detecting and quantifying the abundance of molecules in biological samples. Analysis of microarray data starts with recording and interpreting hybridization signals from CEL images. However, many CEL images may be blemished by noise from various sources, observed as “bright spots”, “dark clouds”, “shadowy circles”, and so on. It is crucial that these image defects are correctly identified and properly processed. Existing approaches mainly focus on detecting defect areas and removing affected intensities. In this article, we propose to use a mixed effect model for imputing the affected intensities. The proposed imputation procedure is a single-array-based approach which does not require any biological replicate or between-array normalization. We further examine its performance by using Affymetrix high-density SNP arrays. The results show that this imputation procedure significantly reduces genotyping error rates. We also discuss the necessary adjustments for its potential extension to other oligonucleotide microarrays, such as gene expression profiling. The R source code for the implementation of the approach is freely available upon request.
Journal of Computational Biology | 2013
Yalu Wen; Ming Li; Wenjiang J. Fu
The genomic wave has been identified as a major artifact in genome data and is highly correlated with the sequence GC content. Although statistical methods have been developed to filter this artifact, the mechanism underlying the genomic wave has not been studied yet. Understanding of the artifact, specifically the sources of the artifact, may lead to successful separation of biological signals from the artifact and improve array design, modeling, and association studies. We develop an approach to capturing the genomic wave in oligonucleotide single-nucleotide polymorphism (SNP) arrays by separating biological signals from the array baseline background through modeling sequence binding with a newly developed probe intensity composite representation (PICR) model. The PICR model decomposes the probe intensity of each SNP probe set into the target sequence concentrations, SNP-specific background (nonsignal) and measurement error, and identifies the biological signals through the target concentration for each allele. We demonstrate with the Affymetrix GeneChip 500K HapMap data and the Wellcome Trust Case-Control Study data that the genomic wave is captured through the SNP-specific background term of the PICR model, and is separated successfully from the allelic target concentrations (the biological signals). We further identify two important sources of the genomic wave, the GC content and the fragment length (FL) of the sequence, and conclude that (1) the genomic wave artifact can be removed from the genome data with the PICR model, and (2) in addition to the GC content, the genomic wave also has a component of nonlinear effect of the FL.
Cancer Informatics | 2014
Ming Li; Yalu Wen; Wenjiang J. Fu
Cumulative evidence has shown that structural variations, due to insertions, deletions, and inversions of DNA, may contribute considerably to the development of complex human diseases, such as breast cancer. High-throughput genotyping technologies, such as Affymetrix high density single-nucleotide polymorphism (SNP) arrays, have produced large amounts of genetic data for genome-wide SNP genotype calling and copy number estimation. Meanwhile, there is a great need for accurate and efficient statistical methods to detect copy number variants. In this article, we introduce a hidden-Markov-model (HMM)-based method, referred to as the PICR-CNV, for copy number inference. The proposed method first estimates copy number abundance for each single SNP on a single array based on the raw fluorescence values, and then standardizes the estimated copy number abundance to achieve equal footing among multiple arrays. This method requires no between-array normalization, and thus, maintains data integrity and independence of samples among individual subjects. In addition to our efforts to apply new statistical technology to raw fluorescence values, the HMM has been applied to the standardized copy number abundance in order to reduce experimental noise. Through simulations, we show our refined method is able to infer copy number variants accurately. Application of the proposed method to a breast cancer dataset helps to identify genomic regions significantly associated with the disease.
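The HMM smoothing step can be illustrated with a minimal Viterbi decoder. The sketch assumes Gaussian emissions with a shared standard deviation, a uniform initial distribution, and a single `stay_prob` governing transitions; these choices and all names are hypothetical and do not reproduce the PICR-CNV implementation. The input is taken to be the already standardized copy number abundance along one chromosome.

```python
import numpy as np

def viterbi_gaussian(obs, state_means, state_sd, stay_prob=0.95):
    """Most likely hidden state path for a Gaussian-emission HMM.

    obs: standardized copy number abundance, ordered along the genome.
    state_means: expected abundance for each copy number state (assumed).
    Returns the index of the decoded state at every position.
    """
    obs = np.asarray(obs, dtype=float)
    means = np.asarray(state_means, dtype=float)
    n_states = means.size
    # Transition log-probabilities: stay with stay_prob, otherwise switch
    # uniformly to one of the other states.
    switch = (1.0 - stay_prob) / (n_states - 1)
    log_trans = np.full((n_states, n_states), np.log(switch))
    np.fill_diagonal(log_trans, np.log(stay_prob))
    # Gaussian log-density up to a constant shared by all states.
    log_emit = -0.5 * ((obs[:, None] - means[None, :]) / state_sd) ** 2
    score = np.log(1.0 / n_states) + log_emit[0]
    back = np.zeros((obs.size, n_states), dtype=int)
    for t in range(1, obs.size):
        cand = score[:, None] + log_trans    # cand[i, j]: move from i to j
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emit[t]
    # Trace the best path backwards from the best final state.
    path = np.empty(obs.size, dtype=int)
    path[-1] = score.argmax()
    for t in range(obs.size - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path
```

Because switching states carries a penalty, an isolated noisy value tends to be absorbed into the surrounding segment, while a sustained shift in abundance is decoded as a change in copy number state.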
Statistical Applications in Genetics and Molecular Biology | 2011
Yalu Wen; Ming Li; Wenjiang J. Fu
Genome-wide association studies hold great promise in identifying disease-susceptibility variants and understanding the genetic etiology of complex diseases. Microarray technology enables the genotyping of millions of single nucleotide polymorphisms. Many factors in microarray studies, such as probe selection, sample quality, and experimental process and batch, have a substantial effect on the genotype calling accuracy, which is crucial for downstream analyses. Failure to account for the variability from these sources may lead to inaccurate genotype calls and false positive and false negative findings. In this study, we develop a SNP-specific genotype calling algorithm based on the probe intensity composite representation (PICR) model, while using a normal mixture model to account for the variability of batch effect on the genotype calls. We demonstrate our method with SNP array data in a few studies, including the HapMap project, the coronary heart disease and the UK Blood Service Control studies by the Wellcome Trust Case-Control Consortium, and a methylation profiling study. Our single array based approach outperforms PICR and is comparable to the best multi-array genotype calling methods.
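As a generic illustration of how a normal mixture model can produce genotype calls, the sketch below fits a three-component one-dimensional Gaussian mixture by EM. It assumes each sample is summarized by a hypothetical allelic contrast value for one SNP, with clusters near -1, 0, and 1 for the three genotypes; the starting values and function name are illustrative and this is not the authors' algorithm.

```python
import numpy as np

def fit_three_cluster_mixture(x, n_iter=50):
    """EM for a three-component 1D Gaussian mixture over allelic contrasts.

    Returns component means, standard deviations, weights, and the hard
    genotype call (0, 1, or 2) for each sample.
    """
    x = np.asarray(x, dtype=float)
    means = np.array([-0.8, 0.0, 0.8])   # assumed starting cluster centers
    sds = np.full(3, 0.3)
    weights = np.full(3, 1.0 / 3.0)
    for _ in range(n_iter):
        # E-step: responsibility of each cluster for each sample.
        dens = weights * np.exp(-0.5 * ((x[:, None] - means) / sds) ** 2) / sds
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: update weights, means, and standard deviations.
        mass = resp.sum(axis=0) + 1e-12   # guard against an empty cluster
        weights = mass / x.size
        means = (resp * x[:, None]).sum(axis=0) / mass
        sds = np.sqrt((resp * (x[:, None] - means) ** 2).sum(axis=0) / mass)
        sds = np.maximum(sds, 1e-3)       # keep variances from collapsing
    # Calls use the responsibilities from the last E-step.
    calls = resp.argmax(axis=1)
    return means, sds, weights, calls
```

Fitting the mixture separately per batch, or adding batch-specific mean shifts, would be one way to let such a model absorb batch variability, though the abstract does not specify how the published method does so.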
Archive | 2010
Wenjiang J. Fu; Ming Li; Yalu Wen; Likit Preeyanon
Archive | 2010
Janet S Sinsheimer; Robert C. Elston; Wenjiang J. Fu