Publication


Featured research published by Zhijin Wu.


Journal of the American Statistical Association | 2004

A Model Based Background Adjustment for Oligonucleotide Expression Arrays

Zhijin Wu; Rafael A. Irizarry; Robert Gentleman; Francisco Martinez-Murillo; Forrest Spencer

High-density oligonucleotide expression arrays are widely used in many areas of biomedical research. Affymetrix GeneChip arrays are the most popular. In the Affymetrix system, a fair amount of further preprocessing and data reduction occurs after the image-processing step. Statistical procedures developed by academic groups have been successful in improving the default algorithms provided by the Affymetrix system. In this article we present a solution to one of the preprocessing steps, background adjustment, based on a formal statistical framework. Our solution greatly improves the performance of the technology in various practical applications. These arrays use short oligonucleotides to probe for genes in an RNA sample. Typically, each gene is represented by 11–20 pairs of oligonucleotide probes. The first component of these pairs is referred to as a perfect match probe and is designed to hybridize only with transcripts from the intended gene (i.e., specific hybridization). However, hybridization by other sequences (i.e., nonspecific hybridization) is unavoidable. Furthermore, hybridization strengths are measured by a scanner that introduces optical noise. Therefore, the observed intensities need to be adjusted to give accurate measurements of specific hybridization. We have found that the default ad hoc adjustment, provided as part of the Affymetrix system, can be improved through the use of estimators derived from a statistical model that uses probe sequence information. A final step in preprocessing is to summarize the probe-level data for each gene to define a measure of expression that represents the amount of the corresponding mRNA species. In this article we illustrate the practical consequences of not adjusting appropriately for the presence of nonspecific hybridization and provide a solution based on our background adjustment procedure. Software that computes our adjustment is available as part of the Bioconductor Project (http://www.bioconductor.org).


Nature Genetics | 2009

The human colon cancer methylome shows similar hypo- and hypermethylation at conserved tissue-specific CpG island shores

Rafael A. Irizarry; Christine Ladd-Acosta; Bo Wen; Zhijin Wu; Carolina Montano; Patrick Onyango; Hengmi Cui; Kevin Gabo; Michael Rongione; Maree J. Webster; Hong-Fei Ji; James B. Potash; Sarven Sabunciyan; Andrew P. Feinberg

Alterations in DNA methylation (DNAm) in cancer have been known for 25 years, including hypomethylation of oncogenes and hypermethylation of tumor suppressor genes [1]. However, most studies of cancer methylation have assumed that functionally important DNAm will occur in promoters, and that most DNAm changes in cancer occur in CpG islands [2,3]. Here we show that most methylation alterations in colon cancer occur neither in promoters nor in CpG islands, but in sequences up to 2 kb distant, which we term “CpG island shores.” CpG island shore methylation was strongly related to gene expression, and it was highly conserved in mouse, discriminating tissue types regardless of species of origin. There was a surprising overlap (45–65%) of the location of colon cancer-related methylation changes with those that distinguished normal tissues, with hypermethylation enriched closer to the associated CpG islands, and hypomethylation enriched further from the associated CpG island and resembling non-colon normal tissues. Thus, methylation changes in cancer are at sites that vary normally in tissue differentiation, and they are consistent with the epigenetic progenitor model of cancer [4], that epigenetic alterations affecting tissue-specific differentiation are the predominant mechanism by which epigenetic changes cause cancer.


Bioinformatics | 2006

Comparison of Affymetrix GeneChip expression measures

Rafael A. Irizarry; Zhijin Wu; Harris A. Jaffee

MOTIVATION: In the Affymetrix GeneChip system, preprocessing occurs before one obtains expression level measurements. Because the number of competing preprocessing methods was large and growing, we developed a benchmark to help users identify the best method for their application. A webtool was made available for developers to benchmark their procedures. At the time of writing over 50 methods had been submitted. RESULTS: We benchmarked 31 probe set algorithms using a U95A dataset of spike-in controls. Using this dataset, we found that background correction, one of the main steps in preprocessing, has the largest effect on performance. In particular, background correction appears to improve accuracy but, in general, worsen precision. The benchmark results put this balance in perspective. Furthermore, we have improved some of the original benchmark metrics to provide more detailed information regarding precision and accuracy. A handful of methods stand out as providing the best balance using spike-in data with the older U95A array, although different experiments on more current arrays may benchmark differently. AVAILABILITY: The affycomp package, now version 1.5.2, continues to be available as part of the Bioconductor project (http://www.bioconductor.org). The webtool continues to be available at http://affycomp.biostat.jhsph.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Bioinformatics | 2004

A benchmark for Affymetrix GeneChip expression measures

Leslie Cope; Rafael A. Irizarry; Harris A. Jaffee; Zhijin Wu; Terence P. Speed

MOTIVATION: The defining feature of oligonucleotide expression arrays is the use of several probes to assay each targeted transcript. This is a bonanza for the statistical geneticist, who can create probeset summaries with specific characteristics. There are now several methods available for summarizing probe level data from the popular Affymetrix GeneChips, but it is difficult to identify the best method for a given inquiry. RESULTS: We have developed a graphical tool to evaluate summaries of Affymetrix probe level data. Plots and summary statistics offer a picture of how an expression measure performs in several important areas. This picture facilitates the comparison of competing expression measures and the selection of methods suitable for a specific investigation. The key is a benchmark data set consisting of a dilution study and a spike-in study. Because the truth is known for these data, we can identify statistical features of the data for which the expected outcome is known in advance. Those features highlighted in our suite of graphs are justified by questions of biological interest and motivated by the presence of appropriate data.


Research in Computational Molecular Biology | 2004

Stochastic models inspired by hybridization theory for short oligonucleotide arrays

Zhijin Wu; Rafael A. Irizarry

High density oligonucleotide expression arrays are a widely used tool for the measurement of gene expression on a large scale. Affymetrix GeneChip arrays appear to dominate this market. These arrays use short oligonucleotides to probe for genes in an RNA sample. Due to optical noise, non-specific hybridization, probe-specific effects, and measurement error, ad hoc measures of expression that summarize probe intensities can lead to imprecise and inaccurate results. Various researchers have demonstrated that expression measures based on simple statistical models can provide great improvements over the ad hoc procedure offered by Affymetrix. Recently, physical models based on molecular hybridization theory have been proposed as useful tools for prediction of, for example, non-specific hybridization. These physical models show great potential in terms of improving existing expression measures. In this paper we suggest that the system producing the measured intensities is too complex to be fully described with these relatively simple physical models, and we propose empirically motivated stochastic models that complement the above-mentioned molecular hybridization theory to provide a comprehensive description of the data. We discuss how the proposed model can be used to obtain improved measures of expression useful for the data analysts.


Biostatistics | 2012

Removing technical variability in RNA-seq data using conditional quantile normalization

Kasper D. Hansen; Rafael A. Irizarry; Zhijin Wu

The ability to measure gene expression on a genome-wide scale is one of the most promising accomplishments in molecular biology. Microarrays, the technology that first permitted this, were riddled with problems due to unwanted sources of variability. Many of these problems are now mitigated, after a decade's worth of statistical methodology development. The recently developed RNA sequencing (RNA-seq) technology has generated much excitement in part due to claims of reduced variability in comparison to microarrays. However, we show that RNA-seq data demonstrate unwanted and obscuring variability similar to what was first observed in microarrays. In particular, we find guanine-cytosine content (GC-content) has a strong sample-specific effect on gene expression measurements that, if left uncorrected, leads to false positives in downstream results. We also report on commonly observed data distortions that demonstrate the need for data normalization. Here, we describe a statistical methodology that improves precision by 42% without loss of accuracy. Our resulting conditional quantile normalization algorithm combines robust generalized regression to remove systematic bias introduced by deterministic features such as GC-content and quantile normalization to correct for global distortions.
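
As a rough illustration of the quantile normalization step mentioned in the abstract above, the following is a minimal sketch of plain quantile normalization only. It is not the conditional quantile normalization algorithm from the paper: it omits the regression on GC-content entirely. The function name `quantile_normalize`, the list-of-lists input layout, and the tie-breaking rule are assumptions made for this example.

```python
def quantile_normalize(samples):
    """Plain quantile normalization (illustrative sketch, not the paper's CQN).

    samples: list of equal-length lists, one list of expression values per sample.
    Returns new lists in which every sample shares the same empirical distribution.
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same number of genes")

    # Sort each sample, then average the values that share a rank across samples.
    sorted_cols = [sorted(s) for s in samples]
    reference = [sum(col[i] for col in sorted_cols) / len(samples) for i in range(n)]

    normalized = []
    for s in samples:
        # Gene indices ordered from smallest to largest value in this sample.
        # Ties are broken by original position (an assumption of this sketch).
        order = sorted(range(n), key=lambda i: s[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


if __name__ == "__main__":
    print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
```

After normalization every sample has the same set of values, so global distortions between samples are removed while each gene keeps its within-sample rank.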


Journal of Computational Biology | 2005

Stochastic models inspired by hybridization theory for short oligonucleotide arrays

Zhijin Wu; Rafael A. Irizarry

High density oligonucleotide expression arrays are a widely used tool for the measurement of gene expression on a large scale. Affymetrix GeneChip arrays appear to dominate this market. These arrays use short oligonucleotides to probe for genes in an RNA sample. Due to optical noise, nonspecific hybridization, probe-specific effects, and measurement error, ad hoc measures of expression that summarize probe intensities can lead to imprecise and inaccurate results. Various researchers have demonstrated that expression measures based on simple statistical models can provide great improvements over the ad hoc procedure offered by Affymetrix. Recently, physical models based on molecular hybridization theory have been proposed as useful tools for prediction of, for example, nonspecific hybridization. These physical models show great potential in terms of improving existing expression measures. In this paper, we suggest that the system producing the measured intensities is too complex to be fully described with these relatively simple physical models, and we propose empirically motivated stochastic models that complement the above-mentioned molecular hybridization theory to provide a comprehensive description of the data. We discuss how the proposed model can be used to obtain improved measures of expression useful for the data analysts.


Cancer Research | 2008

DNA Hypomethylation Arises Later in Prostate Cancer Progression than CpG Island Hypermethylation and Contributes to Metastatic Tumor Heterogeneity

Srinivasan Yegnasubramanian; Michael C. Haffner; Yonggang Zhang; Bora Gurel; Toby C. Cornish; Zhijin Wu; Rafael A. Irizarry; James Morgan; Jessica Hicks; Theodore L. DeWeese; William B. Isaacs; G. Steven Bova; Angelo M. De Marzo; William G. Nelson

Hypomethylation of CpG dinucleotides in genomic DNA was one of the first somatic epigenetic alterations discovered in human cancers. DNA hypomethylation is postulated to occur very early in almost all human cancers, perhaps facilitating genetic instability and cancer initiation and progression. We therefore examined the nature, extent, and timing of DNA hypomethylation changes in human prostate cancer. Contrary to the prevailing view that global DNA hypomethylation changes occur extremely early in all human cancers, we show that reductions in (5me)C content in the genome occur very late in prostate cancer progression, appearing to a significant extent only at the stage of metastatic disease. Furthermore, we found that, whereas some LINE1 promoter hypomethylation does occur in primary prostate cancers compared with normal tissues, this LINE1 hypomethylation is significantly more pronounced in metastatic prostate cancer. Next, we carried out a tiered gene expression microarray and bisulfite genomic sequencing-based approach to identify genes that are silenced by CpG island methylation in normal prostate cells but become overexpressed in prostate cancer cells as a result of CpG island hypomethylation. Through this analysis, we show that a class of cancer testis antigen genes undergoes CpG island hypomethylation and overexpression in primary prostate cancers, but more so in metastatic prostate cancers. Finally, we show that DNA hypomethylation patterns are quite heterogeneous across different metastatic sites within the same patients. These findings provide evidence that DNA hypomethylation changes occur later in prostate carcinogenesis than the CpG island hypermethylation changes and occur heterogeneously during prostate cancer progression and metastatic dissemination.


PLOS ONE | 2012

The Transcriptome and Proteome of the Diatom Thalassiosira pseudonana Reveal a Diverse Phosphorus Stress Response

Sonya T. Dyhrman; Bethany D. Jenkins; Tatiana A. Rynearson; Mak A. Saito; Melissa L. Mercier; Harriet Alexander; LeAnn P. Whitney; Andrea Drzewianowski; Vladimir V. Bulygin; Erin M. Bertrand; Zhijin Wu; Claudia R. Benitez-Nelson; Abigail Heithoff

Phosphorus (P) is a critical driver of phytoplankton growth and ecosystem function in the ocean. Diatoms are an abundant class of marine phytoplankton that are responsible for significant amounts of primary production. Given the control they exert on the oceanic carbon cycle, there have been a number of studies focused on how diatoms respond to limiting macro- and micronutrients such as iron and nitrogen. However, diatom physiological responses to P deficiency are poorly understood. Here, we couple deep sequencing of transcript tags and quantitative proteomics to analyze the diatom Thalassiosira pseudonana grown under P-replete and P-deficient conditions. A total of 318 transcripts were differentially regulated with a false discovery rate of <0.05, and a total of 136 proteins were differentially abundant (p<0.05). Significant changes in the abundance of transcripts and proteins were observed and coordinated for multiple biochemical pathways, including glycolysis and translation. Patterns in transcript and protein abundance were also linked to physiological changes in cellular P distributions and enzyme activities. These data demonstrate that diatom P deficiency results in changes in cellular P allocation through polyphosphate production, increased P transport, a switch to utilization of dissolved organic P through increased production of metalloenzymes, and a remodeling of the cell surface through production of sulfolipids. Together, these findings reveal that T. pseudonana has evolved a sophisticated response to P deficiency involving multiple biochemical strategies that are likely critical to its ability to respond to variations in environmental P availability.


Biostatistics | 2013

A new shrinkage estimator for dispersion improves differential expression detection in RNA-seq data

Hao Wu; Chi Wang; Zhijin Wu

Recent developments in RNA-sequencing (RNA-seq) technology have led to a rapid increase in gene expression data in the form of counts. RNA-seq can be used for a variety of applications; however, identifying differential expression (DE) remains a key task in functional genomics. There have been a number of statistical methods for DE detection for RNA-seq data. One common feature of several leading methods is the use of the negative binomial (gamma–Poisson mixture) model. That is, the unobserved gene expression is modeled by a gamma random variable and, given the expression, the sequencing read counts are modeled as Poisson. The distinct feature in various methods is how the variance, or dispersion, in the gamma distribution is modeled and estimated. We evaluate several large public RNA-seq datasets and find that the estimated dispersion in existing methods does not adequately capture the heterogeneity of biological variance among samples. We present a new empirical Bayes shrinkage estimate of the dispersion parameters and demonstrate improved DE detection.
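
The gamma-Poisson construction described in the abstract above can be checked numerically. The sketch below is only an illustration of that standard identity and says nothing about the paper's shrinkage estimator. It assumes a mean/dispersion parameterization (mean `mu`, dispersion `phi`, so the variance is `mu + phi * mu**2`); the function names, the integration grid, and the example values are hypothetical choices made for this example.

```python
import math


def nb_pmf(k, mu, phi):
    """Negative binomial pmf with mean mu and dispersion phi (size r = 1/phi)."""
    r = 1.0 / phi
    log_p = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
             + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))
    return math.exp(log_p)


def gamma_poisson_pmf(k, mu, phi, upper=200.0, steps=20000):
    """Marginal pmf of a Poisson count whose rate is gamma distributed.

    The rate has shape 1/phi and scale mu*phi, hence mean mu. The integral over
    the rate is approximated with the midpoint rule on (0, upper); the grid is
    an assumption suited to the example values used below.
    """
    shape, scale = 1.0 / phi, mu * phi
    h = upper / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * h
        log_poisson = -lam + k * math.log(lam) - math.lgamma(k + 1)
        log_gamma = ((shape - 1) * math.log(lam) - lam / scale
                     - math.lgamma(shape) - shape * math.log(scale))
        total += math.exp(log_poisson + log_gamma) * h
    return total


if __name__ == "__main__":
    mu, phi = 10.0, 0.25
    for k in (0, 5, 10, 20):
        print(k, nb_pmf(k, mu, phi), gamma_poisson_pmf(k, mu, phi))
```

The two columns printed for each count should agree closely, which is the sense in which the negative binomial is a gamma-Poisson mixture; larger `phi` means more variance beyond the Poisson level.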

Collaboration


Dive into Zhijin Wu's collaborations.

Top Co-Authors

Zhiqiang Wu

Wright State University

Leslie Cope

Johns Hopkins University

Terence P. Speed

Walter and Eliza Hall Institute of Medical Research
