Publication


Featured research published by Jinfeng Zou.


Bioinformatics | 2008

Apparently low reproducibility of true differential expression discoveries in microarray studies

Min Zhang; Chen Yao; Zheng Guo; Jinfeng Zou; Lin Zhang; Hui Xiao; D. Wang; Da Yang; Xue Gong; Jing Zhu; Yanhui Li; Xia Li

MOTIVATION Differentially expressed gene (DEG) lists detected from different microarray studies for the same disease are often highly inconsistent. Even in technical replicate tests using identical samples, DEG detection still shows very low reproducibility. It is often believed that current small microarray studies will largely introduce false discoveries. RESULTS Based on a statistical model, we show that even in technical replicate tests using identical samples, it is highly likely that the selected DEG lists will be very inconsistent in the presence of small measurement variations. Therefore, the apparently low reproducibility of DEG detection from current technical replicate tests does not indicate low quality of microarray technology. We also demonstrate that heterogeneous biological variations existing in real cancer data will further reduce the overall reproducibility of DEG detection. Nevertheless, in small subsamples from both simulated and real data, the actual false discovery rate (FDR) for each DEG list tends to be low, suggesting that each separately determined list may comprise mostly true DEGs. Rather than simply counting the overlaps of the discovery lists from different studies for a complex disease, novel metrics are needed for evaluating the reproducibility of discoveries characterized by correlated molecular changes. Supplementary information: Supplementary data are available at Bioinformatics online.


Bioinformatics | 2009

Evaluating reproducibility of differential expression discoveries in microarray studies by considering correlated molecular changes

Min Zhang; Lin Zhang; Jinfeng Zou; Chen Yao; Hui Xiao; Qing Liu; Jing Wang; D. Wang; Chenguang Wang; Zheng Guo

Motivation: According to current consistency metrics such as percentage of overlapping genes (POG), lists of differentially expressed genes (DEGs) detected from different microarray studies for a complex disease are often highly inconsistent. This irreproducibility problem also exists in other high-throughput post-genomic areas such as proteomics and metabolism. A complex disease is often characterized by many coordinated molecular changes, which should be considered when evaluating the reproducibility of discovery lists from different studies. Results: We proposed two metrics, percentage of overlapping genes-related (POGR) and normalized POGR (nPOGR), to evaluate the consistency between two DEG lists for a complex disease, considering correlated molecular changes rather than only counting gene overlaps between the lists. Based on microarray datasets of three diseases, we showed that though the POG scores for DEG lists from different studies for each disease are extremely low, the POGR and nPOGR scores can be rather high, suggesting that the apparently inconsistent DEG lists may be highly reproducible in the sense that they are actually significantly correlated. Observing different discovery results for a disease by the POGR and nPOGR scores will obviously reduce the uncertainty of the microarray studies. The proposed metrics could also be applicable in many other high-throughput post-genomic areas. Supplementary information: Supplementary data are available at Bioinformatics online.
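
As a rough illustration of the overlap-based metrics named in the abstract above, the Python sketch below computes POG between two DEG lists and a POGR-style score that also credits non-shared genes correlated with a gene in the other list. The function names, the `correlated_pairs` input and the exact formula used for the POGR-style score are assumptions made for illustration; the abstract does not give the definitions, so the paper itself should be consulted for the authors' formulation. The normalization step behind nPOGR is not attempted here.

```python
def pog(list1, list2):
    """Percentage of overlapping genes: shared genes / size of list1."""
    set1, set2 = set(list1), set(list2)
    return len(set1 & set2) / len(set1)


def pogr(list1, list2, correlated_pairs):
    """POGR-style score (assumed form): (shared + correlated-only) / size of list1.

    correlated_pairs is a hypothetical set of frozenset({a, b}) gene pairs
    judged significantly correlated by some upstream analysis.
    """
    set1, set2 = set(list1), set(list2)
    shared = set1 & set2
    # Genes unique to list1 that are correlated with at least one gene in list2.
    related = {
        g for g in set1 - set2
        if any(frozenset((g, h)) in correlated_pairs for h in set2)
    }
    return (len(shared) + len(related)) / len(set1)


# Toy example with made-up gene symbols.
study_a = ["G1", "G2", "G3", "G4"]
study_b = ["G1", "G5", "G6", "G7"]
pairs = {frozenset(("G2", "G5")), frozenset(("G3", "G7"))}

pog_ab = pog(study_a, study_b)            # 1 shared gene out of 4
pogr_ab = pogr(study_a, study_b, pairs)   # 1 shared + 2 correlated out of 4
print(pog_ab, pogr_ab)
```

In this toy case the plain overlap is 0.25 while the correlation-aware score is 0.75, which mirrors the abstract's point that lists with few shared genes can still be closely related.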


Briefings in Bioinformatics | 2012

GO-function: deriving biologically relevant functions from statistically significant functions

Jing Wang; Xianxiao Zhou; Jing Zhu; Yunyan Gu; Wenyuan Zhao; Jinfeng Zou; Zheng Guo

In high-throughput studies of diseases, terms enriched with disease-related genes based on Gene Ontology (GO) are routinely found. However, most current algorithms used to find significant GO terms cannot handle the redundancy that results from the dependencies of GO terms. Simply based on some numerical considerations, current algorithms developed for reducing this redundancy may produce results that do not account for biologically interesting cases. In this article, we present several rules used to design a tool called GO-function for extracting biologically relevant terms from statistically significant GO terms for a disease. Using one gene expression profile for colorectal cancer, we compared GO-function with four algorithms designed to treat redundancy. Then, we validated results obtained in this data set by GO-function using another data set for colorectal cancer. Our analysis showed that GO-function can identify disease-related terms that are more statistically and biologically meaningful than those found by the other four algorithms.


Gene | 2013

Extracting a few functionally reproducible biomarkers to build robust subnetwork-based classifiers for the diagnosis of cancer

Lin Zhang; Shan Li; Chunxiang Hao; Guini Hong; Jinfeng Zou; Yuannv Zhang; Pengfei Li; Zheng Guo

In microarray-based case-control studies of a disease, people often attempt to identify a few diagnostic or prognostic markers amongst the most significant differentially expressed (DE) genes. However, the reproducibility of DE genes identified in different studies for a disease is typically very low. To tackle the problem, we could evaluate the reproducibility of DE genes across studies and define robust markers for disease diagnosis using disease-associated protein-protein interaction (PPI) subnetwork. Using datasets for four cancer types, we found that the most significant DE genes in cancer exhibit consistent up- or down-regulation in different datasets. For each cancer type, the 5 (or 10) most significant DE genes separately extracted from different datasets tend to be significantly coexpressed and closely connected in the PPI subnetwork, thereby indicating that they are highly reproducible at the PPI level. Consequently, we were able to build robust subnetwork-based classifiers for cancer diagnosis.


Molecular Cancer Therapeutics | 2010

Systematic Interpretation of Comutated Genes in Large-Scale Cancer Mutation Profiles

Yunyan Gu; Da Yang; Jinfeng Zou; Wencai Ma; Ruihong Wu; Wenyuan Zhao; Yuannv Zhang; Hui Xiao; Xue Gong; Min Zhang; Jing Zhu; Zheng Guo

By high-throughput screens of somatic mutations of genes in cancer genomes, hundreds of cancer genes are being rapidly identified, providing us with abundant information for systematically deciphering the genetic changes underlying cancer mechanism. However, the functional collaboration of mutated genes is often neglected in current studies. Here, using four genome-wide somatic mutation data sets and pathways defined in various databases, we showed that gene pairs significantly comutated in cancer samples tend to distribute between pathways rather than within pathways. At the basic functional level of motifs in the human protein-protein interaction network, we also found that comutated gene pairs were overrepresented between motifs but extremely depleted within motifs. Specifically, we showed that based on Gene Ontology that describes gene functions at various specific levels, we could tackle the pathway definition problem to some degree and study the functional collaboration of gene mutations in cancer genomes more efficiently. Then, by defining pairs of pathways frequently linked by comutated gene pairs as the between-pathway models, we showed they are also likely to be codisrupted by mutations of the interpathway hubs of the coupled pathways, suggesting new hints for understanding the heterogeneous mechanisms of cancers. Finally, we showed some between-pathway models consisting of important pathways such as cell cycle checkpoint and cell proliferation were codisrupted in most cancer samples under this study, suggesting that their codisruptions might be functionally essential in inducing these cancers. Altogether, our results would provide a channel to detangle the complex collaboration of the molecular processes underlying cancer mechanism. Mol Cancer Ther; 9(8); 2186–95. ©2010 AACR.


PLOS ONE | 2011

Reproducible Cancer Biomarker Discovery in SELDI-TOF MS Using Different Pre-Processing Algorithms

Jinfeng Zou; Guini Hong; Xinwu Guo; Lin Zhang; Chen Yao; Jing Wang; Zheng Guo

Background There has been much interest in differentiating diseased and normal samples using biomarkers derived from mass spectrometry (MS) studies. However, biomarker identification for specific diseases has been hindered by irreproducibility. Specifically, a peak profile extracted from a dataset for biomarker identification depends on a data pre-processing algorithm. Until now, no widely accepted agreement has been reached. Results In this paper, we investigated the consistency of biomarker identification using differentially expressed (DE) peaks from peak profiles produced by three widely used average spectrum-dependent pre-processing algorithms based on SELDI-TOF MS data for prostate and breast cancers. Our results revealed two important factors that affect the consistency of DE peak identification using different algorithms. One factor is that some DE peaks selected from one peak profile were not detected as peaks in other profiles, and the second factor is that the statistical power of identifying DE peaks in large peak profiles with many peaks may be low due to the large scale of the tests and small number of samples. Furthermore, we demonstrated that the DE peak detection power in large profiles could be improved by the stratified false discovery rate (FDR) control approach and that the reproducibility of DE peak detection could thereby be increased. Conclusions Comparing and evaluating pre-processing algorithms in terms of reproducibility can elucidate the relationship among different algorithms and also help in selecting a pre-processing algorithm. The DE peaks selected from small peak profiles with few peaks for a dataset tend to be reproducibly detected in large peak profiles, which suggests that a suitable pre-processing algorithm should be able to produce peaks sufficient for identifying useful and reproducible biomarkers.


Bioinformatics | 2010

Viewing cancer genes from co-evolving gene modules

Jing Zhu; Hui Xiao; Xiaopei Shen; Jing Wang; Jinfeng Zou; Lin Zhang; Da Yang; Wencai Ma; Chen Yao; Xue Gong; Min Zhang; Yang Zhang; Zheng Guo

MOTIVATION Studying the evolutionary conservation of cancer genes can improve our understanding of the genetic basis of human cancers. Functionally related proteins encoded by genes tend to interact with each other in a modular fashion, which may affect both the mode and tempo of their evolution. RESULTS In the human PPI network, we searched for subnetworks within each of which all proteins have evolved at similar rates since the human and mouse split. Identified at a given co-evolving level, the subnetworks with non-randomly large sizes were defined as co-evolving modules. We showed that proteins within modules tend to be conserved, evolutionarily old and enriched with housekeeping genes, while proteins outside modules tend to be less-conserved, evolutionarily younger and enriched with genes expressed in specific tissues. Viewing cancer genes from co-evolving modules showed that the overall conservation of cancer genes should be mainly attributed to the cancer proteins enriched in the conserved modules. Functional analysis further suggested that cancer proteins within and outside modules might play different roles in carcinogenesis, providing a new hint for studying the mechanism of cancer.


Science China-life Sciences | 2011

Functional modules with disease discrimination abilities for various cancers

Chen Yao; Min Zhang; Jinfeng Zou; Hongdong Li; D. Wang; Jing Zhu; Zheng Guo

Selecting differentially expressed genes (DEGs) is one of the most important tasks in microarray applications for studying multi-factor diseases including cancers. However, the small samples typically used in current microarray studies may only partially reflect the widely altered gene expressions in complex diseases, which would introduce low reproducibility of gene lists selected by statistical methods. Here, by analyzing seven cancer datasets, we showed that, in each cancer, a wide range of functional modules have altered gene expressions and thus have high disease classification abilities. The results also showed that seven modules are shared across diverse cancers, suggesting hints about the common mechanisms of cancers. Therefore, instead of relying on a few individual genes whose selection is hardly reproducible in current microarray experiments, we may use functional modules as functional signatures to study core mechanisms of cancers and build robust diagnostic classifiers.


Robotics and Applications | 2012

Evaluating FDR and stratified FDR control approaches for high-throughput biological studies

Jinfeng Zou; Guini Hong; Junjie Zheng; Chunxiang Hao; Jing Wang; Zheng Guo

False discovery rate (FDR) control procedures are commonly used for the correction of multiple testing in high-throughput biological studies. Although the expectation of FDR estimations can be controlled, the variance of the FDR estimations has not been fully analyzed. In particular, the effect of the variance of the FDR estimator on the stratified FDR control approach, which is proposed to improve the statistical power of FDR control procedures, is unclear. In this study, we analyzed the effects of three major factors (the percentage of true null hypotheses, the number of hypotheses and the effect size of true alternative hypotheses) on the performance of the FDR and stratified FDR control approaches. We show that the variance of the FDR estimations tends to be small when at least one of the following conditions is satisfied: (1) the percentage of true null hypotheses is not too large, (2) the number of tests is relatively large, or (3) the effect size of true alternative hypotheses is not too small. We demonstrated that when all the hypotheses are stratified into two groups, the variance of the stratified FDR estimations tends to be small if each group satisfies at least one of the above-mentioned conditions. In such a situation, the actual stratified FDR for an experiment tends to be under the given control level.
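
To make the contrast between pooled and stratified FDR control concrete, here is a minimal Python sketch. It assumes the Benjamini-Hochberg step-up procedure as the FDR control procedure and applies it separately within each stratum at the same level; the abstract does not say which procedure or stratification rule the authors used, so both choices, along with the function names, the p-values and the stratum labels, are illustrative assumptions.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected."""
    m = len(pvalues)
    if m == 0:
        return []
    # Sort indices by ascending p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k with p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


def stratified_bh(pvalues, strata, alpha=0.05):
    """Apply BH separately within each stratum at the same level alpha."""
    rejected = [False] * len(pvalues)
    for label in set(strata):
        members = [i for i, s in enumerate(strata) if s == label]
        flags = benjamini_hochberg([pvalues[i] for i in members], alpha)
        for i, flag in zip(members, flags):
            rejected[i] = flag
    return rejected


# Made-up p-values and stratum labels, for illustration only.
pvals = [0.001, 0.008, 0.039, 0.041, 0.20, 0.60]
strata = ["small", "small", "small", "large", "large", "large"]
pooled = benjamini_hochberg(pvals)
split = stratified_bh(pvals, strata)
print(pooled)
print(split)
```

With these toy numbers the pooled procedure rejects two hypotheses, while the stratified version rejects three, all in the first stratum. That is the kind of power gain stratification is meant to offer; whether the actual FDR stays under the control level depends on the conditions the abstract describes.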


Biomedical Engineering and Informatics | 2008

Disease Prediction Power and Stability of Differential Expressed Genes

Chen Yao; Min Zhang; Jinfeng Zou; Xue Gong; Lin Zhang; Chenguang Wang; Zheng Guo

Selecting feature genes for disease prediction is one of the most important applications of microarray technology. However, gene lists obtained in different studies for the same clinical type of patients often differ widely and have few genes in common. Recent research suggests that gene lists ranked by fold change are more reproducible than those ranked by t-test. Here, based on the resampling method, we use training sets of different sizes to select features as top-ranked by P-value of t-test, d-value of SAM, and fold change. Then, we evaluate the stability and the disease classification power of each top-ranked gene list. Our results suggest that for disease classification, gene lists selected through d-value ranking are most suitable in terms of both reproducibility and classification power.
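
The ranking criteria compared in this abstract can give different top-ranked lists from the same data. The Python sketch below illustrates this with two of them: a Welch t-statistic (used here as a stand-in for ranking by t-test P-value) and the difference of group means on log-scale data (a log fold change). The data, gene names and helper functions are made up for illustration. The SAM d-value and the resampling over training sets of different sizes described in the abstract are not reproduced.

```python
import math
import statistics


def welch_t(case, control):
    """Welch's t-statistic for two groups of expression values."""
    var_case = statistics.variance(case) / len(case)
    var_ctrl = statistics.variance(control) / len(control)
    diff = statistics.mean(case) - statistics.mean(control)
    return diff / math.sqrt(var_case + var_ctrl)


def mean_difference(case, control):
    """Difference of group means; a log fold change on log-scale data."""
    return statistics.mean(case) - statistics.mean(control)


def top_genes(data, score, k):
    """Return the k gene names with the largest absolute score."""
    ranked = sorted(data, key=lambda g: abs(score(*data[g])), reverse=True)
    return ranked[:k]


def overlap_fraction(list1, list2):
    """Share of genes common to two equally sized top-k lists."""
    return len(set(list1) & set(list2)) / len(list1)


# Made-up log2 expression values: gene -> (case samples, control samples).
data = {
    "G1": ([8.0, 8.2, 7.9], [6.0, 6.1, 5.9]),  # large, consistent shift
    "G2": ([5.1, 5.0, 5.2], [4.8, 4.9, 4.7]),  # small but consistent shift
    "G3": ([9.0, 6.0, 7.5], [6.0, 6.5, 5.5]),  # large but noisy shift
}

by_t = top_genes(data, welch_t, 2)
by_fc = top_genes(data, mean_difference, 2)
print(by_t, by_fc, overlap_fraction(by_t, by_fc))
```

Here the t-statistic favors the small but consistent gene, while the fold change favors the large but noisy one, so the two top-2 lists share only half their genes. Repeating such a comparison over resampled training sets is one way to estimate the stability the abstract refers to.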

Collaboration


Dive into Jinfeng Zou's collaborations.

Top Co-Authors

Zheng Guo
Fujian Medical University

Chen Yao
University of Electronic Science and Technology of China

Lin Zhang
University of Electronic Science and Technology of China

Jing Wang
University of Texas MD Anderson Cancer Center

Jing Zhu
University of Electronic Science and Technology of China

Min Zhang
University of Pittsburgh

Chunxiang Hao
University of Electronic Science and Technology of China

D. Wang
Harbin Medical University

Guini Hong
University of Electronic Science and Technology of China

Hui Xiao
University of Electronic Science and Technology of China