Publication


Featured research published by Lin Zhang.


Bioinformatics | 2008

Apparently low reproducibility of true differential expression discoveries in microarray studies

Min Zhang; Chen Yao; Zheng Guo; Jinfeng Zou; Lin Zhang; Hui Xiao; D. Wang; Da Yang; Xue Gong; Jing Zhu; Yanhui Li; Xia Li

Motivation: Differentially expressed gene (DEG) lists detected by different microarray studies of the same disease are often highly inconsistent. Even in technical replicate tests using identical samples, DEG detection shows very low reproducibility. It is often believed that current small-sample microarray studies introduce many false discoveries.

Results: Based on a statistical model, we show that even in technical replicate tests using identical samples, the selected DEG lists are highly likely to be very inconsistent in the presence of small measurement variations. Therefore, the apparently low reproducibility of DEG detection in current technical replicate tests does not indicate low quality of microarray technology. We also demonstrate that heterogeneous biological variation in real cancer data further reduces the overall reproducibility of DEG detection. Nevertheless, in small subsamples from both simulated and real data, the actual false discovery rate (FDR) of each DEG list tends to be low, suggesting that each separately determined list may consist mostly of true DEGs. Rather than simply counting the overlaps of discovery lists from different studies of a complex disease, new metrics are needed to evaluate the reproducibility of discoveries characterized by correlated molecular changes.

Supplementary information: Supplementary data are available at Bioinformatics online.


Bioinformatics | 2009

Evaluating reproducibility of differential expression discoveries in microarray studies by considering correlated molecular changes

Min Zhang; Lin Zhang; Jinfeng Zou; Chen Yao; Hui Xiao; Qing Liu; Jing Wang; D. Wang; Chenguang Wang; Zheng Guo

Motivation: According to current consistency metrics such as the percentage of overlapping genes (POG), lists of differentially expressed genes (DEGs) detected by different microarray studies of a complex disease are often highly inconsistent. This irreproducibility problem also exists in other high-throughput post-genomic areas such as proteomics and metabolomics. A complex disease is often characterized by many coordinated molecular changes, which should be considered when evaluating the reproducibility of discovery lists from different studies.

Results: We proposed the metrics percentage of overlapping genes-related (POGR) and normalized POGR (nPOGR) to evaluate the consistency between two DEG lists for a complex disease, taking correlated molecular changes into account rather than only counting the genes shared by the lists. Using microarray datasets for three diseases, we showed that although the POG scores between DEG lists from different studies of each disease are extremely low, the POGR and nPOGR scores can be rather high. This suggests that apparently inconsistent DEG lists may be highly reproducible in the sense that they are significantly correlated. Interpreting different discovery results for a disease through POGR and nPOGR scores should reduce the uncertainty of microarray studies. The proposed metrics could also be applicable to many other high-throughput post-genomic areas.

Supplementary information: Supplementary data are available at Bioinformatics online.
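The POG metric named above can be illustrated with a minimal Python sketch. It assumes POG is computed directionally, as the number of genes shared by two lists divided by the size of the first list, so POG(A, B) and POG(B, A) differ when the lists have different lengths. The function name `pog` and the gene symbols in the example are hypothetical, and the sketch does not attempt POGR or nPOGR, whose exact definitions are not given here.

```python
def pog(list_a, list_b):
    """Percentage of overlapping genes (as a fraction) from list_a to list_b.

    Assumption: POG is directional, i.e. |A & B| / |A|.
    """
    set_a, set_b = set(list_a), set(list_b)
    if not set_a:
        raise ValueError("list_a must contain at least one gene")
    return len(set_a & set_b) / len(set_a)


if __name__ == "__main__":
    # Hypothetical DEG lists from two studies of the same disease.
    study1 = ["TP53", "MYC", "EGFR", "KRAS"]
    study2 = ["MYC", "KRAS", "BRCA1"]
    print(pog(study1, study2))  # 0.5 (2 of 4 genes shared)
    print(pog(study2, study1))  # 2 of 3 genes shared
```

Because only exact gene identity counts, two lists that hit different members of the same pathway score zero overlap, which is the limitation POGR and nPOGR are meant to address.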


BMC Bioinformatics | 2010

Extracting consistent knowledge from highly inconsistent cancer gene data sources

Xue Gong; Ruihong Wu; Yuannv Zhang; Wenyuan Zhao; Lixin Cheng; Yunyan Gu; Lin Zhang; Jing Wang; Jing Zhu; Zheng Guo

Background: Hundreds of genes causally implicated in oncogenesis have been found and collected in various databases. To apply these abundant but diverse data sources efficiently, it is fundamentally important to evaluate their consistency.

Results: First, we showed that the lists of cancer genes from several major data sources were highly inconsistent in terms of overlapping genes. In particular, most cancer genes accumulated in earlier small-scale studies could not be rediscovered in current high-throughput genome screening studies. Then, based on a metric proposed in this study, we showed that most cancer gene lists from different data sources were highly consistent at the functional level. Finally, we extracted functionally consistent cancer genes from the various data sources and collected them in our database F-Census.

Conclusions: Although they share very few genes, most cancer gene data sources are highly consistent at the functional level, which indicates that each source captures part of the genes in a few key cancer-associated pathways. Our results suggest that the sample sizes currently used in cancer studies might be inadequate for consistently capturing individual cancer genes, but could be sufficient for finding a number of cancer genes that functionally represent most cancer genes. The F-Census database provides biologists with a useful tool for browsing and extracting functionally consistent cancer genes from various data sources.


BMC Systems Biology | 2010

Multi-level reproducibility of signature hubs in human interactome for breast cancer metastasis

Chen Yao; Hongdong Li; Chenggui Zhou; Lin Zhang; Jinfeng Zou; Zheng Guo

Background: It has been suggested that, in the human protein-protein interaction network, changes in co-expression between highly connected proteins (hubs) and their interaction neighbours might play important roles in cancer metastasis and serve as predictive disease signatures for patient outcome. However, for a given cancer, such disease signatures identified in different studies have little overlap.

Results: Here, we propose a systematic approach to evaluate the reproducibility of disease signatures at multiple levels, on the basis of statistically testable biological models. Using two datasets for breast cancer metastasis, we showed that different signature hubs identified in different studies were highly consistent: they significantly shared interaction neighbours and displayed consistent co-expression changes with their overlapping neighbours, and the shared interaction neighbours were significantly enriched with known cancer genes and in pathways deregulated in breast cancer pathogenesis. We then showed that the signature hubs identified from the two datasets were highly reproducible at the protein interaction and pathway levels in three other independent datasets.

Conclusions: Our results provide a possible biological model in which different signature hubs altered in different patient cohorts could disturb the same metastasis-associated pathways through their interaction neighbours.


Nucleic Acids Research | 2017

RNALocate: a resource for RNA subcellular localizations

Ting Zhang; Puwen Tan; Liqiang Wang; Nana Jin; Yana Li; Lin Zhang; Huan Yang; Zhenyu Hu; Lining Zhang; Chunyu Hu; Chunhua Li; Kun Qian; Chang-Jian Zhang; Yan Huang; Kongning Li; Hao Lin; D. Wang

Increasing evidence has revealed that RNA subcellular localization is a very important feature for understanding the biological functions of RNAs after they are transported into intra- or extracellular regions. RNALocate is a web-accessible database that aims to provide a high-quality RNA subcellular localization resource and to facilitate future research on RNA function and structure. The current version of RNALocate documents more than 37 700 manually curated RNA subcellular localization entries with experimental evidence, involving more than 21 800 RNAs with 42 subcellular localizations in 65 species, mainly Homo sapiens, Mus musculus and Saccharomyces cerevisiae. In addition, RNA homology, sequence and interaction data have been integrated into RNALocate. Users can access these data through online search, browse, BLAST and visualization tools. In conclusion, RNALocate will help elucidate the full picture of RNA subcellular localization and support the development of new prediction methods. The database is available at http://www.rna-society.org/rnalocate/.


Gene | 2013

Extracting a few functionally reproducible biomarkers to build robust subnetwork-based classifiers for the diagnosis of cancer

Lin Zhang; Shan Li; Chunxiang Hao; Guini Hong; Jinfeng Zou; Yuannv Zhang; Pengfei Li; Zheng Guo

In microarray-based case-control studies of a disease, researchers often attempt to identify a few diagnostic or prognostic markers among the most significant differentially expressed (DE) genes. However, the reproducibility of DE genes identified in different studies of a disease is typically very low. To tackle this problem, we evaluated the reproducibility of DE genes across studies and defined robust markers for disease diagnosis using a disease-associated protein-protein interaction (PPI) subnetwork. Using datasets for four cancer types, we found that the most significant DE genes in cancer exhibit consistent up- or down-regulation across datasets. For each cancer type, the 5 (or 10) most significant DE genes separately extracted from different datasets tend to be significantly coexpressed and closely connected in the PPI subnetwork, indicating that they are highly reproducible at the PPI level. Consequently, we were able to build robust subnetwork-based classifiers for cancer diagnosis.
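The abstract reports that the top DE genes change in the same direction across datasets but does not give a formula here. A minimal sketch, assuming each study yields a signed statistic per DE gene (positive for up-regulation, negative for down-regulation): it counts genes that are DE in both studies and how many of them change in the same direction. As one possible significance check of our own choosing (not necessarily the paper's), it also computes a one-sided binomial tail probability against the 50% agreement expected by chance. Function names and example values are hypothetical.

```python
from math import comb


def direction_consistency(de_a, de_b):
    """Compare regulation directions of genes that are DE in both studies.

    de_a, de_b: dicts mapping gene -> signed statistic (> 0 up, < 0 down).
    Returns (n_shared, n_consistent).
    """
    shared = set(de_a) & set(de_b)
    # Same direction means the two statistics have the same sign (product > 0).
    n_consistent = sum(1 for g in shared if de_a[g] * de_b[g] > 0)
    return len(shared), n_consistent


def binomial_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more consistent genes."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # Hypothetical signed t-statistics for DE genes in two datasets.
    study1 = {"G1": 2.5, "G2": -3.1, "G3": 1.2, "G4": -0.8}
    study2 = {"G1": 1.9, "G2": -2.0, "G3": -1.1, "G5": 4.0}
    n, k = direction_consistency(study1, study2)
    print(n, k, binomial_upper_tail(k, n))  # 3 2 0.5
```

With real data, both dicts would typically be restricted to genes passing a DE threshold in each study; a small tail probability indicates more directional agreement than chance alone would give.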


Nucleic Acids Research | 2017

RAID v2.0: an updated resource of RNA-associated interactions across organisms

Ying Yi; Yue Zhao; Chunhua Li; Lin Zhang; Huiying Huang; Yana Li; Lanlan Liu; Ping Hou; Tianyu Cui; Puwen Tan; Yongfei Hu; Ting Zhang; Yan Huang; Xiaobo Li; Jia Yu; D. Wang

With the development of biotechnologies and computational prediction algorithms, the number of experimentally validated and computationally predicted RNA-associated interactions has grown rapidly in recent years. However, these diverse RNA-associated interactions are scattered over a wide variety of resources and organisms, and a fully comprehensive view of them is still not available for any species. We have therefore updated the RAID database to version 2.0 (RAID v2.0, www.rna-society.org/raid/) by integrating experimental and computationally predicted interactions, gathered from manual literature reading and other database resources, under one common framework. The new developments in RAID v2.0 include (i) an over 850-fold increase in RNA-associated interactions compared with the previous version; (ii) integration of numerous resources providing experimental or computational prediction evidence for each RNA-associated interaction; (iii) a reliability assessment for each RNA-associated interaction based on an integrative confidence score; and (iv) expansion of species coverage to 60. Consequently, RAID v2.0 contains more than 5.27 million RNA-associated interactions, including more than 4 million RNA–RNA interactions and more than 1.2 million RNA–protein interactions, involving nearly 130 000 RNA/protein symbols across 60 species.


PLOS ONE | 2011

Reproducible Cancer Biomarker Discovery in SELDI-TOF MS Using Different Pre-Processing Algorithms

Jinfeng Zou; Guini Hong; Xinwu Guo; Lin Zhang; Chen Yao; Jing Wang; Zheng Guo

Background: There has been much interest in differentiating diseased and normal samples using biomarkers derived from mass spectrometry (MS) studies. However, biomarker identification for specific diseases has been hindered by irreproducibility. Specifically, the peak profile extracted from a dataset for biomarker identification depends on the data pre-processing algorithm, and no consensus on the choice of pre-processing algorithm has yet been reached.

Results: In this paper, we investigated the consistency of biomarker identification using differentially expressed (DE) peaks from peak profiles produced by three widely used average spectrum-dependent pre-processing algorithms, based on SELDI-TOF MS data for prostate and breast cancers. Our results revealed two important factors that affect the consistency of DE peak identification across algorithms. First, some DE peaks selected from one peak profile were not detected as peaks in other profiles. Second, the statistical power to identify DE peaks in large peak profiles with many peaks may be low because of the large number of tests and the small number of samples. Furthermore, we demonstrated that the power of DE peak detection in large profiles could be improved by the stratified false discovery rate (FDR) control approach, thereby increasing the reproducibility of DE peak detection.

Conclusions: Comparing and evaluating pre-processing algorithms in terms of reproducibility can elucidate the relationships among different algorithms and help in selecting a pre-processing algorithm. DE peaks selected from small peak profiles with few peaks tend to be reproducibly detected in large peak profiles, which suggests that a suitable pre-processing algorithm should produce enough peaks to identify useful and reproducible biomarkers.
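The abstract names stratified FDR control without detailing it. As a hedged illustration, the sketch below applies the Benjamini–Hochberg (BH) step-up procedure separately within each stratum, which is the general idea behind stratified FDR control. How strata are defined (for example, peaks also detected in a small profile versus all other peaks) is our assumption, the function names are hypothetical, and the p-values are made up to show how a small stratum of strong signals gains power when it is not diluted by many null tests.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans: True where BH at level alpha rejects."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Step-up rule: find the largest rank k (1-based) with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k_max = rank
    # Reject every hypothesis whose rank is at most k_max.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected


def stratified_bh(pvalues, strata, alpha=0.05):
    """Apply BH separately within each stratum label, at the same alpha."""
    groups = {}
    for i, label in enumerate(strata):
        groups.setdefault(label, []).append(i)
    rejected = [False] * len(pvalues)
    for idxs in groups.values():
        sub = benjamini_hochberg([pvalues[i] for i in idxs], alpha)
        for i, r in zip(idxs, sub):
            rejected[i] = r
    return rejected


if __name__ == "__main__":
    # Hypothetical peak p-values: 3 peaks in a "shared" stratum, 7 in the rest.
    p = [0.001, 0.012, 0.02, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
    strata = ["shared"] * 3 + ["other"] * 7
    print(sum(benjamini_hochberg(p)), sum(stratified_bh(p, strata)))  # 1 3
```

In the example, pooled BH over all 10 peaks rejects only the smallest p-value, while stratified BH rejects all three peaks in the small stratum, mirroring the abstract's point that a large number of tests lowers detection power.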


PLOS ONE | 2013

An Integrated Approach to Uncover Driver Genes in Breast Cancer Methylation Genomes

Xiaopei Shen; Shan Li; Lin Zhang; Hongdong Li; Guini Hong; Xianxiao Zhou; Tingting Zheng; Wenjing Zhang; Chunxiang Hao; Tongwei Shi; Chunyang Liu; Zheng Guo

Background: Cancer cells typically exhibit large-scale aberrant methylation of gene promoters. Some of the genes with promoter methylation alterations play “driver” roles in tumorigenesis, whereas others are only “passengers”.

Results: Based on the assumption that a promoter methylation alteration of a driver gene may lead to expression alterations of a set of genes associated with cancer pathways, we developed a computational framework that integrates promoter methylation and gene expression data to identify driver methylation aberrations in cancer. Applying this approach to breast cancer data, we identified many novel cancer driver genes and found that some of them were specific to the basal-like, luminal-A or HER2+ subtypes of breast cancer.

Conclusion: The proposed framework proved effective in identifying cancer driver genes from genome-wide methylation and expression data of cancer. These results may provide new molecular targets for targeted and selective epigenetic therapy.


Nucleic Acids Research | 2017

MNDR v2.0: an updated resource of ncRNA–disease associations in mammals

Tianyu Cui; Lin Zhang; Yan Huang; Ying Yi; Puwen Tan; Yue Zhao; Yongfei Hu; Liyan Xu; Enmin Li; D. Wang

Accumulating evidence suggests that diverse non-coding RNAs (ncRNAs) are involved in the progression of a wide variety of diseases. In recent years, abundant ncRNA–disease associations have been found experimentally and predicted by algorithms. These diverse associations are scattered over many resources and mammals, and a global view of them is not available for any mammal. We have therefore updated the MNDR database to version 2.0 (www.rna-society.org/mndr/) by integrating experimental and predicted associations, gathered from manual literature curation and other resources, under one common framework. The new developments in MNDR v2.0 include (i) an over 220-fold increase in ncRNA–disease associations compared with the previous version (covering lncRNA, miRNA, piRNA, snoRNA and more than 1400 diseases); (ii) integration of experimental and prediction evidence from 14 resources and prediction algorithms for each ncRNA–disease association; (iii) mapping of disease names to the Disease Ontology and Medical Subject Headings (MeSH); (iv) a confidence score for each ncRNA–disease association; and (v) expansion of species coverage to six mammals. Finally, MNDR v2.0 provides the scientific community with a resource of more than 260 000 ncRNA–disease associations for efficient browsing and extraction of the associations between diverse ncRNAs and diseases.

Collaboration

Top Co-Authors

Zheng Guo, Fujian Medical University
Jinfeng Zou, University of Electronic Science and Technology of China
Chen Yao, University of Electronic Science and Technology of China
D. Wang, Harbin Medical University
Jing Wang, University of Texas MD Anderson Cancer Center
Chunxiang Hao, University of Electronic Science and Technology of China
Guini Hong, University of Electronic Science and Technology of China
Jing Zhu, University of Electronic Science and Technology of China
Xiaopei Shen, University of Electronic Science and Technology of China
Xue Gong, University of Electronic Science and Technology of China