Publication


Featured research published by Yuannv Zhang.


BMC Bioinformatics | 2010

Extracting consistent knowledge from highly inconsistent cancer gene data sources

Xue Gong; Ruihong Wu; Yuannv Zhang; Wenyuan Zhao; Lixin Cheng; Yunyan Gu; Lin Zhang; Jing Wang; Jing Zhu; Zheng Guo

Background: Hundreds of genes that are causally implicated in oncogenesis have been found and collected in various databases. For efficient application of these abundant but diverse data sources, it is of fundamental importance to evaluate their consistency. Results: First, we showed that the lists of cancer genes from some major data sources were highly inconsistent in terms of overlapping genes. In particular, most cancer genes accumulated in previous small-scale studies could not be rediscovered in current high-throughput genome screening studies. Then, based on a metric proposed in this study, we showed that most cancer gene lists from different data sources were highly functionally consistent. Finally, we extracted functionally consistent cancer genes from various data sources and collected them in our database F-Census. Conclusions: Although they have very low gene overlap, most cancer gene data sources are highly consistent at the functional level, which indicates that they separately capture some of the genes in a few key pathways associated with cancer. Our results suggest that the sample sizes currently used for cancer studies might be inadequate for consistently capturing individual cancer genes, but could be sufficient for finding a number of cancer genes that functionally represent most cancer genes. The F-Census database provides biologists with a useful tool for browsing and extracting functionally consistent cancer genes from various data sources.
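As a rough illustration of what "inconsistent in terms of overlapping genes" can mean in practice, the sketch below computes a pairwise overlap fraction between gene lists. The overlap measure, the source names and the toy gene lists are assumptions chosen for illustration; they are not taken from the paper, and this is not the functional consistency metric the abstract refers to.

```python
from itertools import combinations


def overlap_fraction(list_a, list_b):
    """Shared genes divided by the size of the smaller list (0.0 if either is empty)."""
    a, b = set(list_a), set(list_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


# Toy gene lists (hypothetical sources, illustrative content only).
gene_lists = {
    "source_A": {"TP53", "KRAS", "EGFR", "BRCA1"},
    "source_B": {"TP53", "PIK3CA", "PTEN", "MYC"},
    "source_C": {"APC", "KRAS", "RB1", "CDKN2A"},
}

# Overlap fraction for every unordered pair of sources.
pairwise = {
    (x, y): overlap_fraction(gene_lists[x], gene_lists[y])
    for x, y in combinations(sorted(gene_lists), 2)
}

for pair, value in pairwise.items():
    print(pair, round(value, 2))
```

Dividing by the smaller list keeps the score from being dominated by list size; a Jaccard index would be an equally reasonable choice.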


Gene | 2013

Extracting a few functionally reproducible biomarkers to build robust subnetwork-based classifiers for the diagnosis of cancer

Lin Zhang; Shan Li; Chunxiang Hao; Guini Hong; Jinfeng Zou; Yuannv Zhang; Pengfei Li; Zheng Guo

In microarray-based case-control studies of a disease, researchers often attempt to identify a few diagnostic or prognostic markers among the most significant differentially expressed (DE) genes. However, the reproducibility of DE genes identified in different studies of a disease is typically very low. To tackle this problem, we could evaluate the reproducibility of DE genes across studies and define robust markers for disease diagnosis using a disease-associated protein-protein interaction (PPI) subnetwork. Using datasets for four cancer types, we found that the most significant DE genes in cancer exhibit consistent up- or down-regulation in different datasets. For each cancer type, the 5 (or 10) most significant DE genes separately extracted from different datasets tend to be significantly coexpressed and closely connected in the PPI subnetwork, indicating that they are highly reproducible at the PPI level. Consequently, we were able to build robust subnetwork-based classifiers for cancer diagnosis.


Computational Biology and Chemistry | 2011

Extensive increase of microarray signals in cancers calls for novel normalization assumptions

D. Wang; Lixin Cheng; Mingyue Wang; Ruihong Wu; Pengfei Li; Bin Li; Yuannv Zhang; Yunyan Gu; Wenyuan Zhao; Chenguang Wang; Zheng Guo

When using microarray data to study a complex disease such as cancer, it is common practice to normalize the data so that all arrays have the same distribution of probe intensities regardless of the biological groups of the samples. The assumption underlying such normalization is that, in a disease, the majority of genes are not differentially expressed (DE) and the numbers of up- and down-regulated genes are roughly equal. However, accumulating evidence suggests that gene expression could be widely altered in cancer, so we need to evaluate how sensitive biological discoveries are to violations of the normalization assumption. Here, we analyzed 7 large Affymetrix datasets of pair-matched normal and cancer samples collected in the NCBI GEO database. We showed that in 6 of these 7 datasets the medians of perfect match (PM) probe intensities increased in the cancer state, and the increases were significant in three datasets, suggesting that the assumption that all arrays have the same median probe intensities regardless of the biological groups of samples might be misleading. Then, we evaluated the effects of three of the most widely used normalization algorithms (RMA, MAS5.0 and dChip) on the selection of DE genes by comparing them with LVS, which relies less on the above-mentioned assumption. The results showed that using RMA, MAS5.0 and dChip may produce many false down-regulated DE genes while missing many up-regulated DE genes. At least for cancer studies, normalizing all arrays to have the same distribution of probe intensities regardless of the biological groups of samples might be misleading. Thus, most current normalizations based on unreliable assumptions may distort biological differences between normal and cancer samples. The LVS algorithm might perform relatively well because it relies less on the above-mentioned assumption. Our results also indicate that genes may be widely up-regulated in most human cancers.
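The concern about forcing all arrays onto the same distribution can be seen in a minimal sketch of quantile normalization, a common way of equalizing distributions across arrays. The implementation below is a simplified textbook version that assumes no tied values, and the two toy arrays are invented for illustration; it is not the RMA, MAS5.0, dChip or LVS procedure evaluated in the paper.

```python
from statistics import median


def quantile_normalize(samples):
    """Give every sample the same distribution: the per-rank mean across samples.

    `samples` is a list of equal-length lists, one list per array.
    Ties are not handled specially (simplifying assumption).
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same number of probes")
    sorted_samples = [sorted(s) for s in samples]
    # Mean intensity at each rank, averaged over all samples.
    rank_means = [
        sum(s[i] for s in sorted_samples) / len(samples) for i in range(n)
    ]
    normalized = []
    for s in samples:
        order = sorted(range(n), key=lambda i: s[i])  # probe indices by rank
        out = [0.0] * n
        for rank, probe_index in enumerate(order):
            out[probe_index] = rank_means[rank]
        normalized.append(out)
    return normalized


# Toy data: every probe is 1.5 units higher in the "cancer" array.
normal = [5.0, 6.0, 7.0, 8.0, 9.0]
cancer = [6.5, 7.5, 8.5, 9.5, 10.5]

norm_normal, norm_cancer = quantile_normalize([normal, cancer])

print(median(normal), median(cancer))            # medians differ before
print(median(norm_normal), median(norm_cancer))  # medians coincide after
```

In this toy case the global upward shift disappears entirely after normalization, which is the kind of distortion the abstract warns about when expression is broadly increased in cancer.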


Molecular Cancer Therapeutics | 2010

Systematic Interpretation of Comutated Genes in Large-Scale Cancer Mutation Profiles

Yunyan Gu; Da Yang; Jinfeng Zou; Wencai Ma; Ruihong Wu; Wenyuan Zhao; Yuannv Zhang; Hui Xiao; Xue Gong; Min Zhang; Jing Zhu; Zheng Guo

Through high-throughput screens of somatic mutations in cancer genomes, hundreds of cancer genes are being rapidly identified, providing abundant information for systematically deciphering the genetic changes underlying cancer mechanisms. However, the functional collaboration of mutated genes is often neglected in current studies. Here, using four genome-wide somatic mutation data sets and pathways defined in various databases, we showed that gene pairs significantly comutated in cancer samples tend to be distributed between pathways rather than within pathways. At the basic functional level of motifs in the human protein-protein interaction network, we also found that comutated gene pairs were overrepresented between motifs but extremely depleted within motifs. Specifically, we showed that, based on Gene Ontology, which describes gene functions at various levels of specificity, we could tackle the pathway definition problem to some degree and study the functional collaboration of gene mutations in cancer genomes more efficiently. Then, by defining pairs of pathways frequently linked by comutated gene pairs as between-pathway models, we showed that they are also likely to be codisrupted by mutations of the interpathway hubs of the coupled pathways, suggesting new hints for understanding the heterogeneous mechanisms of cancers. Finally, we showed that some between-pathway models consisting of important pathways such as cell cycle checkpoint and cell proliferation were codisrupted in most cancer samples in this study, suggesting that their codisruption might be functionally essential in inducing these cancers. Altogether, our results provide a way to disentangle the complex collaboration of the molecular processes underlying cancer mechanisms. Mol Cancer Ther; 9(8); 2186–95. ©2010 AACR.
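One common way to decide whether two genes are "significantly comutated" is a one-sided hypergeometric test on the number of samples carrying both mutations. The abstract does not state which test the authors used, so the sketch below is an assumption for illustration only; the function name and the example counts are hypothetical.

```python
from math import comb


def comutation_p_value(n_samples, n_a, n_b, n_both):
    """P(at least n_both co-mutated samples) under independence.

    n_samples: total number of tumor samples
    n_a, n_b:  samples with gene A mutated, gene B mutated
    n_both:    samples with both genes mutated
    """
    total = comb(n_samples, n_b)
    upper = min(n_a, n_b)
    # Sum the hypergeometric probabilities for k = n_both .. upper.
    tail = sum(
        comb(n_a, k) * comb(n_samples - n_a, n_b - k)
        for k in range(n_both, upper + 1)
    )
    return tail / total


# Hypothetical counts: 100 samples, each gene mutated in 10, both in 5.
p = comutation_p_value(100, 10, 10, 5)
print(f"p = {p:.2e}")
```

With many gene pairs tested at once, the resulting p-values would also need a multiple-testing correction before any pair is called significant.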


Gene | 2012

Comparison of different normalization assumptions for analyses of DNA methylation data from the cancer genome

D. Wang; Yuannv Zhang; Yan Huang; Pengfei Li; Mingyue Wang; Ruihong Wu; Lixin Cheng; Wenjing Zhang; Yujing Zhang; Bin Li; Chenguang Wang; Zheng Guo

Some researchers normalize DNA methylation array data in order to remove the technical artifacts introduced by experimental differences in sample preparation, array processing and other factors. However, other researchers analyze DNA methylation arrays without performing data normalization, considering that current normalizations for methylation data may distort real differences between normal and cancer samples because cancer genomes may be extensively subject to hypomethylation and the total amount of CpG methylation might differ substantially among samples. In this study, using eight datasets generated with the Infinium HumanMethylation27 assay, we systematically analyzed the global distribution of DNA methylation changes in cancer compared to normal controls and its effect on data normalization for selecting differentially methylated (DM) genes. We showed that more DM genes could be found in the Quantile/Lowess-normalized data than in the non-normalized data. We found that the DM genes additionally selected in the Quantile/Lowess-normalized data showed significantly consistent methylation states in another independent dataset for the same cancer, indicating that these extra DM genes were effective biological signals related to the disease. These results suggested that normalization can increase the power of detecting DM genes in the context of diagnostic markers, which are usually characterized by relatively large effect sizes. In addition, we evaluated the reproducibility of DM discoveries for a particular cancer type, and we found that most of the DM genes additionally detected in one dataset showed the same methylation directions in the other dataset for the same cancer type, indicating that these DM genes were effective biological signals in the other dataset. Furthermore, we showed that some DM genes detected from different studies for a particular cancer type were significantly reproducible at the functional level.


Molecular Cancer Therapeutics | 2011

Evaluating the consistency of differential expression of microRNA detected in human cancers

Xue Gong; Ruihong Wu; Hongwei Wang; Xinwu Guo; D. Wang; Yunyan Gu; Yuannv Zhang; Wenyuan Zhao; Lixin Cheng; Chenguang Wang; Zheng Guo

Differential expression of microRNA (miRNA) is involved in many human diseases and could potentially be used as a biomarker for disease diagnosis, prognosis, and therapy. However, inconsistency has often been found among differentially expressed miRNAs identified in various studies when using miRNA arrays for a particular disease such as a cancer. Before broadly applying miRNA arrays in a clinical setting, it is critical to evaluate inconsistent discoveries in a rational way. Thus, using data sets from 2 types of cancers, our study shows that the differentially expressed miRNAs detected from multiple experiments for each cancer exhibit a stable regulation direction. This result also indicates that miRNA arrays could be used to reliably capture the signals of the regulation direction of differentially expressed miRNAs in cancer. We then assumed that 2 differentially expressed miRNAs with the same regulation direction in a particular cancer play similar functional roles if they regulate the same set of cancer-associated genes. On the basis of this hypothesis, we proposed a score to assess the functional consistency between differentially expressed miRNAs separately extracted from multiple studies for a particular cancer. We showed that although lists of differentially expressed miRNAs identified from different studies for each cancer were highly variable, they were rather consistent at the level of function. Thus, the detection of differentially expressed miRNAs in various experiments for a certain disease tends to be functionally reproducible and to capture functionally related differential expression of miRNAs in the disease. Mol Cancer Ther; 10(5); 752–60. ©2011 AACR.


Human Mutation | 2011

Analysis of pathway mutation profiles highlights collaboration between cancer-associated superpathways

Yunyan Gu; Wenyuan Zhao; Jiguang Xia; Yuannv Zhang; Ruihong Wu; Chenguang Wang; Zheng Guo

The biological interpretation of the complexity of cancer somatic mutation profiles is a major challenge in current cancer research. It has been suggested that mutations in multiple genes that participate in different pathways are collaborative in conferring growth advantage to tumor cells. Here, we propose a powerful pathway‐based approach to study the functional collaboration of gene mutations in carcinogenesis. We successfully identify many pairs of significantly comutated pathways for a large‐scale somatic mutation profile of lung adenocarcinoma. We find that the coordinated pathway pairs detected by comutations are also likely to be coaltered by other molecular changes, such as alterations in multifunctional genes in cancer. Then, we cluster comutated pathways into comutated superpathways and show that the derived superpathways also tend to be significantly coaltered by DNA copy number alterations. Our results support the hypothesis that comprehensive cooperation among a few basic functions is required for inducing cancer. The results also suggest biologically plausible models for understanding the heterogeneous mechanisms of cancers. Finally, we suggest an approach to identify candidate cancer genes from the derived comutated pathways. Together, our results provide guidelines to distill the pathway collaboration in carcinogenesis from the complexity of cancer somatic mutation profiles. Hum Mutat 32:1–8, 2011.


PLOS ONE | 2013

Genes Dysregulated to Different Extent or Oppositely in Estrogen Receptor-Positive and Estrogen Receptor-Negative Breast Cancers

Xianxiao Zhou; Tongwei Shi; Bailiang Li; Yuannv Zhang; Xiaopei Shen; Hongdong Li; Guini Hong; Chunyang Liu; Zheng Guo

Background: Directly comparing gene expression profiles of estrogen receptor-positive (ER+) and estrogen receptor-negative (ER−) breast cancers cannot determine whether differentially expressed genes between these two subtypes result from dysregulated expression in ER+ cancer or ER− cancer versus normal controls, and thus would miss critical information for elucidating the transcriptomic difference between the two subtypes. Principal Findings: Using microarray datasets from TCGA, we classified the genes dysregulated in both ER+ and ER− cancers versus normal controls into two classes: (i) genes dysregulated in the same direction but to a different extent, and (ii) genes dysregulated in opposite directions, and then validated the two classes in RNA-sequencing datasets of independent cohorts. We showed that the genes dysregulated to a larger extent in ER+ cancers than in ER− cancers were enriched in glycerophospholipid and polysaccharide metabolic processes, while the genes dysregulated to a larger extent in ER− cancers than in ER+ cancers were enriched in cell proliferation. Phosphorylase kinase and enzymes of glycosylphosphatidylinositol (GPI) anchor biosynthesis were upregulated to a larger extent in ER+ cancers than in ER− cancers, whereas glycogen synthase and phospholipase A2 were downregulated to a larger extent in ER+ cancers than in ER− cancers. We also found that the genes oppositely dysregulated in the two subtypes were significantly enriched with known cancer genes and tended to closely collaborate with the cancer genes. Furthermore, we showed the possibility that these oppositely dysregulated genes could contribute to carcinogenesis of ER+ and ER− cancers through rewiring different subpathways. Conclusions: GPI-anchor biosynthesis and glycogenolysis were elevated and hydrolysis of phospholipids was depleted to a larger extent in ER+ cancers than in ER− cancers. Our findings indicate that the genes oppositely dysregulated in the two subtypes are potential cancer genes which could contribute to carcinogenesis of both ER+ and ER− cancers through rewiring different subpathways.


OMICS: A Journal of Integrative Biology | 2009

Evaluation of cDNA Microarray Data by Multiple Clones Mapping to the Same Transcript

D. Wang; Chenguang Wang; Lin Zhang; Hui Xiao; Xiaopei Shen; Liping Ren; Wenyuan Zhao; Guini Hong; Yuannv Zhang; Jing Zhu; Min Zhang; Da Yang; Wencai Ma; Zheng Guo

Although novel technologies are rapidly emerging, the accumulated cDNA microarray data is still, and will remain, an important source for bioinformatics and biological studies. Thus, the reliability and applicability of cDNA microarray data warrant further evaluation. In cDNA microarrays, multiple clones are measured for a transcript, which can be exploited to evaluate the consistency of microarray data. We show that even for pairs of RCs, the average Pearson correlation coefficient of their measurements is not high. However, this low consistency could largely be explained by random noise signals for a fraction of unexpressed genes and/or low signal-to-noise ratios for low-abundance transcripts. Encouragingly, a large fraction of inconsistent data will be filtered out in the procedure of selecting differentially expressed genes (DEGs). Therefore, although cDNA microarray data are of low consistency, applications based on DEG selection could still reach correct biological results, especially at the level of functional modules.
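The consistency check described above rests on the Pearson correlation coefficient between measurements of two clones that map to the same transcript. A minimal sketch of that coefficient follows; the clone measurements are invented toy values, not data from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Toy log-ratios for two clones of one transcript across six arrays.
clone_1 = [0.8, 1.1, -0.3, 0.5, 1.4, -0.6]
clone_2 = [0.6, 1.3, -0.1, 0.2, 1.0, -0.9]

r = pearson(clone_1, clone_2)
print(round(r, 3))
```

Averaging this coefficient over all clone pairs gives a single consistency figure for a dataset; values near zero are expected for unexpressed genes whose signals are mostly noise.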


PLOS ONE | 2013

Pitfalls in Experimental Designs for Characterizing the Transcriptional, Methylational and Copy Number Changes of Oncogenes and Tumor Suppressor Genes

Yuannv Zhang; Jiguang Xia; Yujing Zhang; Yao Qin; Da Yang; Lishuang Qi; Wenyuan Zhao; Chenguang Wang; Zheng Guo

Background: It is common practice for researchers to collect a set of samples without discriminating between mutants and their wild-type counterparts in order to characterize the transcriptional, methylational and/or copy number changes of pre-defined candidate oncogenes or tumor suppressor genes (TSGs), although examples are known in which carcinogenic mutants are expressed and function completely differently from their wild-type counterparts. Principal Findings: Based on various high-throughput data without mutation information for typical cancer types, we surprisingly found that about half of known oncogenes (or TSGs) pre-defined by mutations were down-regulated (or up-regulated) and hypermethylated (or hypomethylated) in their corresponding cancer types. Therefore, the overall expression and/or methylation changes of genes detected in a set of samples without discriminating between mutants and their wild-type counterparts cannot indicate the carcinogenic roles of the mutants. We also found that about half of known oncogenes were located in deletion regions, whereas all known TSGs were located in deletion regions. Thus, both oncogenes and TSGs may be located in deletion regions, and deletions can indicate TSGs only if the gene is found to be deleted as a whole. In contrast, amplifications are restricted to oncogenes and thus can be used to support either the dysregulated wild-type gene or its mutant as an oncogene. Conclusions: We demonstrated that using transcriptional, methylational and/or copy number changes without mutation information to characterize oncogenes and TSGs, a strategy that is still widely adopted, will most often produce misleading results. Our analysis highlights the importance of evaluating expression, methylation and copy number changes together with gene mutation data in the same set of samples in order to determine the distinct roles of the mutants and their wild-type counterparts.

Collaboration


Top Co-Authors

Zheng Guo, Fujian Medical University
Chenguang Wang, Harbin Medical University
Wenyuan Zhao, Harbin Medical University
Yunyan Gu, Harbin Medical University
Ruihong Wu, Harbin Medical University
D. Wang, Harbin Medical University
Pengfei Li, Harbin Medical University
Lixin Cheng, The Chinese University of Hong Kong
Yujing Zhang, Harbin Medical University
Bin Li, Harbin Medical University