Yinyin Yuan
Institute of Cancer Research
Publication
Featured research published by Yinyin Yuan.
Nature | 2012
Christina Curtis; Sohrab P. Shah; Suet-Feung Chin; Gulisa Turashvili; Oscar M. Rueda; Mark J. Dunning; Doug Speed; Andy G. Lynch; Shamith Samarajiwa; Yinyin Yuan; Stefan Gräf; Gavin Ha; Gholamreza Haffari; Ali Bashashati; Roslin Russell; Steven McKinney; Anita Langerød; Andrew T. Green; Elena Provenzano; G.C. Wishart; Sarah Pinder; Peter H. Watson; Florian Markowetz; Leigh Murphy; Ian O. Ellis; Arnie Purushotham; Anne Lise Børresen-Dale; James D. Brenton; Simon Tavaré; Carlos Caldas
The elucidation of breast cancer subgroups and their molecular drivers requires integrated views of the genome and transcriptome from representative numbers of patients. We present an integrated analysis of copy number and gene expression in a discovery and validation set of 997 and 995 primary breast tumours, respectively, with long-term clinical follow-up. Inherited variants (copy number variants and single nucleotide polymorphisms) and acquired somatic copy number aberrations (CNAs) were associated with expression in ∼40% of genes, with the landscape dominated by cis- and trans-acting CNAs. By delineating expression outlier genes driven in cis by CNAs, we identified putative cancer genes, including deletions in PPP2R2A, MTAP and MAP2K4. Unsupervised analysis of paired DNA–RNA profiles revealed novel subgroups with distinct clinical outcomes, which reproduced in the validation cohort. These include a high-risk, oestrogen-receptor-positive 11q13/14 cis-acting subgroup and a favourable prognosis subgroup devoid of CNAs. Trans-acting aberration hotspots were found to modulate subgroup-specific gene networks, including a TCR deletion-mediated adaptive immune response in the ‘CNA-devoid’ subgroup and a basal-specific chromosome 5 deletion-associated mitotic network. Our results provide a novel molecular stratification of the breast cancer population, derived from the impact of somatic CNAs on the transcriptome.
Science Translational Medicine | 2012
Yinyin Yuan; Henrik Failmezger; Oscar M. Rueda; H. Raza Ali; Stefan Gräf; Suet Feung Chin; Roland F. Schwarz; Christina Curtis; Mark J. Dunning; Helen Bardwell; Nicola Johnson; Sarah Doyle; Gulisa Turashvili; Elena Provenzano; Sam Aparicio; Carlos Caldas; Florian Markowetz
Image analysis of breast cancer tissue improves and complements genomic data to predict patient survival.
Digitizing Pathology for Genomics
The tumor microenvironment is a complex milieu that includes not only the cancer cells but also the stromal cells, immune cells, and even normal, healthy cells. Molecular analysis of tumor tissue is therefore a challenging task because all this “extra” genomic information can muddle the results. Conversely, biopsy tissue staining can provide a spatial and cellular readout (architecture and content), but it is mostly qualitative information. In response, Yuan and colleagues have developed a quantitative, computational approach to pathology. When combined with molecular analyses, the authors were able to uncover new knowledge about breast tumor biology and, in turn, predict patient survival. Yuan et al. first collected histopathology images, gene expression data, and DNA copy number variation data for 564 breast cancer patients. Using a portion of the images (the “discovery set”), they developed an image processing approach that automatically classified cells as cancer, lymphocyte, or stroma on the basis of their size and shape. This approach was validated on the remaining samples, and any errors in this analysis were digitally corrected before obtaining a plot of tumor cellular heterogeneity. With exact knowledge of the tumor’s cellular composition, the authors were able to correct copy number data to more accurately reflect HER2 status compared with uncorrected data. Yuan and colleagues combined their digital pathology with genomic information to devise an integrated predictor of survival for estrogen receptor (ER)–negative patients. Higher numbers of infiltrating lymphocytes (immune cells), as quantified by their image analysis platform, were found in a subset of patients with better clinical outcome than the rest of ER-negative patients, and this outcome difference was significantly enhanced with the addition of gene expression. The quantitative and objective nature of this integrated predictor could benefit diagnosis and prognosis in many areas of cancer by using the rich combination of tumor cellular content and genomic data.
Solid tumors are heterogeneous tissues composed of a mixture of cancer and normal cells, which complicates the interpretation of their molecular profiles. Furthermore, tissue architecture is generally not reflected in molecular assays, rendering this rich information underused. To address these challenges, we developed a computational approach based on standard hematoxylin and eosin–stained tissue sections and demonstrated its power in a discovery and validation cohort of 323 and 241 breast tumors, respectively. To deconvolute cellular heterogeneity and detect subtle genomic aberrations, we introduced an algorithm based on tumor cellularity to increase the comparability of copy number profiles between samples. We next devised a predictor for survival in estrogen receptor–negative breast cancer that integrated both image-based and gene expression analyses and significantly outperformed classifiers that use single data types, such as microarray expression signatures. Image processing also allowed us to describe and validate an independent prognostic factor based on quantitative analysis of spatial patterns between stromal cells, which are not detectable by molecular assays. Our quantitative, image-based method could benefit any large-scale cancer study by refining and complementing molecular assays of tumor samples.
Nature Communications | 2015
Roland Jäger; Gabriele Migliorini; Marc Henrion; Radhika Kandaswamy; Helen E. Speedy; Andreas Heindl; Nicola Whiffin; Maria J. Carnicer; Laura Broome; Nicola Dryden; Takashi Nagano; Stefan Schoenfelder; Martin Enge; Yinyin Yuan; Jussi Taipale; Peter Fraser; Olivia Fletcher; Richard S. Houlston
Multiple regulatory elements distant from their targets on the linear genome can influence the expression of a single gene through chromatin looping. Chromosome conformation capture implemented in Hi-C allows for genome-wide agnostic characterization of chromatin contacts. However, detection of functional enhancer–promoter interactions is precluded by its effective resolution that is determined by both restriction fragmentation and sensitivity of the experiment. Here we develop a capture Hi-C (cHi-C) approach to allow an agnostic characterization of these physical interactions on a genome-wide scale. Single-nucleotide polymorphisms associated with complex diseases often reside within regulatory elements and exert effects through long-range regulation of gene expression. Applying this cHi-C approach to 14 colorectal cancer risk loci allows us to identify key long-range chromatin interactions in cis and trans involving these loci.
Modern Pathology | 2015
Sidra Nawaz; Andreas Heindl; Konrad Koelble; Yinyin Yuan
The abundance of tumor-infiltrating lymphocytes has been associated with a favorable prognosis in estrogen receptor-negative breast cancer. However, a high degree of spatial heterogeneity in lymphocytic infiltration is often observed and its clinical implication remains unclear. Here we combine automated histological image processing with methods of spatial statistics used in ecological data analysis to quantify spatial heterogeneity in the distribution patterns of tumor-infiltrating lymphocytes. Hematoxylin and eosin-stained sections from two cohorts of estrogen receptor-negative breast cancer patients (discovery: n=120; validation: n=125) were processed with our automated cell classification algorithm to identify the location of lymphocytes and cancer cells. Subsequently, hotspot analysis (Getis–Ord Gi*) was applied to identify statistically significant hotspots of cancer and immune cells, defined as tumor regions with a significantly high number of cancer cells or immune cells, respectively. We found that the amount of co-localized cancer and immune hotspots weighted by tumor area, rather than number of cancer or immune hotspots, correlates with a better prognosis in estrogen receptor-negative breast cancer in univariate and multivariate analysis. Moreover, co-localization of cancer and immune hotspots further stratified patients with immune cell-rich tumors. Our study demonstrates the importance of quantifying not only the abundance of lymphocytes but also their spatial variation in the tumor specimen for which methods from other disciplines such as spatial statistics can be successfully applied.
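As an illustration of the hotspot statistic named in the abstract above, here is a minimal sketch of Getis–Ord Gi* computed on a grid of per-region cell counts. It assumes binary weights over a square neighbourhood that includes the cell itself. The function name, the `radius` parameter, and the toy grid are illustrative choices, not details taken from the paper, and the authors' actual pipeline may differ.

```python
import math

def getis_ord_gi_star(grid, radius=1):
    """Return a Gi* z-score for each cell of a 2-D count grid.

    Assumption: binary spatial weights, equal to 1 for every cell inside a
    square window of the given radius (including the focal cell) and 0 elsewhere.
    """
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    # Global standard deviation as used in the Gi* formula (divides by n).
    s = math.sqrt(max(0.0, sum(v * v for v in values) / n - mean * mean))
    if s == 0:
        raise ValueError("Gi* is undefined when all cells have the same value")

    z = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            local_sum = 0.0
            w_sum = 0
            for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                    local_sum += grid[rr][cc]
                    w_sum += 1
            # With binary weights, the sum of squared weights equals w_sum.
            denom = s * math.sqrt((n * w_sum - w_sum ** 2) / (n - 1))
            # denom is zero only if the window covers the whole grid.
            z[r][c] = (local_sum - mean * w_sum) / denom if denom > 0 else 0.0
    return z

# Toy 5x5 grid of lymphocyte counts with a dense cluster in the top-left corner.
grid = [
    [10, 10, 0, 0, 0],
    [10, 10, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
z = getis_ord_gi_star(grid)
# The clustered corner receives a large positive z-score; the empty corner a negative one.
```

A z-score above roughly 1.96 corresponds to a two-sided 5% significance level before any multiple-testing correction; how significance was thresholded in the study is not stated in the abstract.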
Laboratory Investigation | 2015
Andreas Heindl; Sidra Nawaz; Yinyin Yuan
The emergent field of digital pathology employing automated image analysis techniques is set to revolutionize traditional pathology at the center of clinical diagnostics. Histological images provide important tumor features unavailable in molecular profiling or omics data: the spatial context of tumor and stromal cells at single-cell resolution. Methods to map the spatial and morphological patterns of cancer and normal cells can contribute to a more comprehensive understanding of the highly heterogeneous tumor microenvironment. This review focuses on methods that help expand our knowledge of intra-tumoral spatial heterogeneity of the tumor microenvironment and their potential synergies with molecular profiling technologies.
Nature Methods | 2014
Jonathan D. Worboys; John Sinclair; Yinyin Yuan; Claus Jørgensen
In targeted proteomics it is critical that peptides are not only proteotypic but also accurately represent the level of the protein (quantotypic). Numerous approaches are used to identify proteotypic peptides, but quantotypic properties are rarely assessed. We show that measuring ratios of proteotypic peptides across biological samples can be used to empirically identify peptides with good quantotypic properties. We applied this technique to identify quantotypic peptides for 21% of the human kinome.
PLOS ONE | 2011
Yinyin Yuan; Chang Tsun Li; Oliver P. Windram
Inferring regulatory relationships among many genes based on their temporal variation in transcript abundance has been a popular research topic. Due to the nature of microarray experiments, classical tools for time series analysis lose power since the number of variables far exceeds the number of samples. In this paper, we describe some of the existing multivariate inference techniques that are applicable to hundreds of variables and show the potential challenges for small-sample, large-scale data. We propose a directed partial correlation (DPC) method as an efficient and effective solution to regulatory network inference using these data. Specifically for genomic data, the proposed method is designed to deal with large-scale datasets. It combines the efficiency of partial correlation for setting up network topology by testing conditional independence, and the concept of Granger causality to assess topology change with induced interruptions. The idea is that when a transcription factor is induced artificially within a gene network, the disruption of the network by the induction signifies a gene's role in transcriptional regulation. The benchmarking results using GeneNetWeaver, the simulator for the DREAM challenges, provide strong evidence of the outstanding performance of the proposed DPC method. When applied to real biological data, the inferred starch metabolism network in Arabidopsis reveals many biologically meaningful network modules worthy of further investigation. These results collectively suggest DPC is a versatile tool for genomics research. The R package DPC is available for download (http://code.google.com/p/dpcnet/).
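To make the idea of testing conditional independence with partial correlation concrete, here is a minimal sketch of the standard first-order partial correlation between two variables given a third. It is not the DPC method itself and does not address the small-sample setting the abstract discusses. The function names and the toy data are hypothetical and chosen so the result can be checked by hand.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def partial_corr(x, y, z):
    """First-order partial correlation of x and y after controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Toy data: a hypothetical regulator z drives both x and y.
# e1 and e2 are mutually orthogonal perturbations that are also orthogonal to z.
z = [-1, -1, 1, 1, -1, -1, 1, 1]
e1 = [1, -1, 1, -1, 1, -1, 1, -1]
e2 = [1, 1, 1, 1, -1, -1, -1, -1]
x = [zi + 0.5 * a for zi, a in zip(z, e1)]
y = [zi + 0.5 * b for zi, b in zip(z, e2)]

marginal = pearson(x, y)            # x and y look strongly related on their own
conditional = partial_corr(x, y, z)  # the relation vanishes once z is accounted for
```

In this toy case the marginal correlation between `x` and `y` is high while the partial correlation given `z` is zero up to rounding, which is the pattern a conditional-independence test uses to drop an indirect edge from a network.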
PLOS Computational Biology | 2011
Yinyin Yuan; Richard S. Savage; Florian Markowetz
Different data types can offer complementary perspectives on the same biological phenomenon. In cancer studies, for example, data on copy number alterations indicate losses and amplifications of genomic regions in tumours, while transcriptomic data point to the impact of genomic and environmental events on the internal wiring of the cell. Fusing different data provides a more comprehensive model of the cancer cell than that offered by any single type. However, biological signals in different patients exhibit diverse degrees of concordance due to cancer heterogeneity and inherent noise in the measurements. This is a particularly important issue in cancer subtype discovery, where personalised strategies to guide therapy are of vital importance. We present a nonparametric Bayesian model for discovering prognostic cancer subtypes by integrating gene expression and copy number variation data. Our model is constructed from a hierarchy of Dirichlet Processes and addresses three key challenges in data fusion: (i) to separate concordant from discordant signals, (ii) to select informative features, (iii) to estimate the number of disease subtypes. Concordance of signals is assessed individually for each patient, giving us an additional level of insight into the underlying disease structure. We exemplify the power of our model in prostate cancer and breast cancer and show that it outperforms competing methods. In the prostate cancer data, we identify an entirely new subtype with extremely poor survival outcome and show how other analyses fail to detect it. In the breast cancer data, we find subtypes with superior prognostic value by using the concordant results. These discoveries were crucially dependent on our model's ability to distinguish concordant and discordant signals within each patient sample, and would otherwise have been missed. We therefore demonstrate the importance of taking a patient-specific approach, using highly-flexible nonparametric Bayesian methods.
Nature Reviews Cancer | 2017
Carlo C. Maley; Athena Aktipis; Trevor A. Graham; Andrea Sottoriva; Amy M. Boddy; Michalina Janiszewska; Ariosto S. Silva; Marco Gerlinger; Yinyin Yuan; Kenneth J. Pienta; Karen S. Anderson; Robert A. Gatenby; Charles Swanton; David Posada; Chung I. Wu; Joshua D. Schiffman; E. Shelley Hwang; Kornelia Polyak; Alexander R. A. Anderson; Joel S. Brown; Mel Greaves; Darryl Shibata
Neoplasms change over time through a process of cell-level evolution, driven by genetic and epigenetic alterations. However, the ecology of the microenvironment of a neoplastic cell determines which changes provide adaptive benefits. There is widespread recognition of the importance of these evolutionary and ecological processes in cancer, but to date, no system has been proposed for drawing clinically relevant distinctions between how different tumours are evolving. On the basis of a consensus conference of experts in the fields of cancer evolution and cancer ecology, we propose a framework for classifying tumours that is based on four relevant components. These are the diversity of neoplastic cells (intratumoural heterogeneity) and changes over time in that diversity, which make up an evolutionary index (Evo-index), as well as the hazards to neoplastic cell survival and the resources available to neoplastic cells, which make up an ecological index (Eco-index). We review evidence demonstrating the importance of each of these factors and describe multiple methods that can be used to measure them. Development of this classification system holds promise for enabling clinicians to personalize optimal interventions based on the evolvability of the patient's tumour. The Evo- and Eco-indices provide a common lexicon for communicating about how neoplasms change in response to interventions, with potential implications for clinical trials, personalized medicine and basic cancer research.
PLOS Medicine | 2016
Rachael Natrajan; Heba Sailem; Faraz K. Mardakheh; Mar Arias Garcia; Christopher J. Tape; Mitch Dowsett; Chris Bakal; Yinyin Yuan
Background: The intra-tumor diversity of cancer cells is under intense investigation; however, little is known about the heterogeneity of the tumor microenvironment that is key to cancer progression and evolution. We aimed to assess the degree of microenvironmental heterogeneity in breast cancer and correlate this with genomic and clinical parameters.
Methods and Findings: We developed a quantitative measure of microenvironmental heterogeneity along three spatial dimensions (3-D) in solid tumors, termed the tumor ecosystem diversity index (EDI), using fully automated histology image analysis coupled with statistical measures commonly used in ecology. This measure was compared with disease-specific survival, key mutations, genome-wide copy number, and expression profiling data in a retrospective study of 510 breast cancer patients as a test set and 516 breast cancer patients as an independent validation set. In high-grade (grade 3) breast cancers, we uncovered a striking link between high microenvironmental heterogeneity measured by EDI and a poor prognosis that cannot be explained by tumor size, genomics, or any other data types. However, this association was not observed in low-grade (grade 1 and 2) breast cancers. The prognostic value of EDI was superior to known prognostic factors and was enhanced with the addition of TP53 mutation status (multivariate analysis test set, p = 9 × 10⁻⁴, hazard ratio = 1.47, 95% CI 1.17–1.84; validation set, p = 0.0011, hazard ratio = 1.78, 95% CI 1.26–2.52). Integration with genome-wide profiling data identified losses of specific genes on 4p14 and 5q13 that were enriched in grade 3 tumors with high microenvironmental diversity that also substratified patients into poor prognostic groups. Limitations of this study include the number of cell types included in the model, that EDI has prognostic value only in grade 3 tumors, and that our spatial heterogeneity measure was dependent on spatial scale and tumor size.
Conclusions: To our knowledge, this is the first study to couple unbiased measures of microenvironmental heterogeneity with genomic alterations to predict breast cancer clinical outcome. We propose a clinically relevant role of microenvironmental heterogeneity for advanced breast tumors, and highlight that ecological statistics can be translated into medical advances for identifying a new type of biomarker and, furthermore, for understanding the synergistic interplay of microenvironmental heterogeneity with genomic alterations in cancer cells.
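The abstract above mentions statistical measures commonly used in ecology without naming them. As one example of such a measure, here is a minimal sketch of the Shannon diversity index applied to cell-type counts within a single tumor region. This is an assumption made for illustration only: it is not the EDI itself, and the cell-type labels and function name are hypothetical.

```python
import math

def shannon_diversity(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over cell types in one region.

    `counts` maps a cell-type label to the number of cells of that type.
    Types with zero cells contribute nothing to the sum.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for count in counts.values():
        if count > 0:
            p = count / total
            h -= p * math.log(p)
    return h

# A region with an even mix of three cell types versus a region of one type only.
mixed = shannon_diversity({"cancer": 10, "lymphocyte": 10, "stroma": 10})
pure = shannon_diversity({"cancer": 30, "lymphocyte": 0, "stroma": 0})
```

An evenly mixed region reaches the maximum value for three types, ln 3, while a region made up of a single cell type scores zero. How per-region values like these would be combined into a tumor-level index is not described in the abstract.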