Fabio Vandin | Researchain

Publication

Featured research published by Fabio Vandin.

Nature | 2013

Mutational landscape and significance across 12 major cancer types

Cyriac Kandoth; Michael D. McLellan; Fabio Vandin; Kai Ye; Beifang Niu; Charles Lu; Mingchao Xie; Qunyuan Zhang; Joshua F. McMichael; Matthew A. Wyczalkowski; Mark D. M. Leiserson; Christopher A. Miller; John S. Welch; Matthew J. Walter; Michael C. Wendl; Timothy J. Ley; Richard Wilson; Benjamin J. Raphael; Li Ding

The Cancer Genome Atlas (TCGA) has used the latest sequencing and analysis methods to identify somatic variants across thousands of tumours. Here we present data and analytical results for point mutations and small insertions/deletions from 3,281 tumours across 12 tumour types as part of the TCGA Pan-Cancer effort. We illustrate the distributions of mutation frequencies, types and contexts across tumour types, and establish their links to tissues of origin, environmental/carcinogen influences, and DNA repair defects. Using the integrated data sets, we identified 127 significantly mutated genes from well-known (for example, mitogen-activated protein kinase, phosphatidylinositol-3-OH kinase, Wnt/β-catenin and receptor tyrosine kinase signalling pathways, and cell cycle control) and emerging (for example, histone, histone modification, splicing, metabolism and proteolysis) cellular processes in cancer. The average number of mutations in these significantly mutated genes varies across tumour types; most tumours have two to six, indicating that the number of driver mutations required during oncogenesis is relatively small. Mutations in transcriptional factors/regulators show tissue specificity, whereas histone modifiers are often mutated across several cancer types. Clinical association analysis identifies genes having a significant effect on survival, and investigations of mutations with respect to clonal/subclonal architecture delineate their temporal orders during tumorigenesis. Taken together, these results lay the groundwork for developing new diagnostics and individualizing cancer treatment.

Nature | 2012

The mutational landscape of lethal castration-resistant prostate cancer

Catherine S. Grasso; Yi Mi Wu; Dan R. Robinson; Xuhong Cao; Saravana M. Dhanasekaran; Amjad P. Khan; Michael J. Quist; Xiaojun Jing; Robert J. Lonigro; J. Chad Brenner; Irfan A. Asangani; Bushra Ateeq; Sang Y. Chun; Javed Siddiqui; Lee Sam; Matt Anstett; Rohit Mehra; John R. Prensner; Nallasivam Palanisamy; Gregory A Ryslik; Fabio Vandin; Benjamin J. Raphael; Lakshmi P. Kunju; Daniel R. Rhodes; Kenneth J. Pienta; Arul M. Chinnaiyan; Scott A. Tomlins

Characterization of the prostate cancer transcriptome and genome has identified chromosomal rearrangements and copy number gains and losses, including ETS gene family fusions, PTEN loss and androgen receptor (AR) amplification, which drive prostate cancer development and progression to lethal, metastatic castration-resistant prostate cancer (CRPC). However, less is known about the role of mutations. Here we sequenced the exomes of 50 lethal, heavily pre-treated metastatic CRPCs obtained at rapid autopsy (including three different foci from the same patient) and 11 treatment-naive, high-grade localized prostate cancers. We identified low overall mutation rates even in heavily treated CRPCs (2.00 per megabase) and confirmed the monoclonal origin of lethal CRPC. Integrating exome copy number analysis identified disruptions of CHD1 that define a subtype of ETS gene family fusion-negative prostate cancer. Similarly, we demonstrate that ETS2, which is deleted in approximately one-third of CRPCs (commonly through TMPRSS2:ERG fusions), is also deregulated through mutation. Furthermore, we identified recurrent mutations in multiple chromatin- and histone-modifying genes, including MLL2 (mutated in 8.6% of prostate cancers), and demonstrate interaction of the MLL complex with the AR, which is required for AR-mediated signalling. We also identified novel recurrent mutations in the AR collaborating factor FOXA1, which is mutated in 5 of 147 (3.4%) prostate cancers (both untreated localized prostate cancer and CRPC), and showed that mutated FOXA1 represses androgen signalling and increases tumour growth. Proteins that physically interact with the AR, such as the ERG gene fusion product, FOXA1, MLL2, UTX (also known as KDM6A) and ASXL1 were found to be mutated in CRPC. In summary, we describe the mutational landscape of a heavily treated metastatic cancer, identify novel mechanisms of AR signalling deregulated in prostate cancer, and prioritize candidates for future study.

Nature Genetics | 2015

Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes

Mark D. M. Leiserson; Fabio Vandin; Hsin-Ta Wu; Jason R. Dobson; Jonathan V Eldridge; Jacob L Thomas; Alexandra Papoutsaki; Younhun Kim; Beifang Niu; Michael D. McLellan; Michael S. Lawrence; Abel Gonzalez-Perez; David Tamborero; Yuwei Cheng; Gregory A Ryslik; Nuria Lopez-Bigas; Gad Getz; Li Ding; Benjamin J. Raphael

Cancers exhibit extensive mutational heterogeneity, and the resulting long-tail phenomenon complicates the discovery of genes and pathways that are significantly mutated in cancer. We perform a pan-cancer analysis of mutated networks in 3,281 samples from 12 cancer types from The Cancer Genome Atlas (TCGA) using HotNet2, a new algorithm to find mutated subnetworks that overcomes the limitations of existing single-gene, pathway and network approaches. We identify 16 significantly mutated subnetworks that comprise well-known cancer signaling pathways as well as subnetworks with less characterized roles in cancer, including cohesin, condensin and others. Many of these subnetworks exhibit co-occurring mutations across samples. These subnetworks contain dozens of genes with rare somatic mutations across multiple cancers; many of these genes have additional evidence supporting a role in cancer. By illuminating these rare combinations of mutations, pan-cancer network analyses provide a roadmap to investigate new diagnostic and therapeutic opportunities across cancer types.
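HotNet2's central idea is to spread each gene's mutation score over an interaction network, so that genes in the same neighbourhood share "heat". The sketch below shows one such diffusion, a random walk with restart on an undirected toy graph. Several things are assumptions for illustration: the function name `diffuse`, the restart probability `beta`, and the graph itself. The sketch also leaves out HotNet2's later steps, which turn diffused heat into subnetworks and assess their significance.

```python
from typing import Dict, List


def diffuse(adj: Dict[str, List[str]], heat: Dict[str, float],
            beta: float = 0.4, iters: int = 200) -> Dict[str, float]:
    """Random walk with restart: iterate p <- beta*h + (1 - beta)*W p.

    W is the column-normalised adjacency matrix, so each node passes its
    mass equally to its neighbours. Every node must have at least one
    neighbour. The iteration converges to a vector that sums to 1.
    """
    nodes = list(adj)
    total = sum(heat.get(n, 0.0) for n in nodes)
    h = {n: heat.get(n, 0.0) / total for n in nodes}  # normalised restart vector
    p = dict(h)
    for _ in range(iters):
        nxt = {n: beta * h[n] for n in nodes}  # restart term
        for u in nodes:
            share = (1.0 - beta) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share  # walk term: u sends an equal share to each neighbour
        p = nxt
    return p


# Toy path graph A - B - C - D with all mutation "heat" placed on gene A.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = diffuse(graph, {"A": 1.0})
print({g: round(s, 3) for g, s in scores.items()})
```

In this example the heat decays with network distance from A. The restart term keeps the mass from flowing too far from the mutated genes, which is why nearby genes end up with most of the score.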

Genome Medicine | 2014

Identifying driver mutations in sequenced cancer genomes: computational approaches to enable precision medicine

Benjamin J. Raphael; Jason R. Dobson; Layla Oesper; Fabio Vandin

High-throughput DNA sequencing is revolutionizing the study of cancer and enabling the measurement of the somatic mutations that drive cancer development. However, the resulting sequencing datasets are large and complex, obscuring the clinically important mutations in a background of errors, noise, and random mutations. Here, we review computational approaches to identify somatic mutations in cancer genome sequences and to distinguish the driver mutations that are responsible for cancer from random, passenger mutations. First, we describe approaches to detect somatic mutations from high-throughput DNA sequencing data, particularly for tumor samples that comprise heterogeneous populations of cells. Next, we review computational approaches that aim to predict driver mutations according to their frequency of occurrence in a cohort of samples, or according to their predicted functional impact on protein sequence or structure. Finally, we review techniques to identify recurrent combinations of somatic mutations, including approaches that examine mutations in known pathways or protein-interaction networks, as well as de novo approaches that identify combinations of mutations according to statistical patterns of mutual exclusivity. These techniques, coupled with advances in high-throughput DNA sequencing, are enabling precision medicine approaches to the diagnosis and treatment of cancer.
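Frequency-based driver prediction can be illustrated with a deliberately simple background model. The sketch below asks whether a gene is mutated in more tumours than expected by chance. It assumes a single uniform per-base background mutation rate and independent samples. Practical tools model how mutation rates vary across patients, sequence contexts and genes, and the hypothetical `recurrence_pvalue` function ignores all of that.

```python
import math


def recurrence_pvalue(n_samples: int, n_mutated: int,
                      gene_length_bp: int, bg_rate_per_bp: float) -> float:
    """One-sided binomial test for recurrent mutation of a single gene.

    Under the null, each sample independently carries >= 1 mutation in the
    gene with probability q = 1 - (1 - rate)^length. Returns
    P(X >= n_mutated) for X ~ Binomial(n_samples, q).
    """
    q = 1.0 - (1.0 - bg_rate_per_bp) ** gene_length_bp
    return sum(math.comb(n_samples, k) * q ** k * (1.0 - q) ** (n_samples - k)
               for k in range(n_mutated, n_samples + 1))


# Hypothetical gene: 1,500 coding bases in 100 tumours, with a background
# rate of 2 mutations per megabase.
print(recurrence_pvalue(100, 1, 1500, 2e-6))
print(recurrence_pvalue(100, 10, 1500, 2e-6))
```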

Genome Biology | 2015

CoMEt: a statistical approach to identify combinations of mutually exclusive alterations in cancer

Mark D. M. Leiserson; Hsin-Ta Wu; Fabio Vandin; Benjamin J. Raphael

Cancer is a heterogeneous disease with different combinations of genetic alterations driving its development in different individuals. We introduce CoMEt, an algorithm to identify combinations of alterations that exhibit a pattern of mutual exclusivity across individuals, often observed for alterations in the same pathway. CoMEt includes an exact statistical test for mutual exclusivity and techniques to perform simultaneous analysis of multiple sets of mutually exclusive and subtype-specific alterations. We demonstrate that CoMEt outperforms existing approaches on simulated and real data. We apply CoMEt to five different cancer types, identifying both known cancer genes and pathways, and novel putative cancer genes.
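CoMEt's exact test applies to sets of k alterations. For the special case of two genes, a common exact test for mutual exclusivity is the one-sided Fisher's exact test. It fixes how many samples have each gene mutated and asks how likely it is to see as few co-occurrences as observed. The sketch below covers only this pairwise case, using hypothetical function names. It does not reproduce CoMEt's k-gene test, its simultaneous analysis of multiple sets, or its handling of subtypes.

```python
import math
from typing import Sequence, Tuple


def contingency(mut_a: Sequence[int], mut_b: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (n, a, b, c): samples, mutated in A, mutated in B, mutated in both."""
    n = len(mut_a)
    a, b = sum(mut_a), sum(mut_b)
    c = sum(1 for x, y in zip(mut_a, mut_b) if x and y)
    return n, a, b, c


def exclusivity_pvalue(n: int, a: int, b: int, c: int) -> float:
    """One-sided Fisher's exact test: P(co-occurrences <= c) with margins fixed."""
    lo = max(0, a + b - n)  # fewest co-occurrences possible given the margins
    hits = sum(math.comb(a, x) * math.comb(n - a, b - x) for x in range(lo, c + 1))
    return hits / math.comb(n, b)


# Two hypothetical genes mutated in disjoint halves of 10 samples.
gene_a = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
gene_b = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(exclusivity_pvalue(*contingency(gene_a, gene_b)))  # 1/252, about 0.004
```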

Pacific Symposium on Biocomputing | 2011

Discovery of mutated subnetworks associated with clinical data in cancer

Fabio Vandin; Patrick Clay; Eli Upfal; Benjamin J. Raphael

A major goal of cancer sequencing projects is to identify genetic alterations that determine clinical phenotypes, such as survival time or drug response. Somatic mutations in cancer are typically very diverse, and are found in different sets of genes in different patients. This mutational heterogeneity complicates the discovery of associations between individual mutations and a clinical phenotype. It is explained in part by the fact that driver mutations, the somatic mutations that drive cancer development, target genes in cellular pathways, and only a subset of pathway genes is mutated in a given patient. Thus, pathway-based analysis of associations between mutations and phenotype is warranted. Here, we introduce an algorithm to find groups of genes, or pathways, whose mutational status is associated with a clinical phenotype without prior definition of the pathways. Rather, we find subnetworks of genes in a gene interaction network with the property that the mutational status of the genes in the subnetwork is significantly associated with a clinical phenotype. This new algorithm is built upon HotNet, an algorithm that finds groups of mutated genes using a heat diffusion model and a two-stage statistical test. We focus here on the discovery of statistically significant correlations between mutated subnetworks and patient survival data. A similar approach can be used for correlations with other types of clinical data, through use of an appropriate statistical test. We apply our method to simulated data as well as to mutation and survival data from ovarian cancer samples from The Cancer Genome Atlas. In the TCGA data, we discover nine subnetworks containing genes whose mutational status is correlated with survival. Genes in four of these subnetworks overlap known pathways, including the focal adhesion and cell adhesion pathways, while other subnetworks are novel.
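The survival association step can be made concrete with a standard two-group test. The sketch below splits patients by whether any gene in a candidate subnetwork is mutated, then compares the two groups with the log-rank test. The log-rank test is one "appropriate statistical test" for survival data, but it is not necessarily the statistic used in the paper. The subnetwork search built on HotNet is not shown, and all names here are hypothetical.

```python
import math
from typing import Sequence, Tuple


def logrank(times: Sequence[float], events: Sequence[int],
            group: Sequence[int]) -> Tuple[float, float]:
    """Two-group log-rank test.

    times: follow-up time per patient; events: 1 = death observed, 0 = censored;
    group: 1 = patient has a mutation in the subnetwork, 0 = otherwise.
    Returns (chi-square statistic with 1 d.f., p-value).
    """
    data = list(zip(times, events, group))
    o_minus_e, var = 0.0, 0.0
    for t in sorted({tt for tt, e, _ in data if e}):
        n = sum(1 for tt, _, _ in data if tt >= t)            # patients at risk
        n1 = sum(g for tt, _, g in data if tt >= t)           # at risk and mutated
        d = sum(1 for tt, e, _ in data if tt == t and e)      # deaths at time t
        d1 = sum(1 for tt, e, g in data if tt == t and e and g)
        o_minus_e += d1 - d * n1 / n                          # observed - expected
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0.0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 d.f.


# Hypothetical cohort in which mutated patients (group 1) die earlier.
times = [1, 2, 3, 4, 10, 11, 12, 13]
events = [1] * 8
group = [1, 1, 1, 1, 0, 0, 0, 0]
chi2, p = logrank(times, events, group)
print(round(chi2, 2), p)
```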

Data Mining and Knowledge Discovery | 2010

Mining top-K frequent itemsets through progressive sampling

Andrea Pietracaprina; Matteo Riondato; Eli Upfal; Fabio Vandin

We study the use of sampling for efficiently mining the top-K frequent itemsets of cardinality at most w. To this end, we define an approximation to the top-K frequent itemsets to be a family of itemsets which includes (resp., excludes) all very frequent (resp., very infrequent) itemsets, together with an estimate of these itemsets’ frequencies with a bounded error. Our first result is an upper bound on the sample size which guarantees that the top-K frequent itemsets mined from a random sample of that size approximate the actual top-K frequent itemsets, with probability larger than a specified value. We show that the upper bound is asymptotically tight when w is constant. Our main algorithmic contribution is a progressive sampling approach, combined with suitable stopping conditions, which on appropriate inputs is able to extract approximate top-K frequent itemsets from samples whose sizes are smaller than the general upper bound. In order to test the stopping conditions, this approach maintains the frequency of all itemsets encountered, which is practical only for small w. However, we show how this problem can be mitigated by using a variation of Bloom filters. A number of experiments conducted on both synthetic and real benchmark datasets show that samples substantially smaller than the original dataset (i.e., of size defined by the upper bound or reached through the progressive sampling approach) make it possible to approximate the actual top-K frequent itemsets with accuracy much higher than what is analytically proved.
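The basic sampling idea can be sketched as follows. This minimal version mines the top-K itemsets of size at most w from a fixed-size uniform random sample, and it keeps ties at the K-th frequency. The function names are hypothetical. The sketch does not reproduce the paper's sample-size bound, progressive schedule, stopping conditions, or Bloom-filter variant, so `sample_size` here is simply an assumed input.

```python
import random
from collections import Counter
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def top_k_itemsets(transactions: Sequence[Sequence[str]], k: int,
                   w: int) -> Dict[Tuple[str, ...], float]:
    """Exact top-K itemsets of size <= w: every itemset whose frequency is at
    least that of the K-th most frequent one (ties are kept)."""
    counts: Counter = Counter()
    for t in transactions:
        items = sorted(set(t))
        for size in range(1, w + 1):
            counts.update(combinations(items, size))  # count each sub-itemset once
    if not counts:
        return {}
    ranked = sorted(counts.values(), reverse=True)
    threshold = ranked[min(k, len(ranked)) - 1]
    n = len(transactions)
    return {s: c / n for s, c in counts.items() if c >= threshold}


def top_k_from_sample(transactions: List[List[str]], k: int, w: int,
                      sample_size: int, seed: int = 0) -> Dict[Tuple[str, ...], float]:
    """Mine a uniform sample drawn with replacement; frequencies are estimates."""
    rng = random.Random(seed)
    sample = [rng.choice(transactions) for _ in range(sample_size)]
    return top_k_itemsets(sample, k, w)


# Hypothetical data: 1,000 copies of {a, b} and 10 copies of {c}.
big = [["a", "b"]] * 1000 + [["c"]] * 10
print(top_k_from_sample(big, k=1, w=2, sample_size=500))
```

Because ties at the K-th frequency are kept, the output can contain more than K itemsets. In the usage line, a and b always occur together, so a, b and {a, b} share the same count and are all returned for K = 1.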

Briefings in Bioinformatics | 2016

Computational pan-genomics: status, promises and challenges

Tobias Marschall; Manja Marz; Thomas Abeel; Louis J. Dijkstra; Bas E. Dutilh; Ali Ghaffaari; Paul J. Kersey; Wigard P. Kloosterman; Veli Mäkinen; Adam M. Novak; Benedict Paten; David Porubsky; Eric Rivals; Can Alkan; Jasmijn A. Baaijens; Paul I. W. de Bakker; Valentina Boeva; Raoul J. P. Bonnal; Francesca Chiaromonte; Rayan Chikhi; Francesca D. Ciccarelli; Robin Cijvat; Erwin Datema; Cornelia M. van Duijn; Evan E. Eichler; Corinna Ernst; Eleazar Eskin; Erik Garrison; Mohammed El-Kebir; Gunnar W. Klau

Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In the case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full potential of such rich genomic data sets. Instead, novel, qualitatively different computational methods and paradigms are needed. We will witness the rapid extension of computational pan-genomics, a new sub-area of research in computational biology. In this article, we generalize existing definitions and understand a pan-genome as any collection of genomic sequences to be analyzed jointly or to be used as a reference. We examine already available approaches to construct and use pan-genomes, discuss the potential benefits of future technologies and methodologies and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example of a computational paradigm shift, we particularly highlight the transition from the representation of reference genomes as strings to representations as graphs. We outline how this and other challenges from different application domains translate into common computational problems, point out relevant bioinformatics techniques and identify open problems in computer science. With this review, we aim to increase awareness that a joint approach to computational pan-genomics can help address many of the problems currently faced in various domains.
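The shift from string to graph references can be illustrated with a toy sequence graph. In this graph, nodes carry DNA segments and each haplotype is a path through the nodes. The sketch uses hypothetical class and method names and does not follow any particular file format or tool. It also omits node orientation (reverse complements), which real pan-genome graph tools typically support.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class VariationGraph:
    """Toy sequence graph: nodes hold DNA segments, directed edges join them."""
    segments: Dict[int, str] = field(default_factory=dict)
    edges: Set[Tuple[int, int]] = field(default_factory=set)

    def spell(self, path: List[int]) -> str:
        """Concatenate the segments along a path, checking that every edge exists."""
        for u, v in zip(path, path[1:]):
            if (u, v) not in self.edges:
                raise ValueError(f"no edge {u} -> {v}")
        return "".join(self.segments[n] for n in path)

    def haplotypes(self, source: int, sink: int) -> List[str]:
        """Spell every source-to-sink path (assumes the graph is acyclic)."""
        out: List[str] = []

        def walk(path: List[int]) -> None:
            if path[-1] == sink:
                out.append(self.spell(path))
                return
            for u, v in sorted(self.edges):
                if u == path[-1]:
                    walk(path + [v])

        walk([source])
        return out


# Reference ACGTTA plus a hypothetical SNP (T -> C at position 4).
g = VariationGraph(
    segments={1: "ACG", 2: "T", 3: "C", 4: "TA"},
    edges={(1, 2), (1, 3), (2, 4), (3, 4)},
)
print(g.spell([1, 2, 4]))   # the reference path
print(g.haplotypes(1, 4))   # reference and variant haplotypes
```

A single linear string can hold only one allele at each position. The graph stores both alleles and their shared context once, and each individual genome is recovered by choosing a path.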

Symposium on Principles of Database Systems | 2009

An efficient rigorous approach for identifying statistically significant frequent itemsets

Adam Kirsch; Michael Mitzenmacher; Andrea Pietracaprina; Geppino Pucci; Eli Upfal; Fabio Vandin

As advances in technology allow for the collection, storage, and analysis of vast amounts of data, the task of screening and assessing the significance of discovered patterns is becoming a major challenge in data mining applications. In this work, we address significance in the context of frequent itemset mining. Specifically, we develop a novel methodology to identify a meaningful support threshold s* for a dataset, such that the number of itemsets with support at least s* represents a substantial deviation from what would be expected in a random dataset with the same number of transactions and the same individual item frequencies. These itemsets can then be flagged as statistically significant with a small false discovery rate. Our methodology hinges on a Poisson approximation to the distribution of the number of itemsets in a random dataset with support at least s, for any s greater than or equal to a minimum threshold s_min. We obtain this result through a novel application of the Chen-Stein approximation method, which is of independent interest. Based on this approximation, we develop an efficient parametric multi-hypothesis test for identifying the desired threshold s*. A crucial feature of our approach is that, unlike most previous work, it takes into account the entire dataset rather than individual discoveries. It is therefore better able to distinguish between significant observations and random fluctuations. We present extensive experimental results to substantiate the effectiveness of our methodology.
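The null model and the Poisson approximation can be illustrated numerically. The sketch assumes a random dataset of t transactions in which item i appears independently with its observed frequency f_i. Here λ(s) is the expected number of k-itemsets with support at least s, and the Poisson tail measures how surprising an observed count would be. The function names and data are hypothetical. The paper's choice of s_min, its Chen-Stein error bounds, and its multi-hypothesis procedure for selecting s* are not reproduced.

```python
import math
from itertools import combinations
from typing import Sequence


def binom_sf(n: int, p: float, s: int) -> float:
    """P(Binomial(n, p) >= s), summed in log space to avoid overflow; needs 0 < p < 1."""
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for j in range(s, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                   + j * log_p + (n - j) * log_q)
        total += math.exp(log_pmf)
    return total


def expected_frequent(item_freqs: Sequence[float], t: int, k: int, s: int) -> float:
    """lambda(s): expected number of k-itemsets with support >= s when each of
    the t transactions contains item i independently with probability f_i."""
    return sum(binom_sf(t, math.prod(combo), s)
               for combo in combinations(item_freqs, k))


def poisson_sf(lam: float, q: int) -> float:
    """P(Poisson(lam) >= q) for lam > 0, summing the upper tail directly.

    The tail is truncated about 10 standard deviations (+100 terms) past
    max(q, lam), where the remaining mass is negligible.
    """
    top = max(q, int(lam)) + int(10 * math.sqrt(lam)) + 100
    return sum(math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))
               for j in range(q, top))


# Hypothetical dataset: 20 items, each with frequency 0.1, in 1,000 transactions.
lam = expected_frequent([0.1] * 20, t=1000, k=2, s=25)
observed = 12  # hypothetical number of pairs with support >= 25 in the real data
print(lam, poisson_sf(lam, observed))
```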

Discovery Science | 2007

Efficient incremental mining of top-K frequent closed itemsets

Andrea Pietracaprina; Fabio Vandin

In this work we study the mining of top-K frequent closed itemsets, a recently proposed variant of the classical problem of mining frequent closed itemsets in which the support threshold is chosen as the maximum value sufficient to guarantee that at least K itemsets are returned as output. We discuss the effectiveness of parameter K in controlling the output size and develop an efficient algorithm for mining top-K frequent closed itemsets in order of decreasing support, which consistently performs better than the best previously known algorithm, with substantial improvements in some cases. A distinctive feature of our algorithm is that it allows the user to dynamically raise the value of K without restarting the computation from scratch.
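The definition of the top-K frequent closed itemsets is easiest to see on a toy dataset. In particular, the toy example shows how the threshold can return more than K itemsets when supports tie. The sketch below is a brute-force implementation with hypothetical function names. It is exponential in the number of items, so it serves only as an illustration, and it is not the incremental algorithm from the paper.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple


def support(itemset: FrozenSet[str], transactions: List[FrozenSet[str]]) -> int:
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def top_k_closed(raw: Iterable[Set[str]], k: int) -> Tuple[int, Dict[FrozenSet[str], int]]:
    """Brute-force top-K frequent closed itemsets.

    An itemset is closed if adding any further item lowers its support.
    The threshold is the largest support s such that at least K closed
    itemsets have support >= s, i.e. the K-th largest closed support.
    """
    transactions = [frozenset(t) for t in raw]
    items = sorted(set().union(*transactions))
    closed: Dict[FrozenSet[str], int] = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            x = frozenset(combo)
            sx = support(x, transactions)
            if sx == 0:
                continue
            # closed: no single-item extension keeps the same support
            if all(support(x | {i}, transactions) < sx for i in items if i not in x):
                closed[x] = sx
    if not closed:
        return 0, {}
    ranked = sorted(closed.values(), reverse=True)
    threshold = ranked[min(k, len(ranked)) - 1]
    return threshold, {x: s for x, s in closed.items() if s >= threshold}


data = [{"a", "b", "c"}, {"a", "b"}, {"a"}, {"b", "c"}]
threshold, result = top_k_closed(data, 3)
print(threshold, sorted(("".join(sorted(x)), s) for x, s in result.items()))
# 2 [('a', 3), ('ab', 2), ('b', 3), ('bc', 2)]
```

With K = 3 the threshold is 2, and four closed itemsets are returned because {a, b} and {b, c} tie at support 2. The itemset {c} is not closed, since {b, c} has the same support.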
