Daniele Ramazzotti
Stanford University
Publication
Featured research published by Daniele Ramazzotti.
Nature Methods | 2017
Bo Wang; Junjie Zhu; Emma Pierson; Daniele Ramazzotti; Serafim Batzoglou
We present single-cell interpretation via multikernel learning (SIMLR), an analytic framework and software which learns a similarity measure from single-cell RNA-seq data in order to perform dimension reduction, clustering and visualization. On seven published data sets, we benchmark SIMLR against state-of-the-art methods. We show that SIMLR is scalable and greatly enhances clustering performance while improving the visualization and interpretability of single-cell sequencing data.
PLOS ONE | 2014
Loes M. Olde Loohuis; Giulio Caravagna; Alex Graudenzi; Daniele Ramazzotti; Giancarlo Mauri; Marco Antoniotti; Bud Mishra
Existing techniques to reconstruct tree models of progression for accumulative processes, such as cancer, seek to estimate causation by combining correlation and a frequentist notion of temporal priority. In this paper, we define a novel theoretical framework called CAPRESE (CAncer PRogression Extraction with Single Edges) to reconstruct such models based on the notion of probabilistic causation defined by Suppes. We consider a general reconstruction setting complicated by the presence of noise in the data due to biological variation, as well as experimental or measurement errors. To improve tolerance to noise we define and use a shrinkage-like estimator. We prove the correctness of our algorithm by showing asymptotic convergence to the correct tree under mild constraints on the level of noise. Moreover, on synthetic data, we show that our approach outperforms the state-of-the-art, that it is efficient even with a relatively small number of samples and that its performance quickly converges to its asymptote as the number of samples increases. For real cancer datasets obtained with different technologies, we highlight biologically significant differences in the progressions inferred with respect to other competing techniques and we also show how to validate conjectured biological relations with progression models.
Proceedings of the National Academy of Sciences of the United States of America | 2016
Giulio Caravagna; Alex Graudenzi; Daniele Ramazzotti; Rebeca Sanz-Pamplona; Luca De Sano; Giancarlo Mauri; Victor Moreno; Marco Antoniotti; Bud Mishra
Significance: A causality-based machine learning Pipeline for Cancer Inference (PiCnIc) is introduced to infer the underlying somatic evolution of ensembles of tumors from next-generation sequencing data. PiCnIc combines techniques for sample stratification, driver selection, and identification of fitness-equivalent exclusive alterations to exploit an algorithm based on Suppes’ probabilistic causation. The accuracy and translational significance of the results are studied in detail, with an application to colorectal cancer. The PiCnIc pipeline has been made publicly accessible for reproducibility, interoperability, and future enhancements.
Abstract: The genomic evolution inherent to cancer relates directly to a renewed focus on the voluminous next-generation sequencing data and machine learning for the inference of explanatory models of how the (epi)genomic events are choreographed in cancer initiation and development. However, despite the increasing availability of multiple additional -omics data, this quest has been frustrated by various theoretical and technical hurdles, mostly stemming from the dramatic heterogeneity of the disease. In this paper, we build on our recent work on the “selective advantage” relation among driver mutations in cancer progression and investigate its applicability to the modeling problem at the population level. Here, we introduce PiCnIc (Pipeline for Cancer Inference), a versatile, modular, and customizable pipeline to extract ensemble-level progression models from cross-sectional sequenced cancer genomes. The pipeline has many translational implications because it combines state-of-the-art techniques for sample stratification, driver selection, identification of fitness-equivalent exclusive alterations, and progression model inference. We demonstrate PiCnIc’s ability to reproduce much of the current knowledge on colorectal cancer progression as well as to suggest novel experimentally verifiable hypotheses.
Evolutionary Bioinformatics | 2018
Daniele Ramazzotti; Alex Graudenzi; Giulio Caravagna; Marco Antoniotti
Several diseases related to cell proliferation are characterized by the accumulation of somatic DNA changes, with respect to wild-type conditions. Cancer and HIV are 2 common examples of such diseases, where the mutational load in the cancerous/viral population increases over time. In these cases, selective pressures are often observed along with competition, co-operation, and parasitism among distinct cellular clones. Recently, we presented a mathematical framework to model these phenomena, based on a combination of Bayesian inference and Suppes’ theory of probabilistic causation, depicted in graphical structures dubbed Suppes-Bayes Causal Networks (SBCNs). The SBCNs are generative probabilistic graphical models that recapitulate the potential ordering of accumulation of such DNA changes during the progression of the disease. Such models can be inferred from data by exploiting likelihood-based model selection strategies with regularization. In this article, we discuss the theoretical foundations of our approach and we investigate in depth the influence on the model selection task of (1) the poset based on Suppes’ theory and (2) different regularization strategies. Furthermore, we provide an example of application of our framework to HIV genetic data, highlighting the valuable insights provided by the inferred SBCN.
Bioinformatics | 2016
Luca De Sano; Giulio Caravagna; Daniele Ramazzotti; Alex Graudenzi; Giancarlo Mauri; Bud Mishra; Marco Antoniotti
Motivation: We introduce TRanslational ONCOlogy (TRONCO), an open-source R package that implements the state-of-the-art algorithms for the inference of cancer progression models from (epi)genomic mutational profiles. TRONCO can be used to extract population-level models describing the trends of accumulation of alterations in a cohort of cross-sectional samples, e.g. retrieved from publicly available databases, and individual-level models that reveal the clonal evolutionary history in single cancer patients, when multiple samples, e.g. multiple biopsies or single-cell sequencing data, are available. The resulting models can provide key hints for uncovering the evolutionary trajectories of cancer, especially for precision medicine or personalized therapy. Availability and implementation: TRONCO is released under the GPL license, is hosted at http://bimib.disco.unimib.it/ (Software section) and archived also at bioconductor.org. Supplementary information: Supplementary data are available at Bioinformatics online.
International Conference on Conceptual Structures | 2017
Gelin Gao; Bud Mishra; Daniele Ramazzotti
The most recent financial upheavals have cast doubt on the adequacy of some of the conventional quantitative risk management strategies, such as VaR (Value at Risk), in many common situations. Consequently, there has been an increasing need for verisimilar financial stress testing, namely simulating and analyzing financial portfolios in extreme, albeit rare, scenarios. Unlike conventional risk management, which exploits statistical correlations among financial instruments, here we focus our analysis on the notion of probabilistic causation, which is embodied by Suppes-Bayes Causal Networks (SBCNs). SBCNs are probabilistic graphical models that have many attractive features in terms of more accurate causal analysis for generating financial stress scenarios. In this paper, we present a novel approach for conducting stress testing of financial portfolios based on SBCNs in combination with classical machine learning classification tools. The resulting method is shown to be capable of correctly discovering the causal relationships among financial factors that affect the portfolios and thus simulating stress testing scenarios with a higher accuracy and lower computational complexity than conventional Monte Carlo simulations.
Scientific Reports | 2017
Rocco Piazza; Daniele Ramazzotti; Roberta Spinelli; Alessandra Pirola; Luca De Sano; Pierangelo Ferrari; Vera Magistroni; Nicoletta Cordani; Nitesh Sharma; Carlo Gambacorti-Passerini
The complicated, evolving landscape of cancer mutations poses a formidable challenge to identify cancer genes among the large lists of mutations typically generated in NGS experiments. The ability to prioritize these variants is therefore of paramount importance. To address this issue we developed OncoScore, a text-mining tool that ranks genes according to their association with cancer, based on available biomedical literature. Receiver operating characteristic curve and the area under the curve (AUC) metrics on manually curated datasets confirmed the excellent discriminating capability of OncoScore (OncoScore cut-off threshold = 21.09; AUC = 90.3%, 95% CI: 88.1–92.5%), indicating that OncoScore provides useful results in cases where an efficient prioritization of cancer-associated genes is needed.
bioRxiv | 2013
L Olde Loohuis; Giulio Caravagna; Alex Graudenzi; Daniele Ramazzotti; Giancarlo Mauri; Marco Antoniotti; Bud Mishra
Existing techniques to reconstruct tree models of progression for accumulative processes, such as cancer, seek to estimate causation by combining correlation and a frequentist notion of temporal priority. In this paper we define a novel theoretical framework to reconstruct such models based on the probabilistic notion of causation defined by Suppes, which differs fundamentally from that based on correlation. We consider a general reconstruction setting complicated by the presence of noise in the data, owing to the intrinsic variability of biological processes as well as experimental or measurement errors. To gain immunity to noise in the reconstruction performance we use a shrinkage estimator. On synthetic data, we show that our approach outperforms the state-of-the-art and, for some real cancer datasets, we highlight biologically significant differences revealed by the reconstructed progressions. Finally, we show that our method is efficient even with a relatively low number of samples and its performance quickly converges to its asymptote as the number of samples increases. Our analysis suggests the applicability of the method on small datasets of real patients.
Proteomics | 2018
Bo Wang; Daniele Ramazzotti; Luca De Sano; Junjie Zhu; Emma Pierson; Serafim Batzoglou
SIMLR (Single‐cell Interpretation via Multi‐kernel LeaRning), an open‐source tool that implements a novel framework to learn a sample‐to‐sample similarity measure from expression data observed for heterogeneous samples, is presented here. SIMLR can be effectively used to perform tasks such as dimension reduction, clustering, and visualization of heterogeneous populations of samples. SIMLR was benchmarked against state‐of‐the‐art methods for these three tasks on several public datasets, showing it to be scalable and capable of greatly improving clustering performance, as well as providing valuable insights by making the data more interpretable via a better visualization. SIMLR is available on GitHub (https://github.com/BatzoglouLabSU/SIMLR) in both R and MATLAB implementations. Furthermore, it is also available as an R package on http://bioconductor.org.
bioRxiv | 2017
Bo Wang; Daniele Ramazzotti; Luca De Sano; Junjie Zhu; Emma Pierson; Serafim Batzoglou
Motivation: We present SIMLR (Single-cell Interpretation via Multi-kernel LeaRning), an open-source tool that implements a novel framework to learn a cell-to-cell similarity measure from single-cell RNA-seq data. SIMLR can be effectively used to perform tasks such as dimension reduction, clustering, and visualization of heterogeneous populations of cells. SIMLR was benchmarked against state-of-the-art methods for these three tasks on several public datasets, showing it to be scalable and capable of greatly improving clustering performance, as well as providing valuable insights by making the data more interpretable via a better visualization. Availability and implementation: SIMLR is available on GitHub in both R and MATLAB implementations. Furthermore, it is also available as an R package on bioconductor.org. Supplementary information: Supplementary data are available at Bioinformatics online.