Anna Goldenberg
University of Toronto
Publication
Featured research published by Anna Goldenberg.
Nature Methods | 2014
Bo Wang; Aziz M. Mezlini; Feyyaz Demir; Marc Fiume; Zhuowen Tu; Michael Brudno; Benjamin Haibe-Kains; Anna Goldenberg
Recent technologies have made it cost-effective to collect diverse types of genome-wide data. Computational methods are needed to combine these data to create a comprehensive view of a given disease or a biological process. Similarity network fusion (SNF) solves this problem by constructing networks of samples (e.g., patients) for each available data type and then efficiently fusing these into one network that represents the full spectrum of underlying data. For example, to create a comprehensive view of a disease given a cohort of patients, SNF computes and fuses patient similarity networks obtained from each of their data types separately, taking advantage of the complementarity in the data. We used SNF to combine mRNA expression, DNA methylation and microRNA (miRNA) expression data for five cancer data sets. SNF substantially outperforms single data type analysis and established integrative approaches when identifying cancer subtypes and is effective for predicting survival.
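The fusion idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian kernel, the fixed `sigma`, the plain row normalization, and all function names (`affinity`, `knn_kernel`, `fuse`) are simplifying assumptions made for illustration. Each data type yields a sample-by-sample similarity network, and each network is then repeatedly updated by diffusing the average of the other networks through its own nearest-neighbour structure.

```python
import numpy as np

def affinity(X, sigma=1.0):
    """Gaussian similarity between samples (rows of X). Kernel choice is an assumption."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def row_normalize(W):
    """Scale each row so that it sums to 1."""
    return W / W.sum(axis=1, keepdims=True)

def knn_kernel(W, k):
    """Keep the k largest similarities in each row (self included), then row-normalize."""
    S = np.zeros_like(W)
    idx = np.argsort(-W, axis=1)[:, :k]
    rows = np.arange(W.shape[0])[:, None]
    S[rows, idx] = W[rows, idx]
    return row_normalize(S)

def fuse(views, k=3, iters=10):
    """Fuse per-data-type similarity networks into one network (needs >= 2 views)."""
    W = [affinity(X) for X in views]
    P = [row_normalize(w) for w in W]   # full networks, one per data type
    S = [knn_kernel(w, k) for w in W]   # sparse local networks
    for _ in range(iters):
        updated = []
        for v in range(len(views)):
            others = [P[u] for u in range(len(views)) if u != v]
            avg = sum(others) / len(others)
            # Diffuse the other data types' information through this type's local structure.
            updated.append(row_normalize(S[v] @ avg @ S[v].T))
        P = updated
    fused = sum(P) / len(P)
    return (fused + fused.T) / 2.0      # symmetrize the final network

# Toy example: six samples, two hypothetical data types, two groups of three samples.
views = [
    np.array([[0.0], [0.1], [0.2], [3.0], [3.1], [3.2]]),
    np.array([[1.0, 0.0], [1.1, 0.1], [0.9, 0.0],
              [-2.0, 2.0], [-2.1, 2.1], [-1.9, 2.0]]),
]
fused = fuse(views, k=3, iters=5)
```

In this toy input, samples in the same group end up more similar in the fused network than samples in different groups; a clustering method could then be applied to `fused` to recover candidate subtypes.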
Proceedings of the National Academy of Sciences of the United States of America | 2002
Anna Goldenberg; Galit Shmueli; Richard A. Caruana; Stephen E. Fienberg
The recent series of anthrax attacks has reinforced the importance of biosurveillance systems for the timely detection of epidemics. This paper describes a statistical framework for monitoring grocery data to detect a large-scale but localized bioterrorism attack. Our system illustrates the potential of data sources that may be more timely than traditional medical and public health data. The system includes several layers, each customized to grocery data and tuned to finding footprints of an epidemic. We also propose an evaluation methodology that is suitable in the absence of data on large-scale bioterrorist attacks and disease outbreaks.
Genome Research | 2013
Aziz M. Mezlini; Eric Smith; Marc Fiume; Orion J. Buske; Gleb L. Savich; Sohrab P. Shah; Sam Aparicio; Derek Y. Chiang; Anna Goldenberg; Michael Brudno
High-throughput RNA sequencing (RNA-seq) promises to revolutionize our understanding of genes and their role in human disease by characterizing the RNA content of tissues and cells. The realization of this promise, however, is conditional on the development of effective computational methods for the identification and quantification of transcripts from incomplete and noisy data. In this article, we introduce iReckon, a method for simultaneous determination of the isoforms and estimation of their abundances. Our probabilistic approach incorporates multiple biological and technical phenomena, including novel isoforms, intron retention, unspliced pre-mRNA, PCR amplification biases, and multimapped reads. iReckon utilizes regularized expectation-maximization to accurately estimate the abundances of known and novel isoforms. Our results on simulated and real data demonstrate a superior ability to discover novel isoforms with a significantly reduced number of false-positive predictions, and our abundance accuracy prediction outmatches that of other state-of-the-art tools. Furthermore, we have applied iReckon to two cancer transcriptome data sets, a triple-negative breast cancer patient sample and the MCF7 breast cancer cell line, and show that iReckon is able to reconstruct the complex splicing changes that were not previously identified. QT-PCR validations of the isoforms detected in the MCF7 cell line confirmed all of iReckon's predictions and also showed strong agreement (r² = 0.94) with the predicted abundances.
Cancer Cell | 2017
Florence M.G. Cavalli; Marc Remke; Ladislav Rampasek; John Peacock; David Shih; Betty Luu; Livia Garzia; Jonathon Torchia; Carolina Nör; A. Sorana Morrissy; Sameer Agnihotri; Yuan Yao Thompson; Claudia M. Kuzan-Fischer; Hamza Farooq; Keren Isaev; Craig Daniels; Byung Kyu Cho; Seung Ki Kim; Kyu Chang Wang; Ji Yeoun Lee; Wieslawa A. Grajkowska; Marta Perek-Polnik; Alexandre Vasiljevic; Cécile Faure-Conter; Anne Jouvet; Caterina Giannini; Amulya A. Nageswara Rao; Kay Ka Wai Li; Ho Keung Ng; Charles G. Eberhart
While molecular subgrouping has revolutionized medulloblastoma classification, the extent of heterogeneity within subgroups is unknown. Similarity network fusion (SNF) applied to genome-wide DNA methylation and gene expression data across 763 primary samples identifies very homogeneous clusters of patients, supporting the presence of medulloblastoma subtypes. After integration of somatic copy-number alterations, and clinical features specific to each cluster, we identify 12 different subtypes of medulloblastoma. Integrative analysis using SNF further delineates group 3 from group 4 medulloblastoma, which is not as readily apparent through analyses of individual data types. Two clear subtypes of infants with Sonic Hedgehog medulloblastoma with disparate outcomes and biology are identified. Medulloblastoma subtypes identified through integrative clustering have important implications for stratification of future clinical trials.
International Conference on Machine Learning | 2004
Anna Goldenberg; Andrew W. Moore
This paper addresses three questions. Is it useful to attempt to learn a Bayesian network structure with hundreds of thousands of nodes? How should such structure search proceed practically? The third question arises out of our approach to the second: how can Frequent Sets (Agrawal et al., 1993), which are extremely popular in the area of descriptive data mining, be turned into a probabilistic model? Large sparse datasets with hundreds of thousands of records and attributes appear in social networks, warehousing, supermarket transactions and web logs. The complexity of structural search made learning of factored probabilistic models on such datasets unfeasible. We propose to use Frequent Sets to significantly speed up the structural search. Unlike previous approaches, we not only cache n-way sufficient statistics, but also exploit their local structure. We also present an empirical evaluation of our algorithm applied to several massive datasets.
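As a rough illustration of the Frequent Sets idea mentioned in this abstract, the following sketch runs a generic level-wise (Apriori-style) search over toy transaction data and keeps the counts, which can serve as cached co-occurrence statistics. It is not the paper's algorithm; the function name `frequent_sets`, the `max_size` cap, and the toy transactions are assumptions made for illustration.

```python
from itertools import combinations

def frequent_sets(transactions, min_support, max_size=3):
    """Return {itemset: count} for itemsets present in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    size = 2
    while current and size <= max_size:
        items = sorted(set().union(*current))
        # A candidate is kept only if all of its (size - 1)-subsets were frequent.
        candidates = [
            frozenset(c) for c in combinations(items, size)
            if all(frozenset(sub) in current for sub in combinations(c, size - 1))
        ]
        current = {}
        for cand in candidates:
            n = sum(1 for t in transactions if cand <= t)
            if n >= min_support:
                current[cand] = n
        result.update(current)
        size += 1
    return result

# Hypothetical supermarket baskets.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"bread", "eggs"},
    {"milk", "eggs"},
    {"milk", "bread"},
]
stats = frequent_sets(baskets, min_support=3)

# Cached counts give simple conditional estimates, e.g. P(bread | milk).
p_bread_given_milk = stats[frozenset({"milk", "bread"})] / stats[frozenset({"milk"})]
```

Because only sets above the support threshold are stored, the cache stays small on sparse data, and estimates such as `p_bread_given_milk` can be read off without rescanning the transactions.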
Cell Systems | 2016
Ladislav Rampasek; Anna Goldenberg
TensorFlow is Google's recently released open-source software for deep learning. What are its applications for computational biology?
Archive | 2007
Edoardo M. Airoldi; David M. Blei; Stephen E. Fienberg; Anna Goldenberg; Eric P. Xing; Alice X. Zheng
Invited Presentations.- Structural Inference of Hierarchies in Networks.- Heider vs Simmel: Emergent Features in Dynamic Structures.- Joint Group and Topic Discovery from Relations and Text.- Statistical Models for Networks: A Brief Review of Some Recent Research.- Other Presentations.- Combining Stochastic Block Models and Mixed Membership for Statistical Network Analysis.- Exploratory Study of a New Model for Evolving Networks.- A Latent Space Model for Rank Data.- A Simple Model for Complex Networks with Arbitrary Degree Distribution and Clustering.- Discrete Temporal Models of Social Networks.- Approximate Kalman Filters for Embedding Author-Word Co-occurrence Data over Time.- Discovering Functional Communities in Dynamical Networks.- Empirical Analysis of a Dynamic Social Network Built from PGP Keyrings.- Extended Abstracts.- A Brief Survey of Machine Learning Methods for Classification in Networked Data and an Application to Suspicion Scoring.- Age and Geographic Inferences of the LiveJournal Social Network.- Inferring Organizational Titles in Online Communication.- Learning Approximate MRFs from Large Transactional Data.- Panel Discussion.- Panel Discussion.
Bioinformatics | 2016
Petr Smirnov; Zhaleh Safikhani; Nehme El-Hachem; Dong Wang; Adrian She; Catharina Olsen; Mark Freeman; Heather Selby; Deena M.A. Gendoo; Patrick Grossmann; Andrew H. Beck; Hugo J.W.L. Aerts; Mathieu Lupien; Anna Goldenberg; Benjamin Haibe-Kains
Pharmacogenomics holds great promise for the development of biomarkers of drug response and the design of new therapeutic options, which are key challenges in precision medicine. However, such data are scattered and lack standards for efficient access and analysis, consequently preventing the realization of the full potential of pharmacogenomics. To address these issues, we implemented PharmacoGx, an easy-to-use, open source package for integrative analysis of multiple pharmacogenomic datasets. We demonstrate the utility of our package in comparing large drug sensitivity datasets, such as the Genomics of Drug Sensitivity in Cancer and the Cancer Cell Line Encyclopedia. Moreover, we show how to use our package to easily perform Connectivity Map analysis. With increasing availability of drug-related data, our package will open new avenues of research for meta-analysis of pharmacogenomic data. Availability and implementation: PharmacoGx is implemented in R and can be easily installed on any system. The package is available from CRAN and its source code is available from GitHub. Supplementary information: Supplementary data are available at Bioinformatics online.
IEEE Intelligent Systems | 2015
Suchi Saria; Anna Goldenberg
Precision medicine is an emerging approach that considers variability in genes, environment, and lifestyle in order to better treat individuals. This article gives an overview of the diverse approaches to subtyping, from early accounts based on clinical practice to more recent approaches that focus on computationally derived subtypes based on molecular and electronic health record (EHR) data. This field is expansive and growing rapidly; the authors juxtapose approaches taken by different communities and highlight examples of significant open computational problems.
Bioinformatics | 2014
Yue Li; Anna Goldenberg; Ka-Chun Wong; Zhaolei Zhang
Motivation: Systematic identification of microRNA (miRNA) targets remains a challenge. miRNA overexpression coupled with genome-wide expression profiling is a promising new approach and calls for a new method that integrates expression and sequence information. Results: We developed a probabilistic scoring method called targetScore. TargetScore infers miRNA targets as the transformed fold-changes weighted by the Bayesian posteriors given observed target features. To this end, we compiled 84 datasets from Gene Expression Omnibus corresponding to 77 human tissue or cells and 113 distinct transfected miRNAs. Compared with other methods, targetScore achieves significantly higher accuracy in identifying known targets in most tests. Moreover, the confidence targets from targetScore exhibit comparable protein downregulation and are more significantly enriched for Gene Ontology terms. Using targetScore, we explored the oncomir-oncogene network and predicted several potential cancer-related miRNA-messenger RNA interactions. Availability and implementation: TargetScore is available at Bioconductor: http://www.bioconductor.org/packages/devel/bioc/html/TargetScore.html.