Anna Goldenberg | Researchain

Publication

Featured research published by Anna Goldenberg.

Nature Methods | 2014

Similarity network fusion for aggregating data types on a genomic scale

Bo Wang; Aziz M. Mezlini; Feyyaz Demir; Marc Fiume; Zhuowen Tu; Michael Brudno; Benjamin Haibe-Kains; Anna Goldenberg

Recent technologies have made it cost-effective to collect diverse types of genome-wide data. Computational methods are needed to combine these data to create a comprehensive view of a given disease or a biological process. Similarity network fusion (SNF) solves this problem by constructing networks of samples (e.g., patients) for each available data type and then efficiently fusing these into one network that represents the full spectrum of underlying data. For example, to create a comprehensive view of a disease given a cohort of patients, SNF computes and fuses patient similarity networks obtained from each of their data types separately, taking advantage of the complementarity in the data. We used SNF to combine mRNA expression, DNA methylation and microRNA (miRNA) expression data for five cancer data sets. SNF substantially outperforms single data type analysis and established integrative approaches when identifying cancer subtypes and is effective for predicting survival.
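The fusion idea in this abstract can be illustrated with a small sketch. It is a simplified illustration under stated assumptions, not the published SNF implementation: it uses a fixed-width Gaussian kernel and plain row normalization, whereas the paper's kernel and normalization are more refined. The function names (`affinity`, `knn_kernel`, `fuse`), the parameters `sigma` and `k`, and the toy data are all hypothetical.

```python
import math

def affinity(samples, sigma=1.0):
    """Gaussian similarity between every pair of samples (one row per sample)."""
    n = len(samples)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(samples[i], samples[j]))
            w[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return w

def row_normalize(w):
    """Scale each row so that it sums to 1."""
    out = []
    for row in w:
        total = sum(row)
        out.append([v / total for v in row])
    return out

def knn_kernel(w, k):
    """Keep each sample's k largest similarities (itself included), zero the rest."""
    sparse = []
    for row in w:
        keep = set(sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k])
        sparse.append([v if j in keep else 0.0 for j, v in enumerate(row)])
    return row_normalize(sparse)

def matmul(a, b):
    """Plain matrix product for lists of lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def fuse(views, k=2, iterations=10):
    """Fuse one sample-similarity network per data type into a single network."""
    if len(views) < 2:
        raise ValueError("fusion needs at least two data types")
    full = [row_normalize(affinity(v)) for v in views]   # dense networks
    local = [knn_kernel(affinity(v), k) for v in views]  # sparse local networks
    m, n = len(views), len(views[0])
    for _ in range(iterations):
        updated = []
        for i in range(m):
            others = [full[j] for j in range(m) if j != i]
            mean_other = [[sum(o[r][c] for o in others) / len(others)
                           for c in range(n)] for r in range(n)]
            # Diffuse the other data types' similarities through this one's local structure.
            local_t = [list(col) for col in zip(*local[i])]
            updated.append(row_normalize(matmul(matmul(local[i], mean_other), local_t)))
        full = updated
    fused = [[sum(p[r][c] for p in full) / m for c in range(n)] for r in range(n)]
    # Symmetrize so the result can be treated as an undirected network.
    return [[(fused[r][c] + fused[c][r]) / 2.0 for c in range(n)] for r in range(n)]

# Toy cohort: four patients, two data types, two obvious groups (0, 1) and (2, 3).
expression = [[0.0], [0.1], [5.0], [5.1]]
methylation = [[1.0, 0.0], [1.1, 0.0], [-3.0, 0.2], [-3.1, 0.1]]
fused = fuse([expression, methylation])
```

In this toy setting, patients in the same group end up with a much higher fused similarity than patients in different groups, and a clustering method could then be run on `fused`.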

Proceedings of the National Academy of Sciences of the United States of America | 2002

Early statistical detection of anthrax outbreaks by tracking over-the-counter medication sales

Anna Goldenberg; Galit Shmueli; Richard A. Caruana; Stephen E. Fienberg

The recent series of anthrax attacks has reinforced the importance of biosurveillance systems for the timely detection of epidemics. This paper describes a statistical framework for monitoring grocery data to detect a large-scale but localized bioterrorism attack. Our system illustrates the potential of data sources that may be more timely than traditional medical and public health data. The system includes several layers, each customized to grocery data and tuned to finding footprints of an epidemic. We also propose an evaluation methodology that is suitable in the absence of data on large-scale bioterrorist attacks and disease outbreaks.

Genome Research | 2013

iReckon: Simultaneous isoform discovery and abundance estimation from RNA-seq data

Aziz M. Mezlini; Eric Smith; Marc Fiume; Orion J. Buske; Gleb L. Savich; Sohrab P. Shah; Sam Aparicio; Derek Y. Chiang; Anna Goldenberg; Michael Brudno

High-throughput RNA sequencing (RNA-seq) promises to revolutionize our understanding of genes and their role in human disease by characterizing the RNA content of tissues and cells. The realization of this promise, however, is conditional on the development of effective computational methods for the identification and quantification of transcripts from incomplete and noisy data. In this article, we introduce iReckon, a method for simultaneous determination of the isoforms and estimation of their abundances. Our probabilistic approach incorporates multiple biological and technical phenomena, including novel isoforms, intron retention, unspliced pre-mRNA, PCR amplification biases, and multimapped reads. iReckon utilizes regularized expectation-maximization to accurately estimate the abundances of known and novel isoforms. Our results on simulated and real data demonstrate a superior ability to discover novel isoforms with a significantly reduced number of false-positive predictions, and our abundance prediction accuracy outmatches that of other state-of-the-art tools. Furthermore, we have applied iReckon to two cancer transcriptome data sets, a triple-negative breast cancer patient sample and the MCF7 breast cancer cell line, and show that iReckon is able to reconstruct the complex splicing changes that were not previously identified. QT-PCR validations of the isoforms detected in the MCF7 cell line confirmed all of iReckon's predictions and also showed strong agreement (r² = 0.94) with the predicted abundances.
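To make the expectation-maximization step more concrete, here is a minimal sketch of plain EM for isoform abundances when some reads are compatible with more than one isoform. It is an assumption-laden illustration, not iReckon itself: it omits the regularization, isoform lengths, novel-isoform discovery, and the bias models named in the abstract. The function `em_abundances` and the toy read data are hypothetical.

```python
def em_abundances(read_compat, n_isoforms, iterations=100):
    """Estimate relative isoform abundances from read-to-isoform compatibility.

    read_compat: one tuple per read, listing the indices of the isoforms
                 that the read could have come from.
    """
    theta = [1.0 / n_isoforms] * n_isoforms  # start from a uniform guess
    for _ in range(iterations):
        expected = [0.0] * n_isoforms
        # E-step: split each read among its compatible isoforms,
        # in proportion to the current abundance estimates.
        for compat in read_compat:
            total = sum(theta[k] for k in compat)
            for k in compat:
                expected[k] += theta[k] / total
        # M-step: new abundances are the normalized expected read counts.
        theta = [e / len(read_compat) for e in expected]
    return theta

# Toy data: 6 reads unique to isoform 0, 2 unique to isoform 1, 4 ambiguous.
reads = [(0,)] * 6 + [(1,)] * 2 + [(0, 1)] * 4
theta = em_abundances(reads, n_isoforms=2)
```

Here the ambiguous reads are gradually reassigned toward the isoform that the unique reads already favor, which is the basic mechanism a regularized variant would build on.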

Cancer Cell | 2017

Intertumoral Heterogeneity within Medulloblastoma Subgroups

Florence M.G. Cavalli; Marc Remke; Ladislav Rampasek; John Peacock; David Shih; Betty Luu; Livia Garzia; Jonathon Torchia; Carolina Nör; A. Sorana Morrissy; Sameer Agnihotri; Yuan Yao Thompson; Claudia M. Kuzan-Fischer; Hamza Farooq; Keren Isaev; Craig Daniels; Byung Kyu Cho; Seung Ki Kim; Kyu Chang Wang; Ji Yeoun Lee; Wieslawa A. Grajkowska; Marta Perek-Polnik; Alexandre Vasiljevic; Cécile Faure-Conter; Anne Jouvet; Caterina Giannini; Amulya A. Nageswara Rao; Kay Ka Wai Li; Ho Keung Ng; Charles G. Eberhart

While molecular subgrouping has revolutionized medulloblastoma classification, the extent of heterogeneity within subgroups is unknown. Similarity network fusion (SNF) applied to genome-wide DNA methylation and gene expression data across 763 primary samples identifies very homogeneous clusters of patients, supporting the presence of medulloblastoma subtypes. After integration of somatic copy-number alterations, and clinical features specific to each cluster, we identify 12 different subtypes of medulloblastoma. Integrative analysis using SNF further delineates group 3 from group 4 medulloblastoma, which is not as readily apparent through analyses of individual data types. Two clear subtypes of infants with Sonic Hedgehog medulloblastoma with disparate outcomes and biology are identified. Medulloblastoma subtypes identified through integrative clustering have important implications for stratification of future clinical trials.

International Conference on Machine Learning | 2004

Tractable learning of large Bayes net structures from sparse data

Anna Goldenberg; Andrew W. Moore

This paper addresses three questions. Is it useful to attempt to learn a Bayesian network structure with hundreds of thousands of nodes? How should such structure search proceed practically? The third question arises out of our approach to the second: how can Frequent Sets (Agrawal et al., 1993), which are extremely popular in the area of descriptive data mining, be turned into a probabilistic model? Large sparse datasets with hundreds of thousands of records and attributes appear in social networks, warehousing, supermarket transactions and web logs. The complexity of structural search made learning of factored probabilistic models on such datasets unfeasible. We propose to use Frequent Sets to significantly speed up the structural search. Unlike previous approaches, we not only cache n-way sufficient statistics, but also exploit their local structure. We also present an empirical evaluation of our algorithm applied to several massive datasets.
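As a rough illustration of how frequent sets can act as cached sufficient statistics, the sketch below counts frequent itemsets level by level in the Apriori style and then reads a conditional probability off the cached counts. It is not the paper's structure-search algorithm; the function names, the `min_support` and `max_size` parameters, and the toy transactions are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_sets(transactions, min_support, max_size=2):
    """Return {itemset: count} for itemsets found in at least min_support transactions."""
    transactions = [set(t) for t in transactions]
    counts = Counter(frozenset([item]) for t in transactions for item in t)
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    size = 1
    while current and size < max_size:
        size += 1
        items = sorted(set().union(*current))  # items still part of a frequent set
        counts = Counter()
        for t in transactions:
            present = [i for i in items if i in t]
            for combo in combinations(present, size):
                # Apriori pruning: every smaller subset must already be frequent.
                if all(frozenset(sub) in frequent
                       for sub in combinations(combo, size - 1)):
                    counts[frozenset(combo)] += 1
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
    return frequent

def conditional(frequent, item, given):
    """Estimate P(item | given) from cached counts, or None if a count is not cached."""
    joint = frequent.get(frozenset([item, given]))
    base = frequent.get(frozenset([given]))
    return None if joint is None or base is None else joint / base

baskets = [{"milk", "bread"}, {"milk", "bread", "eggs"}, {"milk"}, {"bread", "eggs"}]
cached = frequent_sets(baskets, min_support=2)
p_bread_given_milk = conditional(cached, "bread", "milk")
```

Because only itemsets above the support threshold are stored, rare combinations such as milk with eggs are never cached, which is what keeps the counts manageable on sparse data.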

Cell Systems | 2016

TensorFlow: Biology’s Gateway to Deep Learning?

Ladislav Rampasek; Anna Goldenberg

TensorFlow is Google's recently released open-source software for deep learning. What are its applications for computational biology?

Archive | 2007

Statistical Network Analysis: Models, Issues, and New Directions

Edoardo M. Airoldi; David M. Blei; Stephen E. Fienberg; Anna Goldenberg; Eric P. Xing; Alice X. Zheng

Invited Presentations: Structural Inference of Hierarchies in Networks; Heider vs Simmel: Emergent Features in Dynamic Structures; Joint Group and Topic Discovery from Relations and Text; Statistical Models for Networks: A Brief Review of Some Recent Research.

Other Presentations: Combining Stochastic Block Models and Mixed Membership for Statistical Network Analysis; Exploratory Study of a New Model for Evolving Networks; A Latent Space Model for Rank Data; A Simple Model for Complex Networks with Arbitrary Degree Distribution and Clustering; Discrete Temporal Models of Social Networks; Approximate Kalman Filters for Embedding Author-Word Co-occurrence Data over Time; Discovering Functional Communities in Dynamical Networks; Empirical Analysis of a Dynamic Social Network Built from PGP Keyrings.

Extended Abstracts: A Brief Survey of Machine Learning Methods for Classification in Networked Data and an Application to Suspicion Scoring; Age and Geographic Inferences of the LiveJournal Social Network; Inferring Organizational Titles in Online Communication; Learning Approximate MRFs from Large Transactional Data.

Panel Discussion.

Bioinformatics | 2016

PharmacoGx: an R package for analysis of large pharmacogenomic datasets

Petr Smirnov; Zhaleh Safikhani; Nehme El-Hachem; Dong Wang; Adrian She; Catharina Olsen; Mark Freeman; Heather Selby; Deena M.A. Gendoo; Patrick Grossmann; Andrew H. Beck; Hugo J.W.L. Aerts; Mathieu Lupien; Anna Goldenberg; Benjamin Haibe-Kains

Pharmacogenomics holds great promise for the development of biomarkers of drug response and the design of new therapeutic options, which are key challenges in precision medicine. However, such data are scattered and lack standards for efficient access and analysis, consequently preventing the realization of the full potential of pharmacogenomics. To address these issues, we implemented PharmacoGx, an easy-to-use, open source package for integrative analysis of multiple pharmacogenomic datasets. We demonstrate the utility of our package in comparing large drug sensitivity datasets, such as the Genomics of Drug Sensitivity in Cancer and the Cancer Cell Line Encyclopedia. Moreover, we show how to use our package to easily perform Connectivity Map analysis. With increasing availability of drug-related data, our package will open new avenues of research for meta-analysis of pharmacogenomic data. Availability and implementation: PharmacoGx is implemented in R and can be easily installed on any system. The package is available from CRAN and its source code is available from GitHub. Supplementary information: Supplementary data are available at Bioinformatics online.

IEEE Intelligent Systems | 2015

Subtyping: What It is and Its Role in Precision Medicine

Suchi Saria; Anna Goldenberg

Precision medicine is an emerging approach that considers variability in genes, environment, and lifestyle in order to better treat individuals. This article gives an overview of the diverse approaches to subtyping, from early accounts based on clinical practice to more recent approaches that focus on computationally derived subtypes based on molecular and electronic health record (EHR) data. This field is expansive and growing rapidly; the authors juxtapose approaches taken by different communities and highlight examples of significant open computational problems.

Bioinformatics | 2014

A probabilistic approach to explore human miRNA targetome by integrating miRNA-overexpression data and sequence information

Yue Li; Anna Goldenberg; Ka-Chun Wong; Zhaolei Zhang

Motivation: Systematic identification of microRNA (miRNA) targets remains a challenge. miRNA overexpression coupled with genome-wide expression profiling is a promising new approach and calls for a new method that integrates expression and sequence information. Results: We developed a probabilistic scoring method called targetScore. TargetScore infers miRNA targets as the transformed fold-changes weighted by the Bayesian posteriors given observed target features. To this end, we compiled 84 datasets from Gene Expression Omnibus corresponding to 77 human tissues or cells and 113 distinct transfected miRNAs. Compared with other methods, targetScore achieves significantly higher accuracy in identifying known targets in most tests. Moreover, the confidence targets from targetScore exhibit comparable protein downregulation and are more significantly enriched for Gene Ontology terms. Using targetScore, we explored the oncomir-oncogene network and predicted several potential cancer-related miRNA-messenger RNA interactions. Availability and implementation: TargetScore is available at Bioconductor: http://www.bioconductor.org/packages/devel/bioc/html/TargetScore.html.
