Andrew D. Rouillard

Icahn School of Medicine at Mount Sinai

Publication

Featured research published by Andrew D. Rouillard.

Nucleic Acids Research | 2016

Enrichr: a comprehensive gene set enrichment analysis web server 2016 update

Maxim V. Kuleshov; Matthew R. Jones; Andrew D. Rouillard; Nicolas F. Fernandez; Qiaonan Duan; Zichen Wang; Simon Koplev; Sherry L. Jenkins; Kathleen M. Jagodnik; Alexander Lachmann; Michael G. McDermott; Caroline D. Monteiro; Gregory W. Gundersen; Avi Ma'ayan

Enrichment analysis is a popular method for analyzing gene sets generated by genome-wide experiments. Here we present a significant update to one of the tools in this domain called Enrichr. Enrichr currently contains a large collection of diverse gene set libraries available for analysis and download. In total, Enrichr currently contains 180 184 annotated gene sets from 102 gene set libraries. New features have been added to Enrichr including the ability to submit fuzzy sets, upload BED files, improved application programming interface and visualization of the results as clustergrams. Overall, Enrichr is a comprehensive resource for curated gene sets and a search engine that accumulates biological knowledge for further biological discoveries. Enrichr is freely available at: http://amp.pharm.mssm.edu/Enrichr.
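The kind of enrichment analysis described above can be illustrated with a generic over-representation test. The sketch below is not Enrichr's own scoring method, which this abstract does not specify. It assumes a hypergeometric test against a fixed background size, and the function names, library terms and gene symbols are hypothetical.

```python
from math import comb


def enrichment_p_value(query_genes, library_set, background_size):
    """Hypergeometric tail probability of seeing at least the observed overlap."""
    query = set(query_genes)
    members = set(library_set)
    k = len(query & members)   # observed overlap
    n = len(query)             # input list size
    K = len(members)           # library gene set size
    N = background_size        # assumed size of the gene universe
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


def rank_gene_sets(query_genes, library, background_size):
    """Return (term, p_value) pairs sorted from most to least enriched."""
    scored = [(term, enrichment_p_value(query_genes, genes, background_size))
              for term, genes in library.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Toy library with made-up terms and gene symbols, for illustration only.
toy_library = {
    "term_1": {"G1", "G2", "G3"},
    "term_2": {"G7", "G8", "G9"},
}
ranking = rank_gene_sets({"G1", "G2", "G3"}, toy_library, background_size=10)
```

Here `ranking` lists `term_1` first because all three input genes fall inside it. In practice the background size and any multiple-testing correction are choices the analyst has to make.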

Nucleic Acids Research | 2014

LINCS Canvas Browser: interactive web app to query, browse and interrogate LINCS L1000 gene expression signatures

Qiaonan Duan; Corey Flynn; Mario Niepel; Marc Hafner; Jeremy L. Muhlich; Nicolas F. Fernandez; Andrew D. Rouillard; Christopher M. Tan; Edward Y. Chen; Todd R. Golub; Peter K. Sorger; Aravind Subramanian; Avi Ma'ayan

For the Library of Integrated Network-based Cellular Signatures (LINCS) project, many gene expression signatures have been produced using the L1000 technology. The L1000 technology is a cost-effective method to profile gene expression at large scale. LINCS Canvas Browser (LCB) is an interactive HTML5 web-based software application that facilitates querying, browsing and interrogating many of the currently available LINCS L1000 data. LCB implements two compacted layered canvases, one to visualize clustered L1000 expression data, and the other to display enrichment analysis results using 30 different gene set libraries. Clicking on an experimental condition highlights gene-sets enriched for the differentially expressed genes from the selected experiment. A search interface allows users to input gene lists and query them against over 100 000 conditions to find the top matching experiments. The tool integrates many resources for an unprecedented potential for new discoveries in systems biology and systems pharmacology. The LCB application is available at http://www.maayanlab.net/LINCS/LCB. Customized versions will be made part of the http://lincscloud.org and http://lincs.hms.harvard.edu websites.

Database | 2016

The harmonizome: a collection of processed datasets gathered to serve and mine knowledge about genes and proteins

Andrew D. Rouillard; Gregory W. Gundersen; Nicolas F. Fernandez; Zichen Wang; Caroline D. Monteiro; Michael G. McDermott; Avi Ma’ayan

Genomics, epigenomics, transcriptomics, proteomics and metabolomics efforts rapidly generate a plethora of data on the activity and levels of biomolecules within mammalian cells. At the same time, curation projects that organize knowledge from the biomedical literature into online databases are expanding. Hence, there is a wealth of information about genes, proteins and their associations, with an urgent need for data integration to achieve better knowledge extraction and data reuse. For this purpose, we developed the Harmonizome: a collection of processed datasets gathered to serve and mine knowledge about genes and proteins from over 70 major online resources. We extracted, abstracted and organized data into ∼72 million functional associations between genes/proteins and their attributes. Such attributes could be physical relationships with other biomolecules, expression in cell lines and tissues, genetic associations with knockout mouse or human phenotypes, or changes in expression after drug treatment. We stored these associations in a relational database along with rich metadata for the genes/proteins, their attributes and the original resources. The freely available Harmonizome web portal provides a graphical user interface, a web service and a mobile app for querying, browsing and downloading all of the collected data. To demonstrate the utility of the Harmonizome, we computed and visualized gene–gene and attribute–attribute similarity networks, and through unsupervised clustering, identified many unexpected relationships by combining pairs of datasets such as the association between kinase perturbations and disease signatures. We also applied supervised machine learning methods to predict novel substrates for kinases, endogenous ligands for G-protein coupled receptors, mouse phenotypes for knockout genes, and classified unannotated transmembrane proteins for likelihood of being ion channels. 
The Harmonizome is a comprehensive resource of knowledge about genes and proteins, and as such, it enables researchers to discover novel relationships between biological entities, as well as form novel data-driven hypotheses for experimental validation. Database URL: http://amp.pharm.mssm.edu/Harmonizome.
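As a rough illustration of how gene-attribute associations can be turned into a gene-gene similarity network, the sketch below compares the attribute sets of each pair of genes. The abstract does not state which similarity measure the Harmonizome uses, so the Jaccard index, the threshold, and all names and data here are assumptions made for the example.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_edges(gene_attributes, threshold=0.5):
    """Return (gene_1, gene_2, similarity) for gene pairs at or above the threshold."""
    edges = []
    for g1, g2 in combinations(sorted(gene_attributes), 2):
        score = jaccard(gene_attributes[g1], gene_attributes[g2])
        if score >= threshold:
            edges.append((g1, g2, score))
    return edges


# Hypothetical gene-to-attribute associations, for illustration only.
toy_associations = {
    "GENE_A": {"tissue_x", "phenotype_y"},
    "GENE_B": {"tissue_x", "phenotype_y", "drug_z"},
    "GENE_C": {"tissue_w"},
}
edges = similarity_edges(toy_associations, threshold=0.5)
```

Swapping the roles of genes and attributes in the input dictionary gives the corresponding attribute-attribute network.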

Nature Communications | 2016

Extraction and analysis of signatures from the Gene Expression Omnibus by the crowd.

Zichen Wang; Caroline D. Monteiro; Kathleen M. Jagodnik; Nicolas F. Fernandez; Gregory W. Gundersen; Andrew D. Rouillard; Sherry L. Jenkins; Axel S Feldmann; Kevin Hu; Michael G. McDermott; Qiaonan Duan; Neil R. Clark; Matthew R. Jones; Yan Kou; Troy Goff; Holly Woodland; Fabio M R. Amaral; Gregory L. Szeto; Oliver Fuchs; Sophia Miryam Schüssler-Fiorenza Rose; Shvetank Sharma; Uwe Schwartz; Xabier Bengoetxea Bausela; Maciej Szymkiewicz; Vasileios Maroulis; Anton Salykin; Carolina M. Barra; Candice D. Kruth; Nicholas J. Bongio; Vaibhav Mathur

Gene expression data are accumulating exponentially in public repositories. Reanalysis and integration of themed collections from these studies may provide new insights, but requires further human curation. Here we report a crowdsourcing project to annotate and reanalyse a large number of gene expression profiles from Gene Expression Omnibus (GEO). Through a massive open online course on Coursera, over 70 participants from over 25 countries identify and annotate 2,460 single-gene perturbation signatures, 839 disease versus normal signatures, and 906 drug perturbation signatures. All these signatures are unique and are manually validated for quality. Global analysis of these signatures confirms known associations and identifies novel associations between genes, diseases and drugs. The manually curated signatures are used as a training set to develop classifiers for extracting similar signatures from the entire GEO repository. We develop a web portal to serve these signatures for query, download and visualization.

npj Systems Biology and Applications | 2016

L1000CDS2: LINCS L1000 characteristic direction signatures search engine

Qiaonan Duan; St. Patrick Reid; Neil R. Clark; Zichen Wang; Nicolas F. Fernandez; Andrew D. Rouillard; Ben Readhead; Sarah R. Tritsch; Rachel Hodos; Marc Hafner; Mario Niepel; Peter K. Sorger; Joel T. Dudley; Sina Bavari; Rekha G. Panchal; Avi Ma’ayan

The library of integrated network-based cellular signatures (LINCS) L1000 data set currently comprises over a million gene expression profiles of chemically perturbed human cell lines. Through several unique intrinsic and extrinsic benchmarking schemes, we demonstrate that processing the L1000 data with the characteristic direction (CD) method significantly improves signal to noise compared with the MODZ method currently used to compute L1000 signatures. The CD processed L1000 signatures are served through a state-of-the-art web-based search engine application called L1000CDS2. The L1000CDS2 search engine provides prioritization of thousands of small-molecule signatures, and their pairwise combinations, predicted to either mimic or reverse an input gene expression signature using two methods. The L1000CDS2 search engine also predicts drug targets for all the small molecules profiled by the L1000 assay that we processed. Targets are predicted by computing the cosine similarity between the L1000 small-molecule signatures and a large collection of signatures extracted from the gene expression omnibus (GEO) for single-gene perturbations in mammalian cells. We applied L1000CDS2 to prioritize small molecules that are predicted to reverse expression in 670 disease signatures also extracted from GEO, and prioritized small molecules that can mimic expression of 22 endogenous ligand signatures profiled by the L1000 assay. As a case study, to further demonstrate the utility of L1000CDS2, we collected expression signatures from human cells infected with Ebola virus at 30, 60 and 120 min. Querying these signatures with L1000CDS2 we identified kenpaullone, a GSK3B/CDK2 inhibitor that we show, in subsequent experiments, has a dose-dependent efficacy in inhibiting Ebola infection in vitro without causing cellular toxicity in human cell lines.
In summary, the L1000CDS2 tool can be applied in many biological and biomedical settings, while improving the extraction of knowledge from the LINCS L1000 resource.
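The target-prediction step described above relies on cosine similarity between a small-molecule signature and single-gene perturbation signatures. A minimal sketch of that comparison follows. It assumes every signature is a vector over the same ordered genes, and the function names, signature labels and values are hypothetical, not taken from L1000CDS2.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidate_targets(drug_signature, perturbation_signatures):
    """Sort gene perturbation signatures by similarity to the drug signature, highest first."""
    scored = [(name, cosine_similarity(drug_signature, sig))
              for name, sig in perturbation_signatures.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up three-gene signatures, for illustration only.
drug = [1.0, -1.0, 0.5]
perturbations = {
    "perturb_X": [2.0, -2.0, 1.0],    # same direction as the drug signature
    "perturb_Y": [-1.0, 1.0, -0.5],   # opposite direction
    "perturb_Z": [0.5, 0.5, 0.0],     # orthogonal
}
ranked = rank_candidate_targets(drug, perturbations)
```

A score near 1 means the two signatures point in the same direction and a score near -1 means they are opposed, so the same measure could in principle be read in either direction depending on the question being asked.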

Nucleic Acids Research | 2017

Pharos: Collating protein information to shed light on the druggable genome

Dac-Trung Nguyen; Stephen L. Mathias; Cristian G. Bologa; Søren Brunak; Nicolas F. Fernandez; Anna Gaulton; Anne Hersey; Jayme Holmes; Lars Juhl Jensen; Anneli Karlsson; Guixia Liu; Avi Ma'ayan; Geetha Mandava; Subramani Mani; Saurabh Mehta; John P. Overington; Juhee Patel; Andrew D. Rouillard; Stephan C. Schürer; Timothy Sheils; Anton Simeonov; Larry A. Sklar; Noel Southall; Oleg Ursu; Dušica Vidovic; Anna Waller; Jeremy J. Yang; Ajit Jadhav; Tudor I. Oprea; Rajarshi Guha

The ‘druggable genome’ encompasses several protein families, but only a subset of targets within them have attracted significant research attention and thus have information about them publicly available. The Illuminating the Druggable Genome (IDG) program, initiated in 2014, has the goal of developing experimental techniques and a Knowledge Management Center (KMC) that would collect and organize information about protein targets from four families, representing the most common druggable targets with an emphasis on understudied proteins. Here, we describe two resources developed by the KMC: the Target Central Resource Database (TCRD) which collates many heterogeneous gene/protein datasets and Pharos (https://pharos.nih.gov), a multimodal web interface that presents the data from TCRD. We briefly describe the types and sources of data considered by the KMC and then highlight features of the Pharos interface designed to enable intuitive access to the IDG knowledgebase. The aim of Pharos is to encourage ‘serendipitous browsing’, whereby related, relevant information is made easily discoverable. We conclude by describing two use cases that highlight the utility of Pharos and TCRD.

Bioinformatics | 2015

GEO2Enrichr: browser extension and server app to extract gene sets from GEO and analyze them for biological functions

Gregory W. Gundersen; Matthew R. Jones; Andrew D. Rouillard; Yan Kou; Caroline D. Monteiro; Axel S Feldmann; Kevin Hu; Avi Ma’ayan

MOTIVATION: Identification of differentially expressed genes is an important step in extracting knowledge from gene expression profiling studies. The raw expression data from microarray and other high-throughput technologies is deposited into the Gene Expression Omnibus (GEO) and served as Simple Omnibus Format in Text (SOFT) files. However, extracting and analyzing differentially expressed genes from GEO requires significant computational skills. RESULTS: Here we introduce GEO2Enrichr, a browser extension for extracting differentially expressed gene sets from GEO and analyzing those sets with Enrichr, an independent gene set enrichment analysis tool containing over 70 000 annotated gene sets organized into 75 gene-set libraries. GEO2Enrichr adds JavaScript code to GEO web-pages; this code scrapes user selected accession numbers and metadata, and then, with one click, users can submit this information to a web-server application that downloads the SOFT files, parses, cleans and normalizes the data, identifies the differentially expressed genes, and then pipes the resulting gene lists to Enrichr for downstream functional analysis. GEO2Enrichr opens a new avenue for adding functionality to major bioinformatics resources such as GEO by integrating tools and resources without the need for a plug-in architecture. Importantly, GEO2Enrichr helps researchers to quickly explore hypotheses with little technical overhead, lowering the barrier of entry for biologists by automating data processing steps needed for knowledge extraction from the major repository GEO. AVAILABILITY AND IMPLEMENTATION: GEO2Enrichr is an open source tool, freely available for installation as browser extensions at the Chrome Web Store and Firefox Add-ons. Documentation and a browser independent web application can be found at http://amp.pharm.mssm.edu/g2e/. CONTACT: [email protected].

Computational Biology and Chemistry | 2015

Reprint of Abstraction for data integration

Andrew D. Rouillard; Zichen Wang; Avi Ma'ayan

With advances in genomics, transcriptomics, metabolomics and proteomics, and more expansive electronic clinical record monitoring, as well as advances in computation, we have entered the Big Data era in biomedical research. Data gathering is growing rapidly while only a small fraction of this data is converted to useful knowledge or reused in future studies. To improve this, an important concept that is often overlooked is data abstraction. To fuse and reuse biomedical datasets from diverse resources, data abstraction is frequently required. Here we summarize some of the major Big Data biomedical research resources for genomics, proteomics and phenotype data, collected from mammalian cells, tissues and organisms. We then suggest simple data abstraction methods for fusing this diverse but related data. Finally, we demonstrate examples of the potential utility of such data integration efforts, while warning about the inherent biases that exist within such data.

Bioinformatics | 2014

Drug/Cell-line Browser: interactive canvas visualization of cancer drug/cell-line viability assay datasets

Qiaonan Duan; Zichen Wang; Nicolas F. Fernandez; Andrew D. Rouillard; Christopher M. Tan; Cyril H. Benes; Avi Ma'ayan

SUMMARY: Recently, several high profile studies collected cell viability data from panels of cancer cell lines treated with many drugs applied at different concentrations. Such drug sensitivity data for cancer cell lines provide suggestive treatments for different types and subtypes of cancer. Visualization of these datasets can reveal patterns that may not be obvious by examining the data without such efforts. Here we introduce Drug/Cell-line Browser (DCB), an online interactive HTML5 data visualization tool for interacting with three of the recently published datasets of cancer cell lines/drug-viability studies. DCB uses clustering and canvas visualization of the drugs and the cell lines, as well as a bar graph that summarizes drug effectiveness for the tissue of origin or the cancer subtypes for single or multiple drugs. DCB can help in understanding drug response patterns and prioritizing drug/cancer cell line interactions by tissue of origin or cancer subtype. AVAILABILITY AND IMPLEMENTATION: DCB is an open source Web-based tool that is freely available at: http://www.maayanlab.net/LINCS/DCB. CONTACT: [email protected]. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Cancer Research | 2015

Abstract B1-25: Data integration for illuminating the druggable genome

Andrew D. Rouillard; Avi Ma'ayan

Introduction: The druggable genome is defined as a set of roughly 3000 genes encoding proteins for which drugs can be readily designed to modify their activity. Nearly half of the genes of the druggable genome are G protein-coupled receptors (GPCRs), ion channels, protein kinases, or nuclear receptors. To date, the majority of the genes/proteins from these four families have been scarcely investigated and therefore little is known about their molecular interactions, biological functions, and roles in diseases. Illuminating the Druggable Genome (IDG) is an NIH Common Fund project that is supporting experimental and computational efforts to investigate under-studied druggable genome targets. Our objective is to perform computational analyses on existing biological data to prioritize understudied targets based on their likely disease relevance. Methods: We collected and organized information from a diverse set of 23 resources about GPCRs, ion channels, protein kinases, and nuclear receptors. These resources cover four categories of information: curated physical interactions (e.g. kinase-substrate phosphorylations), literature-derived annotations (e.g. Gene Ontology terms), physical properties (e.g. structural domains), and data from large scale profiling studies (e.g. gene expression profiles of cancer cell lines and other disease models). We distilled the information from each resource into attribute tables where the rows are targets and the columns their attributes. This enabled us to build target and attribute networks for unsupervised learning, and cross validate and combine evidence from diverse resources for supervised learning. We integrated kinase similarity matrices using regularized logistic regression to predict phosphorylation reactions between kinases. 
We also used gene expression data within The Cancer Genome Atlas (TCGA) to perform enrichment analysis against several gene signatures for druggable genome targets, including: signatures of differentially expressed genes (DEG) following target knockdown, signatures of DEG following target over-expression, and signatures of proteins reported to interact with targets. We used an unsupervised approach to obtain consensus scores for the relative likelihood of a target to be dysregulated in a tumor sample, and then performed hierarchical clustering to match groups of cancer patients with potential novel targets. Results: We computed receiver operating characteristic (ROC) curves to assess the performance of kinase similarity matrices as predictors of kinase-kinase phosphorylation reactions. We investigated the performance of the following similarity matrices individually and after integration using cross-validated, regularized logistic regression: similarity of DEG after kinase knockdown, phylogenetic similarity, similarity of kinase knockout phenotypes, similarity of curated annotations, and similarity of curated interactions. We found that although the predictive performance of each individual similarity matrix varied, with areas under the ROC curve ranging from 0.57 to 0.82, integrating similarity matrices improved the predictive performance up to 0.86. The predicted kinome network can be used to prioritize under-studied kinases by searching for connections between kinases known to play a role in cancer or other diseases. Using an unsupervised approach to identify potentially dysregulated targets in patient tumor samples, we found that within a cancer type (e.g. Acute Myeloid Leukemia), clusters of patients emerged with distinct sets of enriched targets. Under-studied genes within these sets are potentially novel targets that may be investigated further to confirm disease relevance and therapeutic potential. 
Conclusions: The attribute tables, similarity matrices, and data integration pipelines we developed will enable prioritization of targets for cancer and other diseases. Citation Format: Andrew D. Rouillard, Avi Ma'ayan. Data integration for illuminating the druggable genome. [abstract]. In: Proceedings of the AACR Special Conference on Computational and Systems Biology of Cancer; Feb 8-11 2015; San Francisco, CA. Philadelphia (PA): AACR; Cancer Res 2015;75(22 Suppl 2):Abstract nr B1-25.
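The abstract above evaluates each similarity matrix by its area under the ROC curve. As a minimal sketch of that evaluation step only (the regularized logistic regression used for integration is not shown), the AUC can be computed as the fraction of positive/negative pairs in which the known interaction receives the higher score. The function name and the example scores and labels are made up for illustration.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    scores: similarity scores for candidate kinase pairs (higher = more likely).
    labels: truthy for pairs with a known interaction, falsy otherwise.
    Ties between a positive and a negative count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for four candidate pairs; the first two are known interactions.
example_auc = roc_auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported values of 0.57 to 0.86 should be read. The pairwise loop is quadratic and intended only for small examples.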
