Network


Latest external collaboration on country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Gail Rosen is active.

Publication


Featured researches published by Gail Rosen.


Molecular Ecology | 2012

Environmental and ecological factors that shape the gut bacterial communities of fish: a meta‐analysis

Karen E. Sullam; Steven D. Essinger; Catherine A. Lozupone; Michael P. O’Connor; Gail Rosen; Rob Knight; Susan S. Kilham; Jacob A. Russell

Symbiotic bacteria often help their hosts acquire nutrients from their diet, showing trends of co‐evolution and independent acquisition by hosts from the same trophic levels. While these trends hint at important roles for biotic factors, the effects of the abiotic environment on symbiotic community composition remain comparably understudied. In this investigation, we examined the influence of abiotic and biotic factors on the gut bacterial communities of fish from different taxa, trophic levels and habitats. Phylogenetic and statistical analyses of 25 16S rRNA libraries revealed that salinity, trophic level and possibly host phylogeny shape the composition of fish gut bacteria. When analysed alongside bacterial communities from other environments, fish gut communities typically clustered with gut communities from mammals and insects. Similar consideration of individual phylotypes (vs. communities) revealed evolutionary ties between fish gut microbes and symbionts of animals, as many of the bacteria from the guts of herbivorous fish were closely related to those from mammals. Our results indicate that fish harbour more specialized gut communities than previously recognized. They also highlight a trend of convergent acquisition of similar bacterial communities by fish and mammals, raising the possibility that fish were the first to evolve symbioses resembling those found among extant gut fermenting mammals.


Bioinformatics | 2011

NBC: The Naïve Bayes Classification Tool Webserver for Taxonomic Classification of Metagenomic Reads

Gail Rosen; Erin Reichenberger; Aaron M. Rosenfeld

Motivation: Datasets from high-throughput sequencing technologies have yielded a vast amount of data about organisms in environmental samples. Yet, it is still a challenge to assess the exact organism content in these samples because the task of taxonomic classification is too computationally complex to annotate all reads in a dataset. An easy-to-use webserver is needed to process these reads. While many methods exist, only a few are publicly available on webservers, and out of those, most do not annotate all reads. Results: We introduce a webserver that implements the naïve Bayes classifier (NBC) to classify all metagenomic reads to their best taxonomic match. Results indicate that NBC can assign next-generation sequencing reads to their taxonomic classification and can find significant populations of genera that other classifiers may miss. Availability: Publicly available at: http://nbc.ece.drexel.edu. Contact: [email protected]


Advances in Bioinformatics | 2008

Metagenome Fragment Classification Using N-Mer Frequency Profiles

Gail Rosen; Elaine Garbarine; Diamantino Caseiro; Robi Polikar; Bahrad A. Sokhansanj

A vast amount of microbial sequencing data is being generated through large-scale projects in ecology, agriculture, and human health. Efficient high-throughput methods are needed to analyze the mass amounts of metagenomic data, all DNA present in an environmental sample. A major obstacle in metagenomics is the inability to obtain accuracy using technology that yields short reads. We construct the unique N-mer frequency profiles of 635 microbial genomes publicly available as of February 2008. These profiles are used to train a naive Bayes classifier (NBC) that can be used to identify the genome of any fragment. We show that our method is comparable to BLAST for small 25 bp fragments but does not have the ambiguity of BLASTs tied top scores. We demonstrate that this approach is scalable to identify any fragment from hundreds of genomes. It also performs quite well at the strain, species, and genera levels and achieves strain resolution despite classifying ubiquitous genomic fragments (gene and nongene regions). Cross-validation analysis demonstrates that species-accuracy achieves 90% for highly-represented species containing an average of 8 strains. We demonstrate that such a tool can be used on the Sargasso Sea dataset, and our analysis shows that NBC can be further enhanced.


PLOS ONE | 2012

Using the RDP Classifier to Predict Taxonomic Novelty and Reduce the Search Space for Finding Novel Organisms

Yemin Lan; Qiong Wang; James R. Cole; Gail Rosen

Background Currently, the naïve Bayesian classifier provided by the Ribosomal Database Project (RDP) is one of the most widely used tools to classify 16S rRNA sequences, mainly collected from environmental samples. We show that RDP has 97+% assignment accuracy and is fast for 250 bp and longer reads when the read originates from a taxon known to the database. Because most environmental samples will contain organisms from taxa whose 16S rRNA genes have not been previously sequenced, we aim to benchmark how well the RDP classifier and other competing methods can discriminate these novel taxa from known taxa. Principal Findings Because each fragment is assigned a score (containing likelihood or confidence information such as the boostrap score in the RDP classifier), we “train” a threshold to discriminate between novel and known organisms and observe its performance on a test set. The threshold that we determine tends to be conservative (low sensitivity but high specificity) for naïve Bayesian methods. Nonetheless, our method performs better with the RDP classifier than the other methods tested, measured by the f-measure and the area-under-the-curve on the receiver operating characteristic of the test set. By constraining the database to well-represented genera, sensitivity improves 3–15%. Finally, we show that the detector is a good predictor to determine novel abundant taxa (especially for finer levels of taxonomy where novelty is more likely to be present). Conclusions We conclude that selecting a read-length appropriate RDP bootstrap score can significantly reduce the search space for identifying novel genera and higher levels in taxonomy. In addition, having a well-represented database significantly improves performance while having genera that are “highly” similar does not make a significant improvement. On a real dataset from an Amazon Terra Preta soil sample, we show that the detector can predict (or correlates to) whether novel sequences will be assigned to new taxa when the RDP database “doubles” in the future.


Journal of the Royal Society Interface | 2018

Opportunities and obstacles for deep learning in biology and medicine

Travers Ching; Daniel Himmelstein; Brett K. Beaulieu-Jones; Alexandr A. Kalinin; Brian T. Do; Gregory P. Way; Enrico Ferrero; Paul-Michael Agapow; Michael Zietz; Michael M. Hoffman; Wei Xie; Gail Rosen; Benjamin J. Lengerich; Johnny Israeli; Jack Lanchantin; Stephen Woloszynek; Anne E. Carpenter; Avanti Shrikumar; Jinbo Xu; Evan M. Cofer; Christopher A. Lavender; Srinivas C. Turaga; Amr Alexandari; Zhiyong Lu; David J. Harris; Dave DeCaprio; Yanjun Qi; Anshul Kundaje; Yifan Peng; Laura Wiley

Deep learning describes a class of machine learning algorithms that are capable of combining raw inputs into layers of intermediate features. These algorithms have recently shown impressive results across a variety of domains. Biology and medicine are data-rich disciplines, but the data are complex and often ill-understood. Hence, deep learning techniques may be particularly well suited to solve problems of these fields. We examine applications of deep learning to a variety of biomedical problems—patient classification, fundamental biological processes and treatment of patients—and discuss whether deep learning will be able to transform these tasks or if the biomedical sphere poses unique challenges. Following from an extensive literature review, we find that deep learning has yet to revolutionize biomedicine or definitively resolve any of the most pressing challenges in the field, but promising advances have been made on the prior state of the art. Even though improvements over previous baselines have been modest in general, the recent progress indicates that deep learning methods will provide valuable means for speeding up or aiding human investigation. Though progress has been made linking a specific neural networks prediction to input features, understanding how users should interpret these models to make testable hypotheses about the system under study remains an open challenge. Furthermore, the limited amount of labelled data for training presents problems in some domains, as do legal and privacy constraints on work with sensitive health records. Nonetheless, we foresee deep learning enabling changes at both bench and bedside with the potential to transform several areas of biology and medicine.


Current Genomics | 2009

Signal Processing for Metagenomics: Extracting Information from the Soup

Gail Rosen; Bahrad A. Sokhansanj; Robi Polikar; Mary Ann Bruns; Jacob A. Russell; Elaine Garbarine; Steve Essinger; Non Yok

Traditionally, studies in microbial genomics have focused on single-genomes from cultured species, thereby limiting their focus to the small percentage of species that can be cultured outside their natural environment. Fortunately, recent advances in high-throughput sequencing and computational analyses have ushered in the new field of metagenomics, which aims to decode the genomes of microbes from natural communities without the need for cultivation. Although metagenomic studies have shed a great deal of insight into bacterial diversity and coding capacity, several computational challenges remain due to the massive size and complexity of metagenomic sequence data. Current tools and techniques are reviewed in this paper which address challenges in 1) genomic fragment annotation, 2) phylogenetic reconstruction, 3) functional classification of samples, and 4) interpreting complementary metaproteomics and metametabolomics data. Also surveyed are important applications of metagenomic studies, including microbial forensics and the roles of microbial communities in shaping human health and soil ecology.


Mbio | 2013

Selecting age-related functional characteristics in the human gut microbiome

Yemin Lan; Andres Kriete; Gail Rosen

BackgroundHuman gut microbial functions are often associated with various diseases and host physiologies. Aging, a less explored factor, is also suspected to affect or be affected by microbiome alterations. By combining functional feature selection with supervised classification, we aim to facilitate identification of age-related functional characteristics in metagenomes from several human gut microbiome studies (MetaHIT, MicroAge, MicroObes, Kurokawa et al.’s and Gill et al.’s dataset).ResultsWe apply two feature selection methods, term frequency-inverse document frequency (TF-iDF) and minimum-redundancy maximum-relevancy (mRMR), to identify functional signatures that differentiate metagenomes by age. After features are reduced, we use a support vector machine (SVM) to predict host age of new metagenomes. Functional features are from protein families (Pfams), Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, KEGG ontologies and the Gene Ontology (GO) database. Initial investigations demonstrate that ordination of the functional principal components shows great overlap between different age groups. However, when feature selection is applied, mRMR tightens the ordination cluster for each age group, and TF-iDF offers better linear separation. Both TF-iDF and mRMR were used in conjunction with a SVM classifier and achieved areas under receiver operating characteristic curves (AUCs) 10 to 15% above chance to classify individuals above/below mid-ages (about 38 to 43 years old) using Pfams. Better performance around mid-ages is also observed when using other functional categories and age-balanced dataset. We also identified some age-related Pfams that improved age discrimination at age 65 with another feature selection method called LEfSe, on an age-balanced dataset. The selected functional characteristics identify a broad range of age-relevant metabolisms, such as reduced vitamin B12 synthesis, reduced activity of reductases, increased DNA damage, occurrences of stress responses and immune system compromise, and upregulated glycosyltransferases in the aging population.ConclusionsFeature selection can yield biologically meaningful results when used in conjunction with classification, and makes age classification of new human gut metagenomes feasible. While we demonstrate the promise of this approach, the data-dependent prediction performance could be further improved. We hypothesize that while the Qin et al. dataset is the most comprehensive to date, even deeper sampling is needed to better characterize and predict the microbiomes’ functional content.


Bioinformatics | 2013

Quikr: a Method for Rapid Reconstruction of Bacterial Communities via Compressive Sensing

David Koslicki; Simon Foucart; Gail Rosen

MOTIVATION Many metagenomic studies compare hundreds to thousands of environmental and health-related samples by extracting and sequencing their 16S rRNA amplicons and measuring their similarity using beta-diversity metrics. However, one of the first steps--to classify the operational taxonomic units within the sample--can be a computationally time-consuming task because most methods rely on computing the taxonomic assignment of each individual read out of tens to hundreds of thousands of reads. RESULTS We introduce Quikr: a QUadratic, K-mer-based, Iterative, Reconstruction method, which computes a vector of taxonomic assignments and their proportions in the sample using an optimization technique motivated from the mathematical theory of compressive sensing. On both simulated and actual biological data, we demonstrate that Quikr typically has less error and is typically orders of magnitude faster than the most commonly used taxonomic assignment technique (the Ribosomal Database Projects Naïve Bayesian Classifier). Furthermore, the technique is shown to be unaffected by the presence of chimeras, thereby allowing for the circumvention of the time-intensive step of chimera filtering. AVAILABILITY The Quikr computational package (in MATLAB, Octave, Python and C) for the Linux and Mac platforms is available at http://sourceforge.net/projects/quikr/.


PLOS ONE | 2014

WGSQuikr: Fast Whole-Genome Shotgun Metagenomic Classification

David Koslicki; Simon Foucart; Gail Rosen

With the decrease in cost and increase in output of whole-genome shotgun technologies, many metagenomic studies are utilizing this approach in lieu of the more traditional 16S rRNA amplicon technique. Due to the large number of relatively short reads output from whole-genome shotgun technologies, there is a need for fast and accurate short-read OTU classifiers. While there are relatively fast and accurate algorithms available, such as MetaPhlAn, MetaPhyler, PhyloPythiaS, and PhymmBL, these algorithms still classify samples in a read-by-read fashion and so execution times can range from hours to days on large datasets. We introduce WGSQuikr, a reconstruction method which can compute a vector of taxonomic assignments and their proportions in the sample with remarkable speed and accuracy. We demonstrate on simulated data that WGSQuikr is typically more accurate and up to an order of magnitude faster than the aforementioned classification algorithms. We also verify the utility of WGSQuikr on real biological data in the form of a mock community. WGSQuikr is a Whole-Genome Shotgun QUadratic, Iterative, -mer based Reconstruction method which extends the previously introduced 16S rRNA-based algorithm Quikr. A MATLAB implementation of WGSQuikr is available at: http://sourceforge.net/projects/wgsquikr.


Genome Biology and Evolution | 2015

Prokaryotic nucleotide composition is shaped by both phylogeny and the environment

Erin Reichenberger; Gail Rosen; Uri Hershberg; Ruth Hershberg

The causes of the great variation in nucleotide composition of prokaryotic genomes have long been disputed. Here, we use extensive metagenomic and whole-genome data to demonstrate that both phylogeny and the environment shape prokaryotic nucleotide content. We show that across environments, various phyla are characterized by different mean guanine and cytosine (GC) values as well as by the extent of variation on that mean value. At the same time, we show that GC-content varies greatly as a function of environment, in a manner that cannot be entirely explained by disparities in phylogenetic composition. We find environmentally driven differences in nucleotide content not only between highly diverged environments (e.g., soil, vs. aquatic vs. human gut) but also within a single type of environment. More specifically, we demonstrate that some human guts are associated with a microbiome that is consistently more GC-rich across phyla, whereas others are associated with a more AT-rich microbiome. These differences appear to be driven both by variations in phylogenetic composition and by environmental differences—which are independent of these phylogenetic composition differences. Combined, our results demonstrate that both phylogeny and the environment significantly affect nucleotide composition and that the environmental differences affecting nucleotide composition are far subtler than previously appreciated.

Collaboration


Dive into the Gail Rosen's collaboration.

Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Researchain Logo
Decentralizing Knowledge