Publication


Featured research published by Andrew S. Warren.


Infection and Immunity | 2011

PATRIC: the Comprehensive Bacterial Bioinformatics Resource with a Focus on Human Pathogenic Species

Joseph J. Gillespie; Alice R. Wattam; Stephen A. Cammer; Joseph L. Gabbard; Maulik Shukla; Oral Dalay; Timothy Driscoll; Deborah Hix; Shrinivasrao P. Mane; Chunhong Mao; Eric K. Nordberg; Mark Scott; Julie Schulman; Eric E. Snyder; Daniel E. Sullivan; Chunxia Wang; Andrew S. Warren; Kelly P. Williams; Tian Xue; Hyun Seung Yoo; Chengdong Zhang; Yan Zhang; Rebecca Will; Ronald W. Kenyon; Bruno W. S. Sobral

Funded by the National Institute of Allergy and Infectious Diseases, the Pathosystems Resource Integration Center (PATRIC) is a genomics-centric relational database and bioinformatics resource designed to assist scientists in infectious-disease research. Specifically, PATRIC provides scientists with (i) a comprehensive bacterial genomics database, (ii) a plethora of associated data relevant to genomic analysis, and (iii) an extensive suite of computational tools and platforms for bioinformatics analysis. While the primary aim of PATRIC is to advance the knowledge underlying the biology of human pathogens, all publicly available genome-scale data for bacteria are compiled and continually updated, thereby enabling comparative analyses to reveal the basis for differences between infectious free-living and commensal species. Herein we summarize the major features available at PATRIC, dividing the resources into two major categories: (i) organisms, genomes, and comparative genomics and (ii) recurrent integration of community-derived associated data. Additionally, we present two experimental designs typical of bacterial genomics research and report on the execution of both projects using only PATRIC data and tools. These applications encompass a broad range of the data and analysis tools available, illustrating practical uses of PATRIC for the biologist. Finally, a summary of PATRIC's outreach activities, collaborative endeavors, and future research directions is provided.


Molecular Plant-microbe Interactions | 2009

A Draft Genome Sequence of Pseudomonas syringae pv. tomato T1 Reveals a Type III Effector Repertoire Significantly Divergent from That of Pseudomonas syringae pv. tomato DC3000

Nalvo F. Almeida; Shuangchun Yan; Magdalen Lindeberg; David J. Studholme; David J. Schneider; Bradford Condon; Haijie Liu; Carlos Juliano M. Viana; Andrew S. Warren; Clive Evans; Eric Kemen; Daniel MacLean; Aurelie Angot; Gregory B. Martin; Jonathan D. G. Jones; Alan Collmer; João C. Setubal; Boris A. Vinatzer

Diverse gene products including phytotoxins, pathogen-associated molecular patterns, and type III secreted effectors influence interactions between Pseudomonas syringae strains and plants, with additional yet uncharacterized factors likely contributing as well. Of particular interest are those interactions governing pathogen-host specificity. Comparative genomics of closely related pathogens with different host specificity represents an excellent approach for identification of genes contributing to host-range determination. A draft genome sequence of Pseudomonas syringae pv. tomato T1, which is pathogenic on tomato but nonpathogenic on Arabidopsis thaliana, was obtained for this purpose and compared with the genome of the closely related A. thaliana and tomato model pathogen P. syringae pv. tomato DC3000. Although the overall genetic content of each of the two genomes appears to be highly similar, the repertoire of effectors was found to diverge significantly. Several P. syringae pv. tomato T1 effectors absent from strain DC3000 were confirmed to be translocated into plants, with the well-studied effector AvrRpt2 representing a likely candidate for host-range determination. However, the presence of avrRpt2 was not found sufficient to explain A. thaliana resistance to P. syringae pv. tomato T1, suggesting that other effectors and possibly type III secretion system-independent factors also play a role in this interaction.


Nucleic Acids Research | 2017

Improvements to PATRIC, the all-bacterial Bioinformatics Database and Analysis Resource Center

Alice R. Wattam; James J. Davis; Rida Assaf; Sébastien Boisvert; Thomas Brettin; Christopher Bun; Neal Conrad; Emily M. Dietrich; Terry Disz; Joseph L. Gabbard; Svetlana Gerdes; Christopher S. Henry; Ronald Kenyon; Dustin Machi; Chunhong Mao; Eric K. Nordberg; Gary J. Olsen; Daniel Murphy-Olson; Robert Olson; Ross Overbeek; Bruce Parrello; Gordon D. Pusch; Maulik Shukla; Veronika Vonstein; Andrew S. Warren; Fangfang Xia; Hyun Seung Yoo; Rick Stevens

The Pathosystems Resource Integration Center (PATRIC) is the bacterial Bioinformatics Resource Center (https://www.patricbrc.org). Recent changes to PATRIC include a redesign of the web interface and some new services that provide users with a platform that takes them from raw reads to an integrated analysis experience. The redesigned interface allows researchers direct access to tools and data, and the emphasis has changed to user-created genome-groups, with detailed summaries and views of the data that researchers have selected. Perhaps the biggest change has been the enhanced capability for researchers to analyze their private data and compare it to the available public data. Researchers can assemble their raw sequence reads and annotate the contigs using RASTtk. PATRIC also provides services for RNA-Seq, variation, model reconstruction and differential expression analysis, all delivered through an updated private workspace. Private data can be compared by ‘virtual integration’ to any of PATRIC's public data. The number of genomes available for comparison in PATRIC has expanded to over 80 000, with a special emphasis on genomes with antimicrobial resistance data. PATRIC uses this data to improve both subsystem annotation and k-mer classification, and tags new genomes as having signatures that indicate susceptibility or resistance to specific antibiotics.


BMC Bioinformatics | 2010

Missing genes in the annotation of prokaryotic genomes

Andrew S. Warren; Jeremy S. Archuleta; Wu-chun Feng; João C. Setubal

Background: Protein-coding gene detection in prokaryotic genomes is considered a much simpler problem than in intron-containing eukaryotic genomes. However, there have been reports that prokaryotic gene finder programs have problems with small genes (either over-predicting or under-predicting). Therefore the question arises as to whether current genome annotations have systematically missing, small genes. Results: We have developed a high-performance computing methodology to investigate this problem. In this methodology we compare all ORFs larger than or equal to 33 aa from all fully-sequenced prokaryotic replicons. Based on that comparison, and using conservative criteria requiring a minimum taxonomic diversity between conserved ORFs in different genomes, we have discovered 1,153 candidate genes that are missing from current genome annotations. These missing genes are similar only to each other and do not have any strong similarity to gene sequences in public databases, with the implication that these ORFs belong to missing gene families. We also uncovered 38,895 intergenic ORFs, readily identified as putative genes by similarity to currently annotated genes (we call these absent annotations). The vast majority of the missing genes found are small (less than 100 aa). A comparison of select examples with GeneMark, EasyGene and Glimmer predictions yields evidence that some of these genes are escaping detection by these programs. Conclusions: Prokaryotic gene finders and prokaryotic genome annotations require improvement for accurate prediction of small genes. The number of missing gene families found is likely a lower bound on the actual number, due to the conservative criteria used to determine whether an ORF corresponds to a real gene.
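
As a rough illustration of the ORF-enumeration step mentioned in the abstract above, the following Python sketch lists open reading frames of at least 33 codons on both strands of a DNA sequence. It is not the authors' pipeline: the abstract does not state how an ORF is delimited, so the ATG-only start codon, the three standard stop codons, and the function names `reverse_complement` and `find_orfs` are all assumptions made for the example.

```python
# Hypothetical sketch: enumerate ATG..stop ORFs of at least `min_aa` codons.
# Start/stop conventions here are assumptions, not taken from the paper.
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_aa=33):
    """Return a list of (strand, frame, start, end) tuples.

    Coordinates are 0-based, end-exclusive, and relative to the scanned
    strand (the reverse complement for strand "-"). An ORF runs from the
    first ATG in a frame to the next in-frame stop codon; its length in
    amino acids counts the start codon but not the stop codon.
    """
    seq = seq.upper()
    results = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_aa:
                        results.append((strand, frame, start, i + 3))
                    start = None
    return results
```

Calling `find_orfs(sequence)` on each replicon would give the candidate set; the comparison across genomes and the taxonomic-diversity filter described in the abstract are separate steps not covered here.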


BMC Evolutionary Biology | 2010

Functional bias in molecular evolution rate of Arabidopsis thaliana

Andrew S. Warren; Ramu Anandakrishnan; Liqing Zhang

Background: Characteristics derived from mutation and other mechanisms that are advantageous for survival are often preserved during evolution by natural selection. Some genes are conserved in many organisms because they are responsible for fundamental biological function; others are conserved for their unique functional characteristics. Therefore, one would expect the rate of molecular evolution for individual genes to be dependent on their biological function. Whether this expectation holds for genes duplicated by whole genome duplication is not known. Results: We empirically demonstrate here, using duplicated genes generated from the Arabidopsis thaliana α-duplication event, that the rate of molecular evolution of genes duplicated in this event depends on biological function. Using functional clustering based on gene ontology annotation of gene pairs, we show that some duplicated genes, such as defense response genes, are under weaker purifying selection or under stronger diversifying selection than other duplicated genes, such as protein translation genes, as measured by the ratio of nonsynonymous to synonymous divergence (dN/dS). Conclusions: These results provide empirical evidence indicating that molecular evolution rate for genes duplicated in whole genome duplication, as measured by dN/dS, may depend on biological function, which we characterize using gene ontology annotation. Furthermore, the general approach used here provides a framework for comparative analysis of molecular evolution rate for genes based on their biological function.


BMC Bioinformatics | 2009

The Genome Reverse Compiler: an explorative annotation tool

Andrew S. Warren; João C. Setubal

Background: As sequencing costs have decreased, whole genome sequencing has become a viable and integral part of biological laboratory research. However, the tools with which genes can be found and functionally characterized have not been readily adapted to be part of the everyday biological sciences toolkit. Most annotation pipelines remain as a service provided by large institutions or come as an unwieldy conglomerate of independent components, each requiring its own setup and maintenance. Results: To address this issue we have created the Genome Reverse Compiler, an easy-to-use, open-source, automated annotation tool. The GRC is independent of third-party software installs and only requires a Linux operating system. This stands in contrast to most annotation packages, which typically require installation of relational databases, sequence similarity software, and a number of other programming language modules. We provide details on the methodology used by GRC and evaluate its performance on several groups of prokaryotes using GRC's built-in comparison module. Conclusion: Traditionally, to perform whole genome annotation a user would either set up a pipeline or take advantage of an online service. With GRC the user need only provide the genome he or she wants to annotate and the function resource files to use. The result is high usability and a very minimal learning curve for the intended audience of life science researchers and bioinformaticians. We believe that the GRC fills a valuable niche in allowing users to perform explorative, whole-genome annotation.


Bioinformatics | 2015

RNA-Rocket: an RNA-Seq analysis resource for infectious disease research

Andrew S. Warren; Cristina Aurrecoechea; Brian P. Brunk; Prerak T. Desai; Scott J. Emrich; Gloria I. Giraldo-Calderón; Omar S. Harb; Deborah Hix; Daniel Lawson; Dustin Machi; Chunhong Mao; Michael McClelland; Eric K. Nordberg; Maulik Shukla; Leslie B. Vosshall; Alice R. Wattam; Rebecca Will; Hyun Seung Yoo; Bruno W. S. Sobral

Motivation: RNA-Seq is a method for profiling transcription using high-throughput sequencing and is an important component of many research projects that wish to study transcript isoforms, condition specific expression and transcriptional structure. The methods, tools and technologies used to perform RNA-Seq analysis continue to change, creating a bioinformatics challenge for researchers who wish to exploit these data. Resources that bring together genomic data, analysis tools, educational material and computational infrastructure can minimize the overhead required of life science researchers. Results: RNA-Rocket is a free service that provides access to RNA-Seq and ChIP-Seq analysis tools for studying infectious diseases. The site makes available thousands of pre-indexed genomes, their annotations and the ability to stream results to the bioinformatics resources VectorBase, EuPathDB and PATRIC. The site also provides a combination of experimental data and metadata, examples of pre-computed analysis, step-by-step guides and a user interface designed to enable both novice and experienced users of RNA-Seq data. Availability and implementation: RNA-Rocket is available at rnaseq.pathogenportal.org. Source code for this project can be found at github.com/cidvbi/PathogenPortal. Contact: [email protected] Supplementary information: Supplementary materials are available at Bioinformatics online.


Briefings in Bioinformatics | 2017

PATRIC as a unique resource for studying antimicrobial resistance

Dionysios A. Antonopoulos; Rida Assaf; Ramy K. Aziz; Thomas Brettin; Christopher Bun; Neal Conrad; James J. Davis; Emily M. Dietrich; Terry Disz; Svetlana Gerdes; Ronald W. Kenyon; Dustin Machi; Chunhong Mao; Daniel Murphy-Olson; Eric K. Nordberg; Gary J. Olsen; Robert J. Olson; Ross Overbeek; Bruce Parrello; Gordon D. Pusch; John Santerre; Maulik Shukla; Rick Stevens; Margo VanOeffelen; Veronika Vonstein; Andrew S. Warren; Alice R. Wattam; Fangfang Xia; Hyunseung Yoo

The Pathosystems Resource Integration Center (PATRIC, www.patricbrc.org) is designed to provide researchers with the tools and services that they need to perform genomic and other ‘omic’ data analyses. In response to mounting concern over antimicrobial resistance (AMR), the PATRIC team has been developing new tools that help researchers understand AMR and its genetic determinants. To support comparative analyses, we have added AMR phenotype data to over 15 000 genomes in the PATRIC database, often assembling genomes from reads in public archives and collecting their associated AMR panel data from the literature to augment the collection. We have also been using this collection of AMR metadata to build machine learning-based classifiers that can predict the AMR phenotypes and the genomic regions associated with resistance for genomes being submitted to the annotation service. Likewise, we have undertaken a large AMR protein annotation effort by manually curating data from the literature and public repositories. This collection of 7370 AMR reference proteins, which contains many protein annotations (functional roles) that are unique to PATRIC and RAST, has been manually curated so that it projects stably across genomes. The collection currently projects to 1 610 744 proteins in the PATRIC database. Finally, the PATRIC Web site has been expanded to enable AMR-based custom page views so that researchers can easily explore AMR data and design experiments based on whole genomes or individual genes.


Scientific Reports | 2017

Whole Exome Sequencing to Identify Genetic Variants Associated with Raised Atherosclerotic Lesions in Young Persons

James E. Hixson; Goo Jun; Lawrence C. Shimmin; Yizhi Wang; Guoqiang Yu; Chunhong Mao; Andrew S. Warren; Timothy D. Howard; Richard S. Vander Heide; Jennifer E. Van Eyk; Yue Wang; David M. Herrington

We investigated the influence of genetic variants on atherosclerosis using whole exome sequencing in cases and controls from the autopsy study “Pathobiological Determinants of Atherosclerosis in Youth (PDAY)”. We identified a PDAY case group with the highest total amounts of raised lesions (n = 359) for comparisons with a control group with no detectable raised lesions (n = 626). In addition to the standard exome capture, we included genome-wide proximal promoter regions that contain sequences that regulate gene expression. Our statistical analyses included single variant analysis for common variants (MAF > 0.01) and rare variant analysis for low frequency and rare variants (MAF < 0.05). In addition, we investigated known CAD genes previously identified by meta-analysis of GWAS studies. We did not identify individual common variants that reached exome-wide significance using single variant analysis. In analysis limited to 60 CAD genes, we detected strong associations with COL4A2/COL4A1 that also previously showed associations with myocardial infarction and arterial stiffness, as well as coronary artery calcification. Likewise, rare variant analysis did not identify genes that reached exome-wide significance. Among the 60 CAD genes, the strongest association was with NBEAL1 that was also identified in gene-based analysis of whole exome sequencing for early onset myocardial infarction.


bioRxiv | 2017

Panaconda: Application of pan-synteny graph models to genome content analysis

Andrew S. Warren; James J. Davis; Alice R. Wattam; Dustin Machi; João C. Setubal; Lenwood S. Heath

Motivation: Whole-genome alignment and pan-genome analysis are useful tools in understanding the similarities and differences of many genomes in an evolutionary context. Here we introduce the concept of pan-synteny graphs, an analysis method that combines elements of both to represent conservation and change of multiple prokaryotic genomes at an architectural level. Pan-synteny graphs represent a reference-free approach for the comparison of many genomes and allow for the identification of synteny, insertion, deletion, replacement, inversion, recombination, missed assembly joins, evolutionary hotspots, and reference-based scaffolding. Results: We present an algorithm for creating whole genome multiple sequence comparisons and a model for representing the similarities and differences among sequences as a graph of syntenic gene families. As part of the pan-synteny graph creation, we first create a de Bruijn graph. Instead of the alphabet of nucleotides commonly used in genome assembly, we use an alphabet of gene families. This de Bruijn graph is then processed to create the pan-synteny graph. Our approach is novel in that it explicitly controls how regions from the same sequence and genome are aligned and generates a graph in which all sequences are fully represented as paths. This method harnesses previous computation involved in protein family calculation to speed up the creation of whole genome alignment for many genomes. We provide the software suite Panaconda, for the calculation of pan-synteny graphs given annotation input, and an implementation of methods for their layout and visualization. Availability: Panaconda is available at https://github.com/aswarren/pangenome_graphs and datasets used in examples are available at https://github.com/aswarren/pangenome_examples Contact: Andrew Warren [email protected]
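
The first step described in the abstract above, a de Bruijn graph built over an alphabet of gene families instead of nucleotides, can be sketched in a few lines of Python. This is only an illustration of that idea and not Panaconda's implementation: the function name `gene_family_de_bruijn`, the input format (one ordered list of family IDs per genome), and the choice to ignore gene orientation are assumptions, and the later processing that turns this graph into a pan-synteny graph is not shown.

```python
# Hypothetical sketch: de Bruijn graph whose symbols are gene-family IDs.
from collections import defaultdict


def gene_family_de_bruijn(genomes, k=3):
    """Build edges of a de Bruijn graph over gene families.

    genomes: dict mapping a genome name to its ordered list of
             gene-family IDs (gene strand is ignored in this sketch).
    Returns a dict mapping (prefix, suffix) -> set of genome names,
    where prefix and suffix are the (k-1)-long tuples of family IDs
    at the two ends of each k-mer.
    """
    edges = defaultdict(set)
    for name, families in genomes.items():
        for i in range(len(families) - k + 1):
            kmer = tuple(families[i:i + k])
            edges[(kmer[:-1], kmer[1:])].add(name)
    return dict(edges)


# Two toy genomes that share the run A-B-C and then diverge.
example = {
    "g1": ["A", "B", "C", "D"],
    "g2": ["A", "B", "C", "E"],
}
graph = gene_family_de_bruijn(example, k=3)
shared = [edge for edge, names in graph.items() if len(names) == 2]
```

In this toy input, edges supported by both genomes mark conserved gene order, while edges supported by a single genome mark where the two diverge.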

Collaboration


Top Co-Authors

Chunhong Mao

Virginia Bioinformatics Institute

Maulik Shukla

Virginia Bioinformatics Institute

Dustin Machi

Virginia Bioinformatics Institute