Jared Wilkening | Researchain

Publication

Featured research published by Jared Wilkening.

BMC Bioinformatics | 2008

The metagenomics RAST server – a public resource for the automatic phylogenetic and functional analysis of metagenomes

Folker Meyer; Daniel Paarmann; Mark D'Souza; Robert Olson; Elizabeth M. Glass; Michael Kubal; Tobias Paczian; Alexis Rodriguez; Rick Stevens; Andreas Wilke; Jared Wilkening; Robert Edwards

Background: Random community genomes (metagenomes) are now commonly used to study microbes in different environments. Over the past few years, the major challenge associated with metagenomics shifted from generating to analyzing sequences. High-throughput, low-cost next-generation sequencing has provided access to metagenomics to a wide range of researchers.
Results: A high-throughput pipeline has been constructed to provide high-performance computing to all researchers interested in using metagenomics. The pipeline produces automated functional assignments of sequences in the metagenome by comparing them against both protein and nucleotide databases. Phylogenetic and functional summaries of the metagenomes are generated, and tools for comparative metagenomics are incorporated into the standard views. User access is controlled to ensure data privacy, but the collaborative environment underpinning the service provides a framework for sharing datasets between multiple users. In the metagenomics RAST, all users retain full control of their data, and everything is available for download in a variety of formats.
Conclusion: The open-source metagenomics RAST service provides a new paradigm for the annotation and analysis of metagenomes. With built-in support for multiple data sources and a back end that houses abstract data types, the metagenomics RAST is stable, extensible, and freely available to all researchers. This service has removed one of the primary bottlenecks in metagenome sequence analysis: the availability of high-performance computing for annotating the data. http://metagenomics.nmpdr.org

CSH Protocols | 2010

Using the Metagenomics RAST Server (MG-RAST) for Analyzing Shotgun Metagenomes

Elizabeth M. Glass; Jared Wilkening; Andreas Wilke; Dionysios A. Antonopoulos; Folker Meyer

Shotgun metagenomics creates millions of fragments of short DNA reads, which are meaningless unless analyzed appropriately. The Metagenomics RAST server (MG-RAST) is a web-based, open source system that offers a unique suite of tools for analyzing these data sets. After de-replication and quality control, fragments are mapped against a comprehensive nonredundant database (NR). Phylogenetic and metabolic reconstructions are computed from the set of hits against the NR. The resulting data are made available for browsing, download, and most importantly, comparison against a comprehensive collection of public metagenomes. A submitted metagenome is visible only to the user, unless the user makes it public or shares with other registered users. Public metagenomes are available to all.

BMC Bioinformatics | 2012

The M5nr: a novel non-redundant database containing protein sequences and annotations from multiple sources and associated tools.

Andreas Wilke; Travis Harrison; Jared Wilkening; Dawn Field; Elizabeth M. Glass; Nikos C. Kyrpides; Konstantinos Mavrommatis; Folker Meyer

Background: Computing of sequence similarity results is becoming a limiting factor in metagenome analysis. Sequence similarity search results encoded in an open, exchangeable format have the potential to limit the need for computational reanalysis of these data sets. A prerequisite for sharing of similarity results is a common reference.
Description: We introduce a mechanism for automatically maintaining a comprehensive, non-redundant protein database and for creating a quarterly release of this resource. In addition, we present tools for translating similarity searches into many annotation namespaces, e.g. KEGG or NCBI's GenBank.
Conclusions: The data and tools we present allow the creation of multiple result sets using a single computation, permitting computational results to be shared between groups for large sequence data sets.
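
The idea of one non-redundant reference that can be translated into several annotation namespaces can be illustrated with a minimal sketch. This is not the M5nr implementation: the choice of an MD5 digest as the sequence key, the function names, and the example identifiers are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def sequence_key(protein_seq: str) -> str:
    """Return a stable key for a protein sequence.

    Using an MD5 digest of the normalized sequence is an assumption
    of this sketch, not something stated in the abstract.
    """
    normalized = protein_seq.strip().upper()
    return hashlib.md5(normalized.encode("ascii")).hexdigest()


def build_nonredundant(records):
    """Collapse (source, identifier, sequence) records into unique sequences.

    Returns the unique sequences keyed by digest, plus a mapping from
    digest to {source namespace: identifier}.
    """
    sequences = {}
    annotations = defaultdict(dict)
    for source, identifier, seq in records:
        key = sequence_key(seq)
        sequences.setdefault(key, seq.strip().upper())
        annotations[key][source] = identifier
    return sequences, annotations


def translate_hit(key, annotations, namespace):
    """Translate one similarity hit into a namespace, or None if unknown."""
    return annotations.get(key, {}).get(namespace)


# Hypothetical records: the identifiers below are made up for the example.
records = [
    ("KEGG", "example:K1", "MKTAYIAK"),
    ("GenBank", "EXAMPLE_0001", "mktayiak"),
    ("GenBank", "EXAMPLE_0002", "MSLLTEVE"),
]
sequences, annotations = build_nonredundant(records)
```

Because a similarity search only has to be run once against the unique sequences, the same hit can later be reported in whichever namespace a user prefers by a dictionary lookup.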

International Conference on Cluster Computing | 2009

Using clouds for metagenomics: A case study

Jared Wilkening; Andreas Wilke; Narayan Desai; Folker Meyer

Cutting-edge sequencing systems produce data at a prodigious rate, and the analysis of these datasets requires significant computing resources. Cloud computing provides a tantalizing possibility for on-demand access to computing resources. However, many open questions remain. We present here a performance assessment of BLAST on real metagenomics data in a cloud setting in order to determine the viability of this approach. BLAST is one of the premier applications in bioinformatics and computational biology and is assumed to consume the vast majority of resources in that area.

PLOS Computational Biology | 2012

A Platform-Independent Method for Detecting Errors in Metagenomic Sequencing Data: DRISEE

Kevin P. Keegan; William L. Trimble; Jared Wilkening; Andreas Wilke; Travis Harrison; Mark D'Souza; Folker Meyer

We provide a novel method, DRISEE (duplicate read inferred sequencing error estimation), to assess sequencing quality (alternatively referred to as “noise” or “error”) within and/or between sequencing samples. DRISEE provides positional error estimates that can be used to inform read trimming within a sample. It also provides global (whole sample) error estimates that can be used to identify samples with high or varying levels of sequencing error that may confound downstream analyses, particularly in the case of studies that utilize data from multiple sequencing samples. For shotgun metagenomic data, we believe that DRISEE provides estimates of sequencing error that are more accurate and less constrained by technical limitations than existing methods that rely on reference genomes or the use of scores (e.g. Phred). Here, DRISEE is applied to (non amplicon) data sets from both the 454 and Illumina platforms. The DRISEE error estimate is obtained by analyzing sets of artifactual duplicate reads (ADRs), a known by-product of both sequencing platforms. We present DRISEE as an open-source, platform-independent method to assess sequencing error in shotgun metagenomic data, and utilize it to discover previously uncharacterized error in de novo sequence data from the 454 and Illumina sequencing platforms.
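
The general idea of estimating positional error from duplicate reads can be sketched as follows. This is a heavily simplified illustration, not the published DRISEE procedure: binning reads by a shared prefix, the `prefix_len` and `min_bin_size` parameters, and the majority-vote consensus are all assumptions of this sketch.

```python
from collections import Counter, defaultdict


def positional_error(reads, prefix_len=4, min_bin_size=3):
    """Estimate a per-position error rate from bins of near-duplicate reads.

    Reads sharing the same prefix are treated as one bin of duplicates.
    Within each bin, a base that disagrees with the majority base at
    its position is counted as an error.
    """
    bins = defaultdict(list)
    for seq in reads:
        if len(seq) >= prefix_len:
            bins[seq[:prefix_len]].append(seq)

    mismatches = Counter()
    totals = Counter()
    for members in bins.values():
        if len(members) < min_bin_size:
            continue  # too few duplicates to form a useful consensus
        length = min(len(s) for s in members)
        for pos in range(length):
            column = [s[pos] for s in members]
            consensus, _ = Counter(column).most_common(1)[0]
            totals[pos] += len(column)
            mismatches[pos] += sum(1 for base in column if base != consensus)

    return {pos: mismatches[pos] / totals[pos] for pos in sorted(totals)}


# Toy input: four reads share the prefix "ACGT"; the fifth is ignored
# because its bin is smaller than min_bin_size.
reads = ["ACGTACGT", "ACGTACGT", "ACGTACCT", "ACGTTCGT", "TTTTGGGG"]
profile = positional_error(reads)
```

A profile like this could inform read trimming: positions whose estimated error exceeds a chosen threshold are candidates for removal.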

PLOS Computational Biology | 2015

A RESTful API for accessing microbial community data for MG-RAST

Andreas Wilke; Jared Bischof; Travis Harrison; Tom Brettin; Mark D'Souza; Wolfgang Gerlach; Hunter Matthews; Tobias Paczian; Jared Wilkening; Elizabeth M. Glass; Narayan Desai; Folker Meyer

Metagenomic sequencing has produced significant amounts of data in recent years. For example, as of summer 2013, MG-RAST has been used to annotate over 110,000 data sets totaling over 43 Terabases. With metagenomic sequencing finding even wider adoption in the scientific community, the existing web-based analysis tools and infrastructure in MG-RAST provide limited capability for data retrieval and analysis, such as comparative analysis between multiple data sets. Moreover, although the system provides many analysis tools, it is not comprehensive. By opening MG-RAST up via a web services API (application programming interface) we have greatly expanded access to MG-RAST data, as well as provided a mechanism for the use of third-party analysis tools with MG-RAST data. This RESTful API makes all data and data objects created by the MG-RAST pipeline accessible as JSON objects. As part of the DOE Systems Biology Knowledgebase project (KBase, http://kbase.us) we have implemented a web services API for MG-RAST. This API complements the existing MG-RAST web interface and constitutes the basis of KBase's microbial community capabilities. In addition, the API exposes a comprehensive collection of data to programmers. This API, which uses a RESTful (Representational State Transfer) implementation, is compatible with most programming environments and should be easy to use for end users and third parties. It provides comprehensive access to sequence data, quality control results, annotations, and many other data types. Where feasible, we have used standards to expose data and metadata. Code examples are provided in a number of languages both to show the versatility of the API and to provide a starting point for users. We present an API that exposes the data in MG-RAST for consumption by our users, greatly enhancing the utility of the MG-RAST service.
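
As a rough illustration of how a client might consume a RESTful service that returns JSON objects, the sketch below builds a resource URL and parses a response body. The base URL, the `metagenome` resource name, the `verbosity` parameter, and the fields in the sample payload are placeholders assumed for this example; they are not taken from the abstract and should be checked against the service's own documentation.

```python
import json
from urllib.parse import urlencode

# Placeholder address for illustration only, not the real service URL.
BASE_URL = "https://api.example.org"


def build_request_url(resource, object_id, **params):
    """Build a REST-style URL of the form BASE/resource/id?key=value."""
    url = f"{BASE_URL}/{resource}/{object_id}"
    if params:
        url += "?" + urlencode(sorted(params.items()))
    return url


def parse_object(payload: str) -> dict:
    """Decode a JSON response body and check that it is a JSON object."""
    obj = json.loads(payload)
    if not isinstance(obj, dict):
        raise ValueError("expected a JSON object")
    return obj


# Hypothetical response body; a real client would fetch it over HTTP.
sample_payload = '{"id": "example-0001", "name": "soil sample", "sequence_count": 1200}'

url = build_request_url("metagenome", "example-0001", verbosity="minimal")
record = parse_object(sample_payload)
```

Keeping URL construction and response parsing separate from the network call makes the client easy to test without contacting the service.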

BMC Bioinformatics | 2012

Short-read reading-frame predictors are not created equal: sequence error causes loss of signal

William L. Trimble; Kevin P. Keegan; Mark D’Souza; Andreas Wilke; Jared Wilkening; Jack A. Gilbert; Folker Meyer

Background: Gene prediction algorithms (or gene callers) are an essential tool for analyzing shotgun nucleic acid sequence data. Gene prediction is a ubiquitous step in sequence analysis pipelines; it reduces the volume of data by identifying the most likely reading frame for a fragment, permitting the out-of-frame translations to be ignored. In this study we evaluate five widely used ab initio gene-calling algorithms (FragGeneScan, MetaGeneAnnotator, MetaGeneMark, Orphelia, and Prodigal) for accuracy on short (75–1000 bp) fragments containing sequence error from previously published artificial data and “real” metagenomic datasets.
Results: While gene prediction tools have similar accuracies predicting genes on error-free fragments, in the presence of sequencing errors considerable differences between tools become evident. For error-containing short reads, FragGeneScan finds more prokaryotic coding regions than do MetaGeneAnnotator, MetaGeneMark, Orphelia, or Prodigal. This improved detection of genes in error-containing fragments, however, comes at the cost of much lower (50%) specificity and overprediction of genes in noncoding regions.
Conclusions: Ab initio gene callers offer a significant reduction in the computational burden of annotating individual nucleic acid reads and are used in many metagenomic annotation systems. For predicting reading frames on raw reads, we find the hidden Markov model approach in FragGeneScan is more sensitive than other gene prediction tools, while Prodigal, MGA, and MGM are better suited for higher-quality sequences such as assembled contigs.
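
The trade-off between sensitivity and specificity described above can be made concrete with a small sketch. It uses the textbook definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)); the abstract does not state the exact definitions used in the study, so this scoring scheme, the frame encoding, and the toy data are assumptions for illustration.

```python
def frame_metrics(truth, predicted):
    """Score reading-frame predictions per fragment.

    Each entry is a reading frame (for example 1, 2, 3, -1, -2, -3) or
    None for "no coding region". A coding fragment counts as a true
    positive only if the predicted frame equals the true frame; any
    other outcome on a coding fragment is counted as a false negative.
    """
    tp = fn = tn = fp = 0
    for true_frame, pred_frame in zip(truth, predicted):
        if true_frame is None:
            if pred_frame is None:
                tn += 1
            else:
                fp += 1  # gene predicted in a noncoding fragment
        elif pred_frame == true_frame:
            tp += 1
        else:
            fn += 1  # missed gene or wrong frame
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Toy data: four coding fragments and four noncoding fragments.
truth = [1, -2, 3, None, None, 2, None, None]
predicted = [1, -2, None, 1, None, 2, None, -3]
sensitivity, specificity = frame_metrics(truth, predicted)
```

In this toy case the caller recovers three of four coding frames but also calls genes in half of the noncoding fragments, which mirrors the kind of overprediction the abstract reports.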

Biochimica et Biophysica Acta | 2011

Connecting genotype to phenotype in the era of high-throughput sequencing.

Christopher S. Henry; Ross Overbeek; Fangfang Xia; Aaron A. Best; Elizabeth M. Glass; Jack A. Gilbert; Peter E. Larsen; Robert Edwards; Terry Disz; Folker Meyer; Veronika Vonstein; Matthew DeJongh; Daniela Bartels; Narayan Desai; Mark D'Souza; Scott Devoid; Kevin P. Keegan; Robert Olson; Andreas Wilke; Jared Wilkening; Rick Stevens

BACKGROUND The development of next generation sequencing technology is rapidly changing the face of the genome annotation and analysis field. One of the primary uses for genome sequence data is to improve our understanding and prediction of phenotypes for microbes and microbial communities, but the technologies for predicting phenotypes must keep pace with the new sequences emerging. SCOPE OF REVIEW This review presents an integrated view of the methods and technologies used in the inference of phenotypes for microbes and microbial communities based on genomic and metagenomic data. Given the breadth of this topic, we place special focus on the resources available within the SEED Project. We discuss the two steps involved in connecting genotype to phenotype: sequence annotation, and phenotype inference, and we highlight the challenges in each of these steps when dealing with both single genome and metagenome data. MAJOR CONCLUSIONS This integrated view of the genotype-to-phenotype problem highlights the importance of a controlled ontology in the annotation of genomic data, as this benefits subsequent phenotype inference and metagenome annotation. We also note the importance of expanding the set of reference genomes to improve the annotation of all sequence data, and we highlight metagenome assembly as a potential new source for complete genomes. Finally, we find that phenotype inference, particularly from metabolic models, generates predictions that can be validated and reconciled to improve annotations. GENERAL SIGNIFICANCE This review presents the first look at the challenges and opportunities associated with the inference of phenotype from genotype during the next generation sequencing revolution. This article is part of a Special Issue entitled: Systems Biology of Microorganisms.

International Conference on Big Data | 2013

A scalable data analysis platform for metagenomics

Wei Tang; Jared Wilkening; Narayan Desai; Wolfgang Gerlach; Andreas Wilke; Folker Meyer

With the advent of high-throughput DNA sequencing technology, the analysis and management of the increasing amount of biological sequence data has become a bottleneck for scientific progress. For example, MG-RAST, a metagenome annotation system serving a large scientific community worldwide, has experienced a sustained, exponential growth in data submissions for several years; and this trend is expected to continue. To address the computational challenges posed by this workload, we developed a new data analysis platform, including a data management system (Shock) for biological sequence data and a workflow management system (AWE) supporting scalable, fault-tolerant task and resource management. Shock and AWE can be used to build a scalable and reproducible data analysis infrastructure for upper-level biological data analysis services.

Methods in Enzymology | 2013

A Metagenomics Portal for a Democratized Sequencing World

Andreas Wilke; Elizabeth M. Glass; Daniela Bartels; Jared Bischof; Daniel Braithwaite; Mark D’Souza; Wolfgang Gerlach; Travis Harrison; Kevin P. Keegan; Hunter Matthews; Renzo Kottmann; Tobias Paczian; Wei Tang; William L. Trimble; Pelin Yilmaz; Jared Wilkening; Narayan Desai; Folker Meyer

The democratized world of sequencing is leading to numerous data analysis challenges; MG-RAST addresses many of these challenges for diverse datasets, including amplicon datasets, shotgun metagenomes, and metatranscriptomes. The changes from version 2 to version 3 include the addition of a dedicated gene calling stage using FragGeneScan, clustering of predicted proteins at 90% identity, and the use of BLAT for the computation of similarities. Together with changes in the underlying software infrastructure, this has enabled the dramatic scaling up of pipeline throughput while remaining on a limited hardware budget. The Web-based service allows upload, fully automated analysis, and visualization of results. As a result of the plummeting cost of sequencing and the readily available analytical power of MG-RAST, over 78,000 metagenomic datasets have been analyzed, with over 12,000 of them publicly available in MG-RAST.
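
Clustering predicted proteins at 90% identity reduces the number of sequences that must go through similarity search. The sketch below shows one simple way such clustering can work. It is a toy greedy scheme and not the tool used by MG-RAST: the ungapped, position-by-position identity measure and the longest-first ordering are assumptions of this example.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions, relative to the longer sequence.

    This is an ungapped comparison, a simplification; real tools use
    alignments or word-based heuristics.
    """
    if not a or not b:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / max(len(a), len(b))


def greedy_cluster(proteins, threshold=0.9):
    """Assign each sequence to the first representative it matches.

    Sequences are visited longest first; a sequence that matches no
    existing representative becomes the representative of a new cluster.
    Returns a list of (representative, members) pairs.
    """
    clusters = []
    for seq in sorted(proteins, key=len, reverse=True):
        for representative, members in clusters:
            if identity(representative, seq) >= threshold:
                members.append(seq)
                break
        else:
            clusters.append((seq, [seq]))
    return clusters


# Toy peptides: the first two differ at one of ten positions.
proteins = ["MKTAYIAKQR", "MKTAYIAKQK", "MSLLTEVETY"]
clusters = greedy_cluster(proteins)
```

Only the representative of each cluster then needs to be compared against the reference database, and its results can be applied to the other members.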