Tobias Paczian
Argonne National Laboratory
Publication
Featured research published by Tobias Paczian.
BMC Genomics | 2008
Ramy K. Aziz; Daniela Bartels; Aaron A. Best; Matthew DeJongh; Terrence Disz; Robert Edwards; Kevin Formsma; Svetlana Gerdes; Elizabeth M. Glass; Michael Kubal; Folker Meyer; Gary J. Olsen; Robert Olson; Andrei L. Osterman; Ross Overbeek; Leslie K. McNeil; Daniel Paarmann; Tobias Paczian; Bruce Parrello; Gordon D. Pusch; Claudia I. Reich; Rick Stevens; Olga Vassieva; Veronika Vonstein; Andreas Wilke; Olga Zagnitko
Background: The number of prokaryotic genome sequences becoming available is growing steadily and is growing faster than our ability to accurately annotate them. Description: We describe a fully automated service for annotating bacterial and archaeal genomes. The service identifies protein-encoding, rRNA and tRNA genes, assigns functions to the genes, predicts which subsystems are represented in the genome, uses this information to reconstruct the metabolic network and makes the output easily downloadable for the user. In addition, the annotated genome can be browsed in an environment that supports comparative analysis with the annotated genomes maintained in the SEED environment. The service normally makes the annotated genome available within 12–24 hours of submission, but ultimately the quality of such a service will be judged in terms of accuracy, consistency, and completeness of the produced annotations. We summarize our attempts to address these issues and discuss plans for incrementally enhancing the service. Conclusion: By providing accurate, rapid annotation freely to the community we have created an important community resource. The service has now been utilized by over 120 external users annotating over 350 distinct genomes.
BMC Bioinformatics | 2008
Folker Meyer; Daniel Paarmann; Mark D'Souza; Robert Olson; Elizabeth M. Glass; Michael Kubal; Tobias Paczian; Alexis Rodriguez; Rick Stevens; Andreas Wilke; Jared Wilkening; Robert Edwards
Background: Random community genomes (metagenomes) are now commonly used to study microbes in different environments. Over the past few years, the major challenge associated with metagenomics shifted from generating to analyzing sequences. High-throughput, low-cost next-generation sequencing has provided access to metagenomics to a wide range of researchers. Results: A high-throughput pipeline has been constructed to provide high-performance computing to all researchers interested in using metagenomics. The pipeline produces automated functional assignments of sequences in the metagenome by comparing them against both protein and nucleotide databases. Phylogenetic and functional summaries of the metagenomes are generated, and tools for comparative metagenomics are incorporated into the standard views. User access is controlled to ensure data privacy, but the collaborative environment underpinning the service provides a framework for sharing datasets between multiple users. In the metagenomics RAST, all users retain full control of their data, and everything is available for download in a variety of formats. Conclusion: The open-source metagenomics RAST service provides a new paradigm for the annotation and analysis of metagenomes. With built-in support for multiple data sources and a back end that houses abstract data types, the metagenomics RAST is stable, extensible, and freely available to all researchers. This service has removed one of the primary bottlenecks in metagenome sequence analysis – the availability of high-performance computing for annotating the data. http://metagenomics.nmpdr.org
Nucleic Acids Research | 2007
Leslie K. McNeil; Claudia I. Reich; Ramy K. Aziz; Daniela Bartels; Matthew Cohoon; Terry Disz; Robert Edwards; Svetlana Gerdes; Kaitlyn Hwang; Michael Kubal; Gohar Rem Margaryan; Folker Meyer; William Mihalo; Gary J. Olsen; Robert Olson; Andrei L. Osterman; Daniel Paarmann; Tobias Paczian; Bruce Parrello; Gordon D. Pusch; Dmitry A. Rodionov; Xinghua Shi; Olga Vassieva; Veronika Vonstein; Olga Zagnitko; Fangfang Xia; Jenifer Zinner; Ross Overbeek; Rick Stevens
The National Microbial Pathogen Data Resource (NMPDR) is a National Institute of Allergy and Infectious Diseases (NIAID)-funded Bioinformatics Resource Center that supports research in selected Category B pathogens. NMPDR contains the complete genomes of ∼50 strains of pathogenic bacteria that are the focus of our curators, as well as >400 other genomes that provide a broad context for comparative analysis across the three phylogenetic Domains. NMPDR integrates complete, public genomes with expertly curated biological subsystems to provide the most consistent genome annotations. Subsystems are sets of functional roles related by a biologically meaningful organizing principle, which are built over large collections of genomes; they provide researchers with consistent functional assignments in a biologically structured context. Investigators can browse subsystems and reactions to develop accurate reconstructions of the metabolic networks of any sequenced organism. NMPDR provides a comprehensive bioinformatics platform, with tools and viewers for genome analysis. Results of precomputed gene clustering analyses can be retrieved in tabular or graphic format with one-click tools. NMPDR tools include Signature Genes, which finds the sets of genes that are common to, or that differentiate, two groups of organisms. Essentiality data collated from genome-wide studies have been curated. Drug target identification and high-throughput, in silico, compound screening are in development.
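The Signature Genes idea in the abstract above (genes common to two groups of organisms, or differentiating them) can be illustrated with plain set operations. This is a minimal sketch and not NMPDR's implementation: the function name `signature_genes`, the representation of each organism as a set of functional-role labels, and the strict "present in all / absent from all" criterion are assumptions made for illustration.

```python
from functools import reduce


def signature_genes(group_a, group_b):
    """Return (common, only_a, only_b) for two groups of organisms.

    Each group is a non-empty list of sets; each set holds the
    functional-role labels (hypothetical) found in one organism.
    """
    # Roles present in every organism of a group.
    core_a = reduce(set.intersection, group_a)
    core_b = reduce(set.intersection, group_b)
    # Roles present in at least one organism of a group.
    any_a = set().union(*group_a)
    any_b = set().union(*group_b)

    common = core_a & core_b  # in every organism of both groups
    only_a = core_a - any_b   # in every A organism and in no B organism
    only_b = core_b - any_a   # in every B organism and in no A organism
    return common, only_a, only_b
```

A real tool would likely relax the strict criterion (for example, allowing a role to be missing from a small fraction of a group), since annotations are rarely complete.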
Nucleic Acids Research | 2016
Andreas Wilke; Jared Bischof; Wolfgang Gerlach; Elizabeth M. Glass; Travis Harrison; Kevin P. Keegan; Tobias Paczian; William L. Trimble; Saurabh Bagchi; Somali Chaterji; Folker Meyer
MG-RAST (http://metagenomics.anl.gov) is an open-submission data portal for processing, analyzing, sharing and disseminating metagenomic datasets. The system currently hosts over 200 000 datasets and is continuously updated. The volume of submissions has increased 4-fold over the past 24 months, now averaging 4 terabasepairs per month. In addition to several new features, we report changes to the analysis workflow and the technologies used to scale the pipeline up to the required throughput levels. To show possible uses for the data from MG-RAST, we present several examples integrating data and analyses from MG-RAST into popular third-party analysis tools or sequence alignment tools.
Environmental Microbiology | 2014
Kim M. Handley; Daniela Bartels; Edward J. O'Loughlin; Kenneth H. Williams; William L. Trimble; Kelly Skinner; Jack A. Gilbert; Narayan Desai; Elizabeth M. Glass; Tobias Paczian; Andreas Wilke; Dionysios A. Antonopoulos; Kenneth M. Kemner; Folker Meyer
We reconstructed the complete 2.4 Mb-long genome of a previously uncultivated epsilonproteobacterium, Candidatus Sulfuricurvum sp. RIFRC-1, via assembly of short-read shotgun metagenomic data using a complexity reduction approach. Genome-based comparisons indicate the bacterium is a novel species within the Sulfuricurvum genus, which contains one cultivated representative, S. kujiense. Divergence between the species appears due in part to extensive genomic rearrangements, gene loss and chromosomal versus plasmid encoding of certain (respiratory) genes by RIFRC-1. Deoxyribonucleic acid for the genome was obtained from terrestrial aquifer sediment, in which RIFRC-1 comprised ∼47% of the bacterial community. Genomic evidence suggests RIFRC-1 is a chemolithoautotrophic diazotroph capable of deriving energy for growth by microaerobic or nitrate-/nitric oxide-dependent oxidation of S⁰, sulfide or sulfite or H₂ oxidation. Carbon may be fixed via the reductive tricarboxylic acid cycle. Consistent with these physiological attributes, the local aquifer was microoxic with small concentrations of available nitrate, small but elevated concentrations of reduced sulfur and NH₄⁺/NH₃-limited. Additionally, various mechanisms for heavy metal and metalloid tolerance and virulence point to a lifestyle well-adapted for metal(loid)-rich environments and a shared evolutionary past with pathogenic Epsilonproteobacteria. Results expand upon recent findings highlighting the potential importance of sulfur and hydrogen metabolism in the terrestrial subsurface.
PLOS Computational Biology | 2015
Andreas Wilke; Jared Bischof; Travis Harrison; Tom Brettin; Mark D'Souza; Wolfgang Gerlach; Hunter Matthews; Tobias Paczian; Jared Wilkening; Elizabeth M. Glass; Narayan Desai; Folker Meyer
Metagenomic sequencing has produced significant amounts of data in recent years. For example, as of summer 2013, MG-RAST has been used to annotate over 110,000 data sets totaling over 43 Terabases. With metagenomic sequencing finding even wider adoption in the scientific community, the existing web-based analysis tools and infrastructure in MG-RAST provide limited capability for data retrieval and analysis, such as comparative analysis between multiple data sets. Moreover, although the system provides many analysis tools, it is not comprehensive. By opening MG-RAST up via a web services API (application programming interface) we have greatly expanded access to MG-RAST data, as well as provided a mechanism for the use of third-party analysis tools with MG-RAST data. This RESTful API makes all data and data objects created by the MG-RAST pipeline accessible as JSON objects. As part of the DOE Systems Biology Knowledgebase project (KBase, http://kbase.us) we have implemented a web services API for MG-RAST. This API complements the existing MG-RAST web interface and constitutes the basis of KBase's microbial community capabilities. In addition, the API exposes a comprehensive collection of data to programmers. This API, which uses a RESTful (Representational State Transfer) implementation, is compatible with most programming environments and should be easy to use for end users and third parties. It provides comprehensive access to sequence data, quality control results, annotations, and many other data types. Where feasible, we have used standards to expose data and metadata. Code examples are provided in a number of languages both to show the versatility of the API and to provide a starting point for users. We present an API that exposes the data in MG-RAST for consumption by our users, greatly enhancing the utility of the MG-RAST service.
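The general pattern the abstract describes, a REST-style URL whose response is a JSON object, might look roughly like the following from a client's side. This is a sketch under stated assumptions rather than the documented MG-RAST interface: the base URL, the `metagenome` resource name, the placeholder ID, and the `verbosity` parameter are not given in the abstract and should be checked against the current API documentation. The helper names `build_url` and `parse_object` are hypothetical.

```python
import json
from urllib.parse import urlencode

# Assumed base URL; verify against the current MG-RAST documentation.
API_BASE = "https://api.mg-rast.org"


def build_url(resource, object_id, **params):
    """Compose a REST-style URL: <base>/<resource>/<id>?<query>."""
    url = f"{API_BASE}/{resource}/{object_id}"
    if params:
        # Sort the parameters so the resulting URL is deterministic.
        url += "?" + urlencode(sorted(params.items()))
    return url


def parse_object(payload):
    """Decode a JSON response body and check that it is a JSON object."""
    obj = json.loads(payload)
    if not isinstance(obj, dict):
        raise ValueError("expected a JSON object")
    return obj


# Example (placeholder ID and parameter, no network access needed):
url = build_url("metagenome", "mgm0000000.0", verbosity="minimal")
record = parse_object('{"id": "mgm0000000.0", "name": "example"}')
```

To fetch a live record, the URL could be passed to `urllib.request.urlopen` and the response body handed to `parse_object`; that step is left out here so the sketch runs offline.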
Nucleic Acids Research | 2005
Alexander Goesmann; Burkhard Linke; Daniela Bartels; Michael Dondrup; Lutz Krause; Heiko Neuweger; Sebastian Oehm; Tobias Paczian; Andreas Wilke; Folker Meyer
The growing amount of information resulting from the increasing number of publicly available genomes and experimental results thereof necessitates the development of comprehensive systems for data processing and analysis. In this paper, we describe the current state and latest developments of our BRIGEP bioinformatics software system consisting of three web-based applications: GenDB, EMMA and ProDB. These applications facilitate the processing and analysis of bacterial genome, transcriptome and proteome data and are actively used by numerous international groups. We are currently in the process of extensively interconnecting these applications. BRIGEP was developed in the Bioinformatics Resource Facility of the Center for Biotechnology at Bielefeld University and is freely available. A demo project with sample data and access to all three tools is available at . Code bundles for these and other tools developed in our group are accessible on our FTP server at .
Methods in Enzymology | 2013
Andreas Wilke; Elizabeth M. Glass; Daniela Bartels; Jared Bischof; Daniel Braithwaite; Mark D’Souza; Wolfgang Gerlach; Travis Harrison; Kevin P. Keegan; Hunter Matthews; Renzo Kottmann; Tobias Paczian; Wei Tang; William L. Trimble; Pelin Yilmaz; Jared Wilkening; Narayan Desai; Folker Meyer
The democratized world of sequencing is leading to numerous data analysis challenges; MG-RAST addresses many of these challenges for diverse datasets, including amplicon datasets, shotgun metagenomes, and metatranscriptomes. The changes from version 2 to version 3 include the addition of a dedicated gene calling stage using FragGeneScan, clustering of predicted proteins at 90% identity, and the use of BLAT for the computation of similarities. Together with changes in the underlying software infrastructure, this has enabled the dramatic scaling up of pipeline throughput while remaining on a limited hardware budget. The Web-based service allows upload, fully automated analysis, and visualization of results. As a result of the plummeting cost of sequencing and the readily available analytical power of MG-RAST, over 78,000 metagenomic datasets have been analyzed, with over 12,000 of them publicly available in MG-RAST.
International Journal of Intelligent Systems Technologies and Applications | 2007
Christian Thurau; Tobias Paczian; Gerhard Sagerer; Christian Bauckhage
Imitation learning is a powerful mechanism applied by primates and humans. It allows for a straightforward acquisition of behaviours that, through observation, are known to solve everyday tasks. Recently, a Bayesian formulation has been proposed that provides a mathematical model of imitation learning. In this paper, we apply this framework to the problem of programming believable computer game characters. We present experiments in imitation learning from the network traffic of multi-player online games. Our results underline that this indeed produces agents that behave more human-like than characters controlled by common game AI techniques.
Standards in Genomic Sciences | 2014
Jared Bischof; Travis Harrison; Tobias Paczian; Elizabeth M. Glass; Andreas Wilke; Folker Meyer
Background: As the impact and prevalence of large-scale metagenomic surveys grow, so does the acute need for more complete and standards-compliant metadata. Metadata (data describing data) provides an essential complement to experimental data, helping to answer questions about its source, mode of collection, and reliability. Metadata collection and interpretation have become vital to the genomics and metagenomics communities, but considerable challenges remain, including exchange, curation, and distribution. Currently, tools are available for capturing basic field metadata during sampling, and for storing, updating and viewing it. Unfortunately, these tools are not specifically designed for metagenomic surveys; in particular, they lack the appropriate metadata collection templates, a centralized storage repository, and a unique ID linking system that can be used to easily port complete and compatible metagenomic metadata into widely used assembly and sequence analysis tools. Results: Metazen was developed as a comprehensive framework designed to enable metadata capture for metagenomic sequencing projects. Specifically, Metazen provides a rapid, easy-to-use portal to encourage early deposition of project and sample metadata. Conclusions: Metazen is an interactive tool that aids users in recording their metadata in a complete and valid format. A defined set of mandatory fields captures vital information, while the option to add fields provides flexibility.
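The combination of mandatory fields and optional extra fields described above can be sketched as a small validation step. This is an illustration only, not Metazen's code: the field names in `MANDATORY_FIELDS` and the function `validate_metadata` are hypothetical, and the abstract does not specify which fields Metazen actually requires.

```python
# Hypothetical mandatory fields; the real template is not given in the abstract.
MANDATORY_FIELDS = ("project_name", "sample_name", "collection_date")


def validate_metadata(record, mandatory=MANDATORY_FIELDS):
    """Return the mandatory fields that are missing or blank in `record`.

    Fields not listed in `mandatory` are accepted without checks, which
    mirrors the idea of letting users add their own fields.
    """
    missing = []
    for field in mandatory:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


# Example: one blank and one absent mandatory field, plus a user-added field.
problems = validate_metadata(
    {"project_name": "aquifer survey", "sample_name": " ", "depth_m": 3}
)
```

An empty return value means the record satisfies the mandatory set; a non-empty list tells the user exactly which fields still need values.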