Christopher S. Henry
Argonne National Laboratory
Publication
Featured research published by Christopher S. Henry.
Nature Biotechnology | 2010
Christopher S. Henry; Matthew DeJongh; Aaron A. Best; Paul M Frybarger; Ben Linsay; Rick Stevens
Genome-scale metabolic models have proven to be valuable for predicting organism phenotypes from genotypes. Yet efforts to develop new models are failing to keep pace with genome sequencing. To address this problem, we introduce the Model SEED, a web-based resource for high-throughput generation, optimization and analysis of genome-scale metabolic models. The Model SEED integrates existing methods and introduces techniques to automate nearly every step of this process, taking ∼48 h to reconstruct a metabolic model from an assembled genome sequence. We apply this resource to generate 130 genome-scale metabolic models representing a taxonomically diverse set of bacteria. Twenty-two of the models were validated against available gene essentiality and Biolog data, with the average model accuracy determined to be 66% before optimization and 87% after optimization.
Nucleic Acids Research | 2017
Alice R. Wattam; James J. Davis; Rida Assaf; Sébastien Boisvert; Thomas Brettin; Christopher Bun; Neal Conrad; Emily M. Dietrich; Terry Disz; Joseph L. Gabbard; Svetlana Gerdes; Christopher S. Henry; Ronald Kenyon; Dustin Machi; Chunhong Mao; Eric K. Nordberg; Gary J. Olsen; Daniel Murphy-Olson; Robert Olson; Ross Overbeek; Bruce Parrello; Gordon D. Pusch; Maulik Shukla; Veronika Vonstein; Andrew S. Warren; Fangfang Xia; Hyun Seung Yoo; Rick Stevens
The Pathosystems Resource Integration Center (PATRIC) is the bacterial Bioinformatics Resource Center (https://www.patricbrc.org). Recent changes to PATRIC include a redesign of the web interface and some new services that provide users with a platform that takes them from raw reads to an integrated analysis experience. The redesigned interface allows researchers direct access to tools and data, and the emphasis has changed to user-created genome-groups, with detailed summaries and views of the data that researchers have selected. Perhaps the biggest change has been the enhanced capability for researchers to analyze their private data and compare it to the available public data. Researchers can assemble their raw sequence reads and annotate the contigs using RASTtk. PATRIC also provides services for RNA-Seq, variation, model reconstruction and differential expression analysis, all delivered through an updated private workspace. Private data can be compared by ‘virtual integration’ to any of PATRIC's public data. The number of genomes available for comparison in PATRIC has expanded to over 80 000, with a special emphasis on genomes with antimicrobial resistance data. PATRIC uses this data to improve both subsystem annotation and k-mer classification, and tags new genomes as having signatures that indicate susceptibility or resistance to specific antibiotics.
PLOS ONE | 2012
Ramy K. Aziz; Scott Devoid; Terrence Disz; Robert Edwards; Christopher S. Henry; Gary J. Olsen; Robert Olson; Ross Overbeek; Bruce Parrello; Gordon D. Pusch; Rick Stevens; Veronika Vonstein; Fangfang Xia
The remarkable advance in sequencing technology and the rising interest in medical and environmental microbiology, biotechnology, and synthetic biology resulted in a deluge of published microbial genomes. Yet, genome annotation, comparison, and modeling remain a major bottleneck to the translation of sequence information into biological knowledge, hence computational analysis tools are continuously being developed for rapid genome annotation and interpretation. Among the earliest, most comprehensive resources for prokaryotic genome analysis, the SEED project, initiated in 2003 as an integration of genomic data and analysis tools, now contains >5,000 complete genomes, a constantly updated set of curated annotations embodied in a large and growing collection of encoded subsystems, a derived set of protein families, and hundreds of genome-scale metabolic models. Until recently, however, maintaining current copies of the SEED code and data at remote locations has been a pressing issue. To allow high-performance remote access to the SEED database, we developed the SEED Servers (http://www.theseed.org/servers): four network-based servers intended to expose the data in the underlying relational database, support basic annotation services, offer programmatic access to the capabilities of the RAST annotation server, and provide access to a growing collection of metabolic models that support flux balance analysis. The SEED servers offer open access to regularly updated data, the ability to annotate prokaryotic genomes, the ability to create metabolic reconstructions and detailed models of metabolism, and access to hundreds of existing metabolic models. This work offers and supports a framework upon which other groups can build independent research efforts. 
Large integrations of genomic data represent one of the major intellectual resources driving research in biology, and programmatic access to the SEED data will provide significant utility to a broad collection of potential users.
Journal of Cheminformatics | 2015
James G. Jeffryes; Ricardo L Colastani; Mona Elbadawi-Sidhu; Tobias Kind; Thomas D. Niehaus; Linda J. Broadbelt; Andrew D. Hanson; Oliver Fiehn; Keith E.J. Tyo; Christopher S. Henry
Background: In spite of its great promise, metabolomics has proven difficult to execute in an untargeted and generalizable manner. Liquid chromatography–mass spectrometry (LC–MS) has made it possible to gather data on thousands of cellular metabolites. However, matching metabolites to their spectral features continues to be a bottleneck, meaning that much of the collected information remains uninterpreted and that new metabolites are seldom discovered in untargeted studies. These challenges require new approaches that consider compounds beyond those available in curated biochemistry databases.
Description: Here we present Metabolic In silico Network Expansions (MINEs), an extension of known metabolite databases to include molecules that have not been observed, but are likely to occur based on known metabolites and common biochemical reactions. We utilize an algorithm called the Biochemical Network Integrated Computational Explorer (BNICE) and expert-curated reaction rules based on the Enzyme Commission classification system to propose the novel chemical structures and reactions that comprise MINE databases. Starting from the Kyoto Encyclopedia of Genes and Genomes (KEGG) COMPOUND database, the MINE contains over 571,000 compounds, of which 93% are not present in the PubChem database. However, these MINE compounds have on average higher structural similarity to natural products than compounds from KEGG or PubChem. MINE databases were able to propose annotations for 98.6% of a set of 667 MassBank spectra, 14% more than KEGG alone and equivalent to PubChem while returning far fewer candidates per spectrum than PubChem (46 vs. 1715 median candidates). Application of MINEs to LC–MS accurate mass data enabled the identity of an unknown peak to be confidently predicted.
Conclusions: MINE databases are freely accessible for non-commercial use via user-friendly web-tools at http://minedatabase.mcs.anl.gov and developer-friendly APIs.
MINEs improve metabolomics peak identification as compared to general chemical databases whose results include irrelevant synthetic compounds. Furthermore, MINEs complement and expand on previous in silico generated compound databases that focus on human metabolism. We are actively developing the database; future versions of this resource will incorporate transformation rules for spontaneous chemical reactions and more advanced filtering and prioritization of candidate structures.
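The accurate-mass lookup mentioned in this abstract can be illustrated with a minimal sketch. It is not the MINE API or the authors' method: the three-entry candidate list, the function names, the [M+H]+ adduct and the 5 ppm tolerance are all assumptions chosen for illustration. Only the monoisotopic masses and the proton mass are standard reference values.

```python
# Toy accurate-mass matching; NOT the MINE API (all names here are hypothetical).
PROTON_MASS = 1.007276  # Da, added to a neutral mass to model an [M+H]+ ion

# Hypothetical candidate list: (name, neutral monoisotopic mass in Da)
CANDIDATES = [
    ("D-glucose", 180.06339),
    ("lactic acid", 90.03169),
    ("pyruvic acid", 88.01604),
]


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def match_peak(mz, candidates, tol_ppm=5.0):
    """Return (name, ppm error) for candidates whose [M+H]+ m/z lies within tol_ppm."""
    hits = []
    for name, neutral_mass in candidates:
        theoretical_mz = neutral_mass + PROTON_MASS
        err = ppm_error(mz, theoretical_mz)
        if abs(err) <= tol_ppm:
            hits.append((name, err))
    # Smallest absolute error first
    return sorted(hits, key=lambda hit: abs(hit[1]))


if __name__ == "__main__":
    for name, err in match_peak(181.0707, CANDIDATES):
        print(f"{name}: {err:+.2f} ppm")
```

A larger candidate database increases the chance that some candidate falls inside the tolerance window, but it also increases the number of candidates returned per peak, which is the trade-off the abstract reports between MINE, KEGG and PubChem.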
Microbial Informatics and Experimentation | 2011
Peter E. Larsen; Frank R. Collart; Dawn Field; Folker Meyer; Kevin P. Keegan; Christopher S. Henry; John W. McGrath; John P. Quinn; Jack A. Gilbert
Background: The world's oceans are home to a diverse array of microbial life whose metabolic activity helps to drive the earth's biogeochemical cycles. Metagenomic analysis has revolutionized our access to these communities, providing a system-scale perspective of microbial community interactions. However, while metagenome sequencing can provide useful estimates of the relative change in abundance of specific genes and taxa between environments or over time, this does not investigate the relative changes in the production or consumption of different metabolites.
Results: We propose a methodology, Predicted Relative Metabolic Turnover (PRMT), that defines and enables exploration of metabolite-space inferred from the metagenome. Our analysis of metagenomic data from a time-series study in the Western English Channel demonstrated considerable correlations between predicted relative metabolic turnover and seasonal changes in abundance of measured environmental parameters as well as with observed seasonal changes in bacterial population structure.
Conclusions: The PRMT method was successfully applied to metagenomic data to explore the Western English Channel microbial metabolome and to generate specific, biologically testable hypotheses. Generated hypotheses linked organic phosphate utilization to Gammaproteobacteria, Planctomycetes, and Betaproteobacteria, chitin degradation to Actinomycetes, and potential small molecule biosynthesis pathways for Lentisphaerae, Chlamydiae, and Crenarchaeota. The PRMT method can be applied as a general tool for the analysis of additional metagenomic or transcriptomic datasets.
Methods in Molecular Biology | 2013
Scott Devoid; Ross Overbeek; Matthew DeJongh; Veronika Vonstein; Aaron A. Best; Christopher S. Henry
Over the past decade, genome-scale metabolic models have proven to be a crucial resource for predicting organism phenotypes from genotypes. These models provide a means of rapidly translating detailed knowledge of thousands of enzymatic processes into quantitative predictions of whole-cell behavior. Until recently, the pace of new metabolic model development was eclipsed by the pace at which new genomes were being sequenced. To address this problem, the RAST and the Model SEED framework were developed as a means of automatically producing annotations and draft genome-scale metabolic models. In this chapter, we describe the automated model reconstruction process in detail, starting from a new genome sequence and finishing on a functioning genome-scale metabolic model. We break down the model reconstruction process into eight steps: submitting a genome sequence to RAST, annotating the genome, curating the annotation, submitting the annotation to Model SEED, reconstructing the core model, generating the draft biomass reaction, auto-completing the model, and curating the model. Each of these eight steps is documented in detail.
Nucleic Acids Research | 2013
Kosei Tanaka; Christopher S. Henry; Jenifer Zinner; Edmond Jolivet; Matthew Cohoon; Fangfang Xia; Vladimir Bidnenko; S. Dusko Ehrlich; Rick Stevens; Philippe Noirot
The nonessential regions in bacterial chromosomes are ill-defined due to incomplete functional information. Here, we establish a comprehensive repertoire of the genome regions that are dispensable for growth of Bacillus subtilis in a variety of media conditions. In complex medium, we attempted deletion of 157 individual regions ranging in size from 2 to 159 kb. A total of 146 deletions were successful in complex medium, whereas the remaining regions were subdivided to identify new essential genes (4) and coessential gene sets (7). Overall, our repertoire covers ∼76% of the genome. We screened for viability of mutant strains in rich defined medium and glucose minimal media. Experimental observations were compared with predictions by the iBsu1103 model, revealing discrepancies that led to numerous model changes, including the large-scale application of model reconciliation techniques. We ultimately produced the iBsu1103V2 model and generated predictions of metabolites that could restore the growth of unviable strains. These predictions were experimentally tested and demonstrated to be correct for 27 strains, validating the refinements made to the model. The iBsu1103V2 model has improved considerably at predicting loss of viability, and many insights gained from the model revisions have been integrated into the Model SEED to improve reconstruction of other microbial models.
Journal of Experimental Botany | 2012
Svetlana Gerdes; Claudia Lerma-Ortiz; Océane Frelin; Samuel M. D. Seaver; Christopher S. Henry; Valérie de Crécy-Lagard; Andrew D. Hanson
The B vitamins and the cofactors derived from them are essential for life. B vitamin synthesis in plants is consequently as crucial to plants themselves as it is to humans and animals, whose B vitamin nutrition depends largely on plants. The synthesis and salvage pathways for the seven plant B vitamins are now broadly known, but certain enzymes and many transporters have yet to be identified, and the subcellular locations of various reactions are unclear. Although very substantial, what is not known about plant B vitamin pathways is regrettably difficult to discern from the literature or from biochemical pathway databases. Nor do databases accurately represent all that is known about B vitamin pathways, above all their compartmentation, because the facts are scattered throughout the literature, and thus hard to piece together. These problems (i) deter discoveries because newcomers to B vitamins cannot see which mysteries still need solving; and (ii) impede metabolic reconstruction and modelling of B vitamin pathways because genes for reactions or transport steps are missing. This review therefore takes a fresh approach to capture current knowledge of B vitamin pathways in plants. The synthesis pathways, key salvage routes, and their subcellular compartmentation are surveyed in depth, and encoded in the SEED database (http://pubseed.theseed.org/seedviewer.cgi?page=PlantGateway) for Arabidopsis and maize. The review itself and the encoded pathways specifically identify enigmatic or missing reactions, enzymes, and transporters. The SEED-encoded B vitamin pathway collection is a publicly available, expertly curated, one-stop resource for metabolic reconstruction and modeling.
Proceedings of the National Academy of Sciences of the United States of America | 2014
Samuel M. D. Seaver; Svetlana Gerdes; Océane Frelin; Claudia Lerma-Ortiz; Louis M. T. Bradbury; Rémi Zallot; Ghulam Hasnain; Thomas D. Niehaus; Basma El Yacoubi; Shiran Pasternak; Robert Olson; Gordon D. Pusch; Ross Overbeek; Rick Stevens; Valérie de Crécy-Lagard; Doreen Ware; Andrew D. Hanson; Christopher S. Henry
Significance: Genes must be annotated with their correct functions if genome data are to support hypothesis building and metabolic engineering. PlantSEED was developed to streamline the process of annotating plant genome sequences, to construct metabolic models based on genome annotations automatically, and to use models to test the annotation of these sequences, allowing the detection of gaps and errors in gene annotations and the prediction of new functions. PlantSEED is designed to grow in an iterative manner by including new plant genome sequences, new annotations harvested from the literature, and improved biochemical data, all of which are integrated in a consistent manner into the PlantSEED genomes and metabolic models.
The increasing number of sequenced plant genomes is placing new demands on the methods applied to analyze, annotate, and model these genomes. Today’s annotation pipelines result in inconsistent gene assignments that complicate comparative analyses and prevent efficient construction of metabolic models. To overcome these problems, we have developed the PlantSEED, an integrated, metabolism-centric database to support subsystems-based annotation and metabolic model reconstruction for plant genomes. PlantSEED combines SEED subsystems technology, first developed for microbial genomes, with refined protein families and biochemical data to assign fully consistent functional annotations to orthologous genes, particularly those encoding primary metabolic pathways. Seamless integration with its parent, the prokaryotic SEED database, makes PlantSEED a unique environment for cross-kingdom comparative analysis of plant and bacterial genomes. The consistent annotations imposed by PlantSEED permit rapid reconstruction and modeling of primary metabolism for all plant genomes in the database.
This feature opens the unique possibility of model-based assessment of the completeness and accuracy of gene annotation and thus allows computational identification of genes and pathways that are restricted to certain genomes or need better curation. We demonstrate the PlantSEED system by producing consistent annotations for 10 reference genomes. We also produce a functioning metabolic model for each genome, gapfilling to identify missing annotations and proposing gene candidates for missing annotations. Models are built around an extended biomass composition representing the most comprehensive published to date. To our knowledge, our models are the first to be published for seven of the genomes analyzed.
PLOS Computational Biology | 2014
Matthew N. Benedict; Michael B. Mundy; Christopher S. Henry; Nicholas Chia; Nathan D. Price
Genome-scale metabolic models provide a powerful means to harness information from genomes to deepen biological insights. With exponentially increasing sequencing capacity, there is an enormous need for automated reconstruction techniques that can provide more accurate models in a short time frame. Current methods for automated metabolic network reconstruction rely on gene and reaction annotations to build draft metabolic networks and algorithms to fill gaps in these networks. However, automated reconstruction is hampered by database inconsistencies, incorrect annotations, and gap filling largely without considering genomic information. Here we develop an approach for applying genomic information to predict alternative functions for genes and estimate their likelihoods from sequence homology. We show that computed likelihood values were significantly higher for annotations found in manually curated metabolic networks than those that were not. We then apply these alternative functional predictions to estimate reaction likelihoods, which are used in a new gap filling approach called likelihood-based gap filling to predict more genomically consistent solutions. To validate the likelihood-based gap filling approach, we applied it to models where essential pathways were removed, finding that likelihood-based gap filling identified more biologically relevant solutions than parsimony-based gap filling approaches. We also demonstrate that models gap filled using likelihood-based gap filling provide greater coverage and genomic consistency with metabolic gene functions compared to parsimony-based approaches. Interestingly, despite these findings, we found that likelihoods did not significantly affect consistency of gap filled models with Biolog and knockout lethality data. 
This indicates that the phenotype data alone cannot necessarily be used to discriminate between alternative solutions for gap filling and therefore, that the use of other information is necessary to obtain a more accurate network. All described workflows are implemented as part of the DOE Systems Biology Knowledgebase (KBase) and are publicly available via API or command-line web interface.
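The contrast between parsimony-based and likelihood-based gap filling can be illustrated with a toy sketch. It is not the KBase implementation. The three candidate reactions, their likelihood values, the brute-force search, and the choice to penalize each added reaction by one minus its likelihood are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical candidate reactions: name -> (substrates, products, likelihood).
# The likelihood stands in for sequence-homology evidence; values are invented.
CANDIDATES = {
    "R1": ({"A"}, {"D"}, 0.05),  # one-step route with weak genomic support
    "R2": ({"A"}, {"B"}, 0.90),  # two-step route, step 1, strong support
    "R3": ({"B"}, {"D"}, 0.85),  # two-step route, step 2, strong support
}


def producible(reactions, seeds, target):
    """Forward-propagate from seed metabolites; True if target becomes available."""
    available = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrates, products, _ in reactions:
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return target in available


def gap_fill(candidates, seeds, target, cost):
    """Brute-force the cheapest reaction subset that makes target producible."""
    best, best_cost = None, float("inf")
    names = sorted(candidates)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            chosen = [candidates[name] for name in subset]
            if producible(chosen, seeds, target):
                total = sum(cost(reaction) for reaction in chosen)
                if total < best_cost:
                    best, best_cost = set(subset), total
    return best


# Parsimony: every added reaction costs the same, so fewer reactions wins.
parsimony = gap_fill(CANDIDATES, {"A"}, "D", cost=lambda r: 1.0)
# Likelihood-based (assumed cost): each reaction costs 1 - likelihood.
likelihood = gap_fill(CANDIDATES, {"A"}, "D", cost=lambda r: 1.0 - r[2])

if __name__ == "__main__":
    print("parsimony: ", sorted(parsimony))
    print("likelihood:", sorted(likelihood))
```

In this toy network both solutions make the target metabolite producible, so a growth/no-growth phenotype test could not tell them apart. This mirrors the abstract's observation that phenotype data alone may not discriminate between alternative gap-filling solutions.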