Publications

Featured research published by Matthew DeJongh.


BMC Genomics | 2008

The RAST Server: Rapid Annotations using Subsystems Technology

Ramy K. Aziz; Daniela Bartels; Aaron A. Best; Matthew DeJongh; Terrence Disz; Robert Edwards; Kevin Formsma; Svetlana Gerdes; Elizabeth M. Glass; Michael Kubal; Folker Meyer; Gary J. Olsen; Robert Olson; Andrei L. Osterman; Ross Overbeek; Leslie K. McNeil; Daniel Paarmann; Tobias Paczian; Bruce Parrello; Gordon D. Pusch; Claudia I. Reich; Rick Stevens; Olga Vassieva; Veronika Vonstein; Andreas Wilke; Olga Zagnitko

Background: The number of prokaryotic genome sequences becoming available is growing steadily, faster than our ability to accurately annotate them.

Description: We describe a fully automated service for annotating bacterial and archaeal genomes. The service identifies protein-encoding, rRNA and tRNA genes, assigns functions to the genes, predicts which subsystems are represented in the genome, uses this information to reconstruct the metabolic network, and makes the output easily downloadable for the user. In addition, the annotated genome can be browsed in an environment that supports comparative analysis with the annotated genomes maintained in the SEED environment. The service normally makes the annotated genome available within 12–24 hours of submission, but ultimately the quality of such a service will be judged in terms of the accuracy, consistency, and completeness of the produced annotations. We summarize our attempts to address these issues and discuss plans for incrementally enhancing the service.

Conclusion: By providing accurate, rapid annotation freely to the community, we have created an important community resource. The service has now been used by over 120 external users annotating over 350 distinct genomes.


Nature Biotechnology | 2010

High-throughput generation, optimization and analysis of genome-scale metabolic models

Christopher S. Henry; Matthew DeJongh; Aaron A. Best; Paul M Frybarger; Ben Linsay; Rick Stevens

Genome-scale metabolic models have proven to be valuable for predicting organism phenotypes from genotypes. Yet efforts to develop new models are failing to keep pace with genome sequencing. To address this problem, we introduce the Model SEED, a web-based resource for high-throughput generation, optimization and analysis of genome-scale metabolic models. The Model SEED integrates existing methods and introduces techniques to automate nearly every step of this process, taking ∼48 h to reconstruct a metabolic model from an assembled genome sequence. We apply this resource to generate 130 genome-scale metabolic models representing a taxonomically diverse set of bacteria. Twenty-two of the models were validated against available gene essentiality and Biolog data, with the average model accuracy determined to be 66% before optimization and 87% after optimization.
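Genome-scale metabolic models of the kind the Model SEED produces are typically analyzed with flux balance analysis: maximize a target flux subject to steady-state mass balance (S v = 0) and flux bounds, a linear program. The sketch below runs FBA on a hypothetical three-reaction toy network; the network, bounds, and use of SciPy are illustrative assumptions, not the Model SEED's actual implementation.

```python
# Minimal flux balance analysis (FBA) sketch on a made-up toy network:
#   R1: -> A   (uptake)
#   R2: A -> B (conversion)
#   R3: B ->   (export; the flux we maximize)
import numpy as np
from scipy.optimize import linprog

# Stoichiometric matrix S: rows are metabolites (A, B), columns reactions (R1-R3).
S = np.array([
    [1, -1,  0],   # A: produced by R1, consumed by R2
    [0,  1, -1],   # B: produced by R2, consumed by R3
])
c = [0, 0, -1]            # linprog minimizes, so maximize R3 via -1
bounds = [(0, 10)] * 3    # illustrative flux bounds for each reaction

# Steady-state constraint S v = 0, plus bounds.
res = linprog(c, A_eq=S, b_eq=np.zeros(2), bounds=bounds)
print(res.x)  # optimal flux distribution: [10. 10. 10.]
```

At steady state every unit exported through R3 must flow through R1 and R2, so all three fluxes hit the upper bound of 10.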


BMC Bioinformatics | 2007

Toward the automated generation of genome-scale metabolic networks in the SEED

Matthew DeJongh; Kevin Formsma; Paul Boillot; John Gould; Matthew Rycenga; Aaron A. Best

Background: Current methods for the automated generation of genome-scale metabolic networks focus on genome annotation and preliminary biochemical reaction network assembly, but do not adequately address the process of identifying and filling gaps in the reaction network, or of verifying that the network is suitable for systems-level analysis. Thus, current methods are only sufficient for generating draft-quality networks, and refinement of the reaction network remains a largely manual, labor-intensive process.

Results: We have developed a method for generating genome-scale metabolic networks that produces substantially complete reaction networks suitable for systems-level analysis. Our method partitions the reaction space of central and intermediary metabolism into discrete, interconnected components that can be assembled and verified in isolation from each other, and then integrated and verified at the level of their interconnectivity. We have developed a database of components that are common across organisms, and have created tools for automatically assembling appropriate components for a particular organism based on the metabolic pathways encoded in the organism's genome. This focuses manual effort on the portion of an organism's metabolism that is not yet represented in the database. We have demonstrated the efficacy of our method by reverse-engineering and automatically regenerating the reaction network from a published genome-scale metabolic model for Staphylococcus aureus. Additionally, we have verified that our method capitalizes on the database of common reaction network components created for S. aureus by using these components to generate substantially complete reconstructions of the reaction networks from three other published metabolic models (Escherichia coli, Helicobacter pylori, and Lactococcus lactis). We have implemented our tools and database within the SEED, an open-source software environment for comparative genome annotation and analysis.

Conclusion: Our method sets the stage for the automated generation of substantially complete metabolic networks for the over 400 complete genome sequences currently in the SEED. With each genome that is processed using our tools, the database of common components grows to cover more of the diversity of metabolic pathways. This increases the likelihood that components of reaction networks for subsequently processed genomes can be retrieved from the database, rather than assembled and verified manually.
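One simple illustration of the gap-identification problem described above is flagging "dead-end" metabolites: compounds a draft network produces but never consumes, or consumes but never produces. The reaction encoding and metabolite names below are a made-up sketch, not the SEED's actual data format; a real implementation would also exclude boundary metabolites such as uptaken nutrients.

```python
# Hypothetical draft reaction network: each reaction lists the metabolites
# it consumes ("in") and produces ("out").
reactions = {
    "rxn1": {"in": ["glc"], "out": ["g6p"]},
    "rxn2": {"in": ["g6p"], "out": ["f6p"]},
    # f6p is produced but nothing consumes it: a candidate gap to fill.
}

def dead_end_metabolites(rxns):
    """Return metabolites that appear on only one side of the network."""
    consumed = {m for r in rxns.values() for m in r["in"]}
    produced = {m for r in rxns.values() for m in r["out"]}
    return {
        "never_consumed": sorted(produced - consumed),
        "never_produced": sorted(consumed - produced),
    }

print(dead_end_metabolites(reactions))
# {'never_consumed': ['f6p'], 'never_produced': ['glc']}
```

Gap-filling then searches a reference reaction database for reactions that reconnect these dead ends to the rest of the network.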


Methods in Molecular Biology | 2013

Automated Genome Annotation and Metabolic Model Reconstruction in the SEED and Model SEED

Scott Devoid; Ross Overbeek; Matthew DeJongh; Veronika Vonstein; Aaron A. Best; Christopher S. Henry

Over the past decade, genome-scale metabolic models have proven to be a crucial resource for predicting organism phenotypes from genotypes. These models provide a means of rapidly translating detailed knowledge of thousands of enzymatic processes into quantitative predictions of whole-cell behavior. Until recently, the pace of new metabolic model development was eclipsed by the pace at which new genomes were being sequenced. To address this problem, the RAST and Model SEED frameworks were developed as a means of automatically producing annotations and draft genome-scale metabolic models. In this chapter, we describe the automated model reconstruction process in detail, starting from a new genome sequence and finishing with a functioning genome-scale metabolic model. We break down the model reconstruction process into eight steps: submitting a genome sequence to RAST, annotating the genome, curating the annotation, submitting the annotation to Model SEED, reconstructing the core model, generating the draft biomass reaction, auto-completing the model, and curating the model. Each of these eight steps is documented in detail.


Journal of Bacteriology | 2011

Inference of the Transcriptional Regulatory Network in Staphylococcus aureus by Integration of Experimental and Genomics-Based Evidence

Dmitry A. Ravcheev; Aaron A. Best; Nathan L. Tintle; Matthew DeJongh; Andrei L. Osterman; Pavel S. Novichkov; Dmitry A. Rodionov

Transcriptional regulatory networks are fine-tuned systems that help microorganisms respond to changes in the environment and cell physiological state. We applied the comparative genomics approach implemented in the RegPredict Web server combined with SEED subsystem analysis and available information on known regulatory interactions for regulatory network reconstruction for the human pathogen Staphylococcus aureus and six related species from the family Staphylococcaceae. The resulting reference set of 46 transcription factor regulons contains more than 1,900 binding sites and 2,800 target genes involved in the central metabolism of carbohydrates, amino acids, and fatty acids; respiration; the stress response; metal homeostasis; drug and metal resistance; and virulence. The inferred regulatory network in S. aureus includes ∼320 regulatory interactions between 46 transcription factors and ∼550 candidate target genes comprising 20% of its genome. We predicted ∼170 novel interactions and 24 novel regulons for the control of the central metabolic pathways in S. aureus. The reconstructed regulons are largely variable in the Staphylococcaceae: only 20% of S. aureus regulatory interactions are conserved across all studied genomes. We used a large-scale gene expression data set for S. aureus to assess relationships between the inferred regulons and gene expression patterns. The predicted reference set of regulons is captured within the Staphylococcus collection in the RegPrecise database (http://regprecise.lbl.gov).


Biochimica et Biophysica Acta | 2011

Connecting genotype to phenotype in the era of high-throughput sequencing.

Christopher S. Henry; Ross Overbeek; Fangfang Xia; Aaron A. Best; Elizabeth M. Glass; Jack A. Gilbert; Peter E. Larsen; Robert Edwards; Terry Disz; Folker Meyer; Veronika Vonstein; Matthew DeJongh; Daniela Bartels; Narayan Desai; Mark D'Souza; Scott Devoid; Kevin P. Keegan; Robert Olson; Andreas Wilke; Jared Wilkening; Rick Stevens

Background: The development of next-generation sequencing technology is rapidly changing the face of the genome annotation and analysis field. One of the primary uses for genome sequence data is to improve our understanding and prediction of phenotypes for microbes and microbial communities, but the technologies for predicting phenotypes must keep pace with the new sequences emerging. Scope of review: This review presents an integrated view of the methods and technologies used in the inference of phenotypes for microbes and microbial communities based on genomic and metagenomic data. Given the breadth of this topic, we place special focus on the resources available within the SEED Project. We discuss the two steps involved in connecting genotype to phenotype, sequence annotation and phenotype inference, and we highlight the challenges in each of these steps when dealing with both single-genome and metagenome data. Major conclusions: This integrated view of the genotype-to-phenotype problem highlights the importance of a controlled ontology in the annotation of genomic data, as this benefits subsequent phenotype inference and metagenome annotation. We also note the importance of expanding the set of reference genomes to improve the annotation of all sequence data, and we highlight metagenome assembly as a potential new source for complete genomes. Finally, we find that phenotype inference, particularly from metabolic models, generates predictions that can be validated and reconciled to improve annotations. General significance: This review presents the first look at the challenges and opportunities associated with the inference of phenotype from genotype during the next-generation sequencing revolution. This article is part of a Special Issue entitled: Systems Biology of Microorganisms.


bioRxiv | 2016

The DOE Systems Biology Knowledgebase (KBase)

Adam P. Arkin; Rick Stevens; Robert W. Cottingham; Sergei Maslov; Christopher S. Henry; Paramvir Dehal; Doreen Ware; Fernando Perez; Nomi L. Harris; Shane Canon; Michael W Sneddon; Matthew L Henderson; William J Riehl; Dan Gunter; Dan Murphy-Olson; Stephen Chan; Roy T Kamimura; Thomas S Brettin; Folker Meyer; Dylan Chivian; David J. Weston; Elizabeth M. Glass; Brian H. Davison; Sunita Kumari; Benjamin H Allen; Jason K. Baumohl; Aaron A. Best; Ben Bowen; Steven E. Brenner; Christopher C Bun

The U.S. Department of Energy Systems Biology Knowledgebase (KBase) is an open-source software and data platform designed to meet the grand challenge of systems biology — predicting and designing biological function from the biomolecular (small scale) to the ecological (large scale). KBase is available for anyone to use, and enables researchers to collaboratively generate, test, compare, and share hypotheses about biological functions; perform large-scale analyses on scalable computing infrastructure; and combine experimental evidence and conclusions that lead to accurate models of plant and microbial physiology and community dynamics. The KBase platform has (1) extensible analytical capabilities that currently include genome assembly, annotation, ontology assignment, comparative genomics, transcriptomics, and metabolic modeling; (2) a web-browser-based user interface that supports building, sharing, and publishing reproducible and well-annotated analyses with integrated data; (3) access to extensive computational resources; and (4) a software development kit allowing the community to add functionality to the system.


BMC Bioinformatics | 2008

Gene set analyses for interpreting microarray experiments on prokaryotic organisms

Nathan L. Tintle; Aaron A. Best; Matthew DeJongh; Dirk Van Bruggen; Fred Heffron; Steffen Porwollik; Ronald C. Taylor

Background: Despite the widespread use of DNA microarrays, questions remain about how best to interpret the wealth of gene-by-gene transcriptional levels that they measure. Recently, methods have been proposed that use biologically defined sets of genes in interpretation, instead of examining results gene by gene. Despite a serious limitation, a method based on Fisher's exact test remains one of the few plausible options for gene set analysis when an experiment has few replicates, as is typically the case for prokaryotes.

Results: We extend five methods of gene set analysis from use on experiments with multiple replicates to use on experiments with few replicates. We then use simulated and real data to compare these methods with each other and with the Fisher's exact test (FET) method. As a result of the simulation, we find that a method named MAXMEAN-NR maintains the nominal rate of false positive findings (type I error rate) while offering good statistical power and robustness to a variety of gene set distributions for set sizes of at least 10. Other methods (ABSSUM-NR or SUM-NR) are shown to be powerful for set sizes less than 10. Analysis of three sets of experimental data shows similar results. Furthermore, the MAXMEAN-NR method is shown to be able to detect biologically relevant sets as significant when other methods (including FET) cannot. We also find that the popular GSEA-NR method performs poorly when compared to MAXMEAN-NR.

Conclusion: MAXMEAN-NR is a method of gene set analysis for experiments with few replicates, as is common for prokaryotes. Results of simulation and real data analysis suggest that the MAXMEAN-NR method offers increased robustness and biological relevance of findings as compared to FET and other methods, while maintaining the nominal type I error rate.
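For intuition, the maxmean statistic from which MAXMEAN-NR takes its name (Efron and Tibshirani's gene set statistic) averages the positive and negative parts of the per-gene scores in a set and keeps the larger. The sketch below shows only that core statistic; the restandardization and permutation testing that the "-NR" variants add for few-replicate experiments are omitted, and the example scores are invented.

```python
# Core maxmean gene-set statistic: positive and negative contributions are
# averaged separately over the whole set, so a coherent shift in either
# direction stands out even when signs are mixed.
def maxmean(scores):
    """scores: per-gene test statistics for the genes in one set."""
    n = len(scores)
    pos = sum(s for s in scores if s > 0) / n   # mean positive part
    neg = sum(-s for s in scores if s < 0) / n  # mean negative part (as magnitude)
    return max(pos, neg)

print(maxmean([2.0, 1.5, -0.5, 0.1]))  # 0.9 -> set leans toward up-regulation
```

Because both means divide by the full set size n, a few strong genes of one sign are diluted unless the rest of the set agrees with them.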


Bioinformatics | 2012

CytoSEED: a Cytoscape plugin for viewing, manipulating and analyzing metabolic models created by the Model SEED

Matthew DeJongh; Benjamin Bockstege; Paul M Frybarger; Nicholas Hazekamp; Joshua Kammeraad; Travis McGeehan

Summary: CytoSEED is a Cytoscape plugin for viewing, manipulating and analyzing metabolic models created using the Model SEED. The CytoSEED plugin enables users of the Model SEED to create informative visualizations of the reaction networks generated for their organisms of interest. These visualizations are useful for understanding organism-specific biochemistry and for highlighting the results of flux variability analysis experiments. Availability and Implementation: Freely available for download on the web at http://sourceforge.net/projects/cytoseed/. Implemented in Java SE 6 and supported on all platforms that support Cytoscape. Contact: [email protected] Supplementary information: Installation instructions, a tutorial, and full-size figures are available at http://www.cs.hope.edu/cytoseed/.


Technical Symposium on Computer Science Education | 2011

Increasing engagement and enrollment in breadth-first introductory courses using authentic computing tasks

Ryan McFall; Matthew DeJongh

The breadth-first approach to teaching introductory computer science is one way of dispelling the common misperception that programming is the sole task of the computer scientist. The breadth-first approach is particularly useful in courses for non-majors. Hands-on activities that make up laboratory assignments for these courses tend to focus on learning to program or on simulations of program execution. These activities unfortunately fail to build on the foundations laid by a breadth-first approach, and serve to perpetuate the "computer science = programming" misperception. We have developed a set of laboratory activities based on what we call authentic computing tasks: everyday tasks that students want to know how to accomplish. Example tasks include image editing, operating system installation and configuration, and building home computer networks. Explicit connections are made between these authentic computing tasks and the computer science concepts covered in the lecture portion of the course. The course has experienced dramatic increases in enrollment, and we have evidence that students see the connections, rather than coming to believe that performing computing tasks well is the essence of computer science.

Collaboration


Dive into Matthew DeJongh's collaborations.

Top Co-Authors

Rick Stevens

Argonne National Laboratory

Elizabeth M. Glass

Argonne National Laboratory

Ross Overbeek

Argonne National Laboratory

Folker Meyer

Argonne National Laboratory

Nomi L. Harris

Lawrence Berkeley National Laboratory

Veronika Vonstein

Argonne National Laboratory