Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Douglas J. Baxter is active.

Publication


Featured research published by Douglas J. Baxter.


International Conference on e-Science | 2009

A High-Performance Hybrid Computing Approach to Massive Contingency Analysis in the Power Grid

Ian Gorton; Zhenyu Huang; Yousu Chen; Benson K. Kalahar; Shuangshuang Jin; Daniel G. Chavarría-Miranda; Douglas J. Baxter; John Feo

Operating the electrical power grid to prevent power blackouts is a complex task. An important aspect of this is contingency analysis, which involves understanding and mitigating potential failures in power grid elements such as transmission lines. When taking into account the potential for multiple simultaneous failures (known as the N-x contingency problem), contingency analysis becomes a massive computational task. In this paper we describe a novel hybrid computational approach to contingency analysis. This approach exploits the unique graph processing performance of the Cray XMT in conjunction with a conventional massively parallel compute cluster to identify likely simultaneous failures that could cause widespread cascading power failures with massive economic and social impact on society. The approach has the potential to provide the first practical and scalable solution to the N-x contingency problem. When deployed in power grid operations, it will increase the grid operator's ability to deal effectively with outages and failures of power grid components while preserving stable and safe operation of the grid. The paper describes the architecture of our solution and presents preliminary performance results that validate the efficacy of our approach.
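
The computational burden here is combinatorial: for a grid with N elements, the number of simultaneous x-element failures is C(N, x), and each case requires its own power-flow solve. Below is a minimal, hypothetical sketch (not the paper's Cray XMT/cluster hybrid; all names are invented) of how N-2 cases can be enumerated and dealt out to workers on a conventional cluster:

```python
from itertools import combinations

def n_minus_x_cases(components, x):
    """Enumerate every size-x combination of failed components (the N-x set)."""
    return combinations(components, x)

def partition_round_robin(cases, n_workers):
    """Deal contingency cases to workers round-robin (static load balancing)."""
    buckets = [[] for _ in range(n_workers)]
    for i, case in enumerate(cases):
        buckets[i % n_workers].append(case)
    return buckets

# The blow-up that makes N-x massively computational:
# 10,000 transmission lines -> ~5 * 10^7 N-2 cases, each a power-flow solve.
print(f"N-2 cases for 10,000 lines: {10_000 * 9_999 // 2:,}")

# Toy run: 6 components dealt to 4 workers.
small = [f"line_{i}" for i in range(6)]
for w, bucket in enumerate(partition_round_robin(n_minus_x_cases(small, 2), 4)):
    print(f"worker {w}: {bucket}")
```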


Bioinformatics | 2013

ScalaBLAST 2.0

Christopher S. Oehmen; Douglas J. Baxter

Motivation: BLAST remains one of the most widely used tools in computational biology. The rate at which new sequence data is available continues to grow exponentially, driving the emergence of new fields of biological research. At the same time, multicore systems and conventional clusters are more accessible. ScalaBLAST has been designed to run on conventional multiprocessor systems with an eye to extreme parallelism, enabling parallel BLAST calculations using >16 000 processing cores with a portable, robust, fault-resilient design that introduces little to no overhead with respect to serial BLAST. Availability: ScalaBLAST 2.0 source code can be freely downloaded from http://omics.pnl.gov/software/ScalaBLAST.php. Contact: [email protected]
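
ScalaBLAST itself is an MPI code; purely as an illustration of the task-parallel pattern it exploits (independent BLAST searches over disjoint query partitions), here is a hedged Python sketch using the standard NCBI BLAST+ `blastp` command. The file names, chunk count, and database name are all hypothetical:

```python
import subprocess
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

def split_fasta(path, n_chunks):
    """Stripe FASTA records across n_chunks scratch files."""
    records, current = [], []
    for line in Path(path).read_text().splitlines():
        if line.startswith(">") and current:
            records.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        records.append("\n".join(current))
    paths = []
    for i in range(n_chunks):
        p = Path(f"query_chunk_{i}.fasta")  # hypothetical scratch file
        p.write_text("\n".join(records[i::n_chunks]) + "\n")
        paths.append(p)
    return paths

def run_blast(chunk_path):
    """Run one independent blastp task (requires NCBI BLAST+ on PATH)."""
    out = chunk_path.with_suffix(".tsv")
    subprocess.run(
        ["blastp", "-query", str(chunk_path), "-db", "nr_subset",  # db name hypothetical
         "-outfmt", "6", "-out", str(out)],
        check=True,
    )
    return out

if __name__ == "__main__":
    chunks = split_fasta("queries.fasta", n_chunks=8)
    with ProcessPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(run_blast, chunks))
    print("result files:", [str(r) for r in results])
```

Striping records gives a rough static load balance; ScalaBLAST's fault-resilient, low-overhead scheduling across thousands of cores is far more sophisticated than this sketch.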


Journal of Proteome Research | 2011

Large Improvements in MS/MS-Based Peptide Identification Rates Using a Hybrid Analysis

William R. Cannon; Mitchell M. Rawlins; Douglas J. Baxter; Stephen J. Callister; Mary S. Lipton; Donald A. Bryant

We report a hybrid search method combining database and spectral library searches that allows for a straightforward approach to characterizing the error rates from the combined data. Using these methods, we demonstrate significantly increased sensitivity and specificity in matching peptides to tandem mass spectra. The hybrid search method increased the number of spectra that can be assigned to a peptide in a global proteomics study by 57-147% at an estimated false discovery rate of 5%, with clear room for even greater improvements. The approach combines the general utility of using consensus model spectra typical of database search methods with the accuracy of the intensity information contained in spectral libraries. A common scoring metric based on recent developments linking data analysis and statistical thermodynamics is used, which allows the use of a conservative estimate of error rates for the combined data. We applied this approach to proteomics analysis of Synechococcus sp. PCC 7002, a cyanobacterium that is a model organism for studies of photosynthetic carbon fixation and biofuels development. The increased specificity and sensitivity of this approach allowed us to identify many more peptides involved in the processes important for photoautotrophic growth.
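
The paper's error model comes from statistical thermodynamics; as a generic stand-in, the sketch below shows the standard target-decoy way of picking a score cutoff at an estimated 5% FDR over results pooled from two search strategies. The scores are hypothetical and this is not the authors' scoring metric:

```python
def fdr_threshold(matches, max_fdr=0.05):
    """
    matches: list of (score, is_decoy) pooled from both search strategies,
    already on a common scoring scale. Returns the lowest score cutoff whose
    estimated FDR (decoys/targets at or above the cutoff) stays <= max_fdr.
    """
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    decoys = targets = 0
    best = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            best = score  # matches scoring >= best pass at <= max_fdr
    return best

# Toy pooled results: (score, is_decoy)
pooled = [(9.1, False), (8.7, False), (8.2, True), (7.9, False),
          (7.5, False), (7.1, True), (6.8, False), (6.2, True)]
print("score cutoff at 5% FDR:", fdr_threshold(pooled))
```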


International Parallel and Distributed Processing Symposium | 2004

Constrained de novo peptide identification via multi-objective optimization

Joël M. Malard; Alejandro Heredia-Langner; Douglas J. Baxter; Kristin H. Jarman; William R. Cannon

Automatic de novo peptide identification from collision-induced dissociation tandem mass spectrometry data is made difficult by large plateaus in the fitness landscapes of scoring functions and by the fuzzy nature of the constraints due to noise in the data. A framework is presented for combining different peptide identification methods within a parallel genetic algorithm. The distinctive feature of our approach, based on Pareto ranking, is that it can accommodate constraints and possibly conflicting scoring functions. We also show how population structure can significantly improve the wall-clock time of a parallel peptide identification genetic algorithm while maintaining some exchange of information across local populations.
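
Pareto ranking is the load-bearing idea: candidate A dominates candidate B when A scores at least as well on every objective and strictly better on at least one, and rank 0 is the non-dominated front. A minimal sketch with hypothetical two-objective scores (higher is better):

```python
def dominates(a, b):
    """a dominates b: >= on every objective, > on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_rank(population):
    """Rank 0 = non-dominated front; peel successive fronts off iteratively."""
    ranks = {}
    remaining = set(range(len(population)))
    rank = 0
    while remaining:
        front = {i for i in remaining
                 if not any(dominates(population[j], population[i])
                            for j in remaining if j != i)}
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks

# Each candidate peptide scored by two (possibly conflicting) functions.
scores = [(0.9, 0.2), (0.6, 0.7), (0.4, 0.9), (0.3, 0.3), (0.8, 0.6)]
print(pareto_rank(scores))  # (0.3, 0.3) is dominated, so it lands in rank 1
```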


Journal of Physical Chemistry B | 2014

Comparison of Optimal Thermodynamic Models of the Tricarboxylic Acid Cycle from Heterotrophs, Cyanobacteria, and Green Sulfur Bacteria

Dennis G. Thomas; Sebastian Jaramillo-Riveri; Douglas J. Baxter; William R. Cannon

We have applied a new stochastic simulation approach to predict the metabolite levels, material flux, and thermodynamic profiles of the oxidative TCA cycles found in E. coli and Synechococcus sp. PCC 7002, and in the reductive TCA cycle typical of chemolithoautotrophs and phototrophic green sulfur bacteria such as Chlorobaculum tepidum. The simulation approach is based on modeling states using statistical thermodynamics and employs an assumption similar to that used in transition state theory. The ability to evaluate the thermodynamics of metabolic pathways allows one to understand the relationship between coupling of energy and material gradients in the environment and the self-organization of stable biological systems, and it is shown that each cycle operates in the direction expected due to its environmental niche. The simulations predict changes in metabolite levels and flux in response to changes in cofactor concentrations that would be hard to predict without an elaborate model based on the law of mass action. In fact, we show that a thermodynamically unfavorable reaction can still have flux in the forward direction when it is part of a reaction network. The ability to predict metabolite levels, energy flow, and material flux should be significant for understanding the dynamics of natural systems and for understanding principles for engineering organisms for production of specialty chemicals.
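
The claim that a thermodynamically unfavorable reaction can still carry forward flux follows directly from ΔG = ΔG°′ + RT ln Q: if downstream reactions in the cycle keep a product scarce, the reaction quotient Q drops until ΔG turns negative even though ΔG°′ > 0. A sketch with illustrative, hypothetical numbers:

```python
import math

R = 8.314e-3  # gas constant, kJ/(mol*K)
T = 298.15    # temperature, K

def delta_g(dg0_prime, q):
    """Transformed reaction free energy: dG = dG0' + RT ln Q."""
    return dg0_prime + R * T * math.log(q)

# A hypothetical reaction that is unfavorable under standard conditions.
dg0 = 5.0  # kJ/mol

# In isolation, with products ~ reactants (Q = 1): dG = +5 kJ/mol, no net flux.
print(delta_g(dg0, q=1.0))

# Embedded in the cycle, the next enzyme keeps the product scarce (Q = 0.01):
print(delta_g(dg0, q=0.01))  # about -6.4 kJ/mol -> forward flux
```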


Concurrency and Computation: Practice and Experience | 2005

Peptide identification via constrained multi-objective optimization: Pareto-based genetic algorithms

Joël M. Malard; Alejandro Heredia-Langner; William R. Cannon; Ryan W. Mooney; Douglas J. Baxter

Automatic peptide identification from collision-induced dissociation tandem mass spectrometry data using optimization techniques is made difficult by large plateaus in the fitness landscapes of scoring functions, by the fuzzy nature of constraints from noisy data and by the existence of diverse but equally justifiable probabilistic models of peak matching. Here, two different scoring functions are combined into a parallel multi-objective optimization framework. It is shown how multi-objective optimization can be used to empirically test for independence between distinct scoring functions. The loss of selection pressure during the evolution of a population of putative peptide sequences by a Pareto-driven genetic algorithm is addressed by alternating between two definitions of fitness according to a numerical threshold.
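
The threshold-driven alternation between fitness definitions can be sketched as follows: when the non-dominated front grows to engulf most of the population, Pareto rank no longer discriminates, so selection falls back to a scalar score. This reuses `pareto_rank` from the earlier sketch; the 50% threshold and the scalar fallback are hypothetical, not the paper's actual criterion:

```python
def select_order(population, scalar_score, front_fraction=0.5):
    """
    Order candidate indices for selection. `population` is a list of score
    tuples; `pareto_rank` is the function from the earlier sketch.
    """
    ranks = pareto_rank(population)
    front_size = sum(1 for r in ranks.values() if r == 0)
    if front_size > front_fraction * len(population):
        # Selection pressure lost: rank by scalar fitness (higher is better).
        return sorted(range(len(population)),
                      key=lambda i: scalar_score(population[i]), reverse=True)
    # Otherwise rank by Pareto front membership (lower rank is better).
    return sorted(range(len(population)), key=lambda i: ranks[i])

# Example: all three candidates are mutually non-dominated, so the whole
# population is rank 0 and selection falls back to the summed score.
print(select_order([(0.9, 0.2), (0.6, 0.7), (0.8, 0.6)], scalar_score=sum))
```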


Advances in Computers | 2010

Applications in Data-Intensive Computing

Anuj R. Shah; Joshua N. Adkins; Douglas J. Baxter; William R. Cannon; Daniel G. Chavarría-Miranda; Sutanay Choudhury; Ian Gorton; Deborah K. Gracio; Todd D. Halter; Navdeep Jaitly; John R. Johnson; Richard T. Kouzes; Matthew C. Macduff; Andres Marquez; Matthew E. Monroe; Christopher S. Oehmen; William A. Pike; Chad Scherrer; Oreste Villa; Bobbie-Jo M. Webb-Robertson; Paul D. Whitney; Nino Zuljevic

The total quantity of digital information in the world is growing at an alarming rate. Scientists and engineers are contributing heavily to this data "tsunami" by gathering data using computing and instrumentation at incredible rates. As data volumes and complexity grow, it is increasingly arduous to extract valuable information from the data and derive knowledge from it. Addressing these demands of ever-growing data volumes and complexity requires game-changing advances in software, hardware, and algorithms. Solution technologies must also scale to handle the increased data collection and processing rates while simultaneously accelerating timely and effective analysis. This need for ever-faster data processing and manipulation, as well as for algorithms that scale to high-volume data sets, has given birth to a new paradigm or discipline known as "data-intensive computing." In this chapter, we define data-intensive computing, identify the challenges of massive data, outline solutions for hardware, software, and analytics, and discuss a number of applications in the areas of biology, cyber security, and atmospheric research.


Pacific Symposium on Biocomputing | 2011

Proteotyping of microbial communities by optimization of tandem mass spectrometry data interpretation

Alys E. Hugo; Douglas J. Baxter; William R. Cannon; Anantharaman Kalyanaraman; Gaurav Ramesh Kulkarni; Stephen J. Callister

We report the development of a novel high performance computing method for the identification of proteins from unknown (environmental) samples. The method uses computational optimization to provide an effective way to control the false discovery rate for environmental samples and complements de novo peptide sequencing. Furthermore, the method provides information based on the expressed protein in a microbial community, and thus complements DNA-based identification methods. Testing on blind samples demonstrates that the method provides 79-95% overlap with analogous results from searches involving only the correct genomes. We provide scaling and performance evaluations for the software that demonstrate the ability to carry out large-scale optimizations on 1258 genomes containing 4.2M proteins.


Lawrence Berkeley National Laboratory | 2009

Bringing large-scale multiple genome analysis one step closer: ScalaBLAST and beyond

Christopher S. Oehmen; Heidi J. Sofia; Douglas J. Baxter; Ernest Szeto; Philip Hugenholtz; Nikos C. Kyrpides; Victor Markowitz; Tjerk P. Straatsma

Genome sequence comparisons of exponentially growing data sets form the foundation for the comparative analysis tools provided by community biological data resources such as the Integrated Microbial Genomes (IMG) system at the Joint Genome Institute (JGI). We present an example of how ScalaBLAST, a high-throughput sequence analysis program, harnesses increasingly critical high-performance computing to perform sequence analysis, a core component of maintaining a state-of-the-art sequence data repository. The IMG system is a data management and analysis platform for microbial genomes hosted at the JGI. IMG contains both draft and complete JGI genomes integrated with other publicly available microbial genomes from all three domains of life. IMG provides tools and viewers for interactive analysis of genomes, genes and functions, individually or in a comparative context. Most of these tools are based on pre-computed pairwise sequence similarities involving millions of genes. These computations are becoming prohibitively time consuming with the rapid increase in the number of newly sequenced genomes incorporated into IMG and the need to regularly refresh the content of IMG to reflect changes in the annotations of existing genomes. Thus, building IMG 2.0 (released on December 1, 2006) entailed reloading from NCBI's RefSeq all the genomes in the previous version of IMG (IMG 1.6, as of September 1, 2006) together with 1,541 new public microbial, viral and eukaryal genomes, bringing the total number of IMG genomes to 2,301. A critical part of building IMG 2.0 involved using PNNL's ScalaBLAST software to compute pairwise similarities for over 2.2 million genes in under 26 hours on 1,000 processors, illustrating the impact that new-generation bioinformatics tools are poised to make in biology. The BLAST algorithm is a familiar bioinformatics application for computing sequence similarity and has become a workhorse in large-scale genomics projects. The rapid growth of genome resources such as IMG cannot be sustained without more powerful tools such as ScalaBLAST that make more effective use of large-scale computing resources to perform the core BLAST calculations.
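
The quoted figures imply a concrete per-processor throughput, which is worth making explicit as a quick back-of-the-envelope check (numbers taken from the text above):

```python
genes = 2.2e6   # query genes in the IMG 2.0 build
hours = 26      # reported wall-clock time
procs = 1_000   # processors used

print(f"{genes / (hours * procs):.0f} queries per processor-hour")  # ~85
```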


Conference on High Performance Computing (Supercomputing) | 2006

High throughput feature-matching analysis of biological spectral data

Christopher S. Oehmen; Douglas J. Baxter; Ryan W. Mooney; Shaun O'Leary; Tim Carlson

In order to overcome the limitations of traditional monolithic firewalls, Pacific Northwest National Laboratory (PNNL) has implemented a Secure Collaboration Zone (SCZ) to provide high-performance connectivity using a host-based security model in which the network firewall is replaced by a centrally managed host-based firewall on each system in the SCZ. This approach, coupled with intrusion detection capabilities and tight system configuration management controls, eliminates the firewall bottleneck and creates a layered security model that enables secure data transfer at rates that scale to today's system and network capabilities. For SC06 we will be using Polygraph, a high-performance implementation of a statistical and physical model-based feature-matching algorithm, to run identification calculations on spectral data resulting from tandem mass spectrometry experiments. In particular, a compute cluster at SC06 will run Polygraph to access a preprocessed information archive of spectral data located at PNNL in the SCZ.

Collaboration


Dive into Douglas J. Baxter's collaboration network.

Top Co-Authors

William R. Cannon, Pacific Northwest National Laboratory
Christopher S. Oehmen, Pacific Northwest National Laboratory
Alejandro Heredia-Langner, Pacific Northwest National Laboratory
Bobbie-Jo M. Webb-Robertson, Pacific Northwest National Laboratory
Dennis G. Thomas, Pacific Northwest National Laboratory
Joël M. Malard, Pacific Northwest National Laboratory
Ryan W. Mooney, Pacific Northwest National Laboratory
Anuj R. Shah, Pacific Northwest National Laboratory