William R. Cannon
Pacific Northwest National Laboratory
Publications
Featured research published by William R. Cannon.
Bioinformatics | 2008
Bobbie-Jo M. Webb-Robertson; William R. Cannon; Christopher S. Oehmen; Anuj R. Shah; Vidhya Gurumoorthi; Mary S. Lipton; Katrina M. Waters
Motivation: The standard approach to identifying peptides based on accurate mass and elution time (AMT) compares profiles obtained from a high-resolution mass spectrometer to a database of peptides previously identified from tandem mass spectrometry (MS/MS) studies. It would be advantageous, with respect to both accuracy and cost, to search only for those peptides that are detectable by MS (proteotypic). Results: We present a support vector machine (SVM) model that uses a simple descriptor space based on 35 properties of amino acid content, charge, hydrophilicity, and polarity for the quantitative prediction of proteotypic peptides. Using three independently derived AMT databases (Shewanella oneidensis, Salmonella typhimurium, Yersinia pestis) for training and validation within and across species, the SVM achieved an average accuracy of 0.8 with a standard deviation of less than 0.025. Furthermore, we demonstrate that these results are achievable with a small set of 12 variables while maintaining high proteome coverage. Availability: http://omics.pnl.gov/software/STEPP.php. Supplementary information: Supplementary data are available at Bioinformatics online.
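The flavor of the approach can be sketched in a few lines: featurize each peptide with simple physicochemical descriptors and train a binary SVM on observed versus unobserved peptides. The minimal sketch below assumes scikit-learn; its three toy features (length, net charge, mean Kyte-Doolittle hydropathy) stand in for the paper's 35 descriptors, and the training labels are fabricated for illustration.

# A minimal sketch of an SVM proteotypic-peptide classifier, assuming
# scikit-learn. The three features stand in for the paper's 35
# descriptors, and the labels below are fabricated for illustration.
import numpy as np
from sklearn.svm import SVC

KD = {'A': 1.8, 'R': -4.5, 'N': -3.5, 'D': -3.5, 'C': 2.5, 'Q': -3.5,
      'E': -3.5, 'G': -0.4, 'H': -3.2, 'I': 4.5, 'L': 3.8, 'K': -3.9,
      'M': 1.9, 'F': 2.8, 'P': -1.6, 'S': -0.8, 'T': -0.7, 'W': -0.9,
      'Y': -1.3, 'V': 4.2}   # Kyte-Doolittle hydropathy scale

def features(peptide):
    """Map a peptide sequence to a small physicochemical descriptor vector."""
    charge = sum(map(peptide.count, 'KR')) - sum(map(peptide.count, 'DE'))
    return [len(peptide), charge, np.mean([KD[aa] for aa in peptide])]

# Toy training set: (sequence, 1 = observed by MS, 0 = not observed)
train = [('LVNELTEFAK', 1), ('DDSPDLPK', 1), ('RRRKKRGY', 0), ('WWPHHWCC', 0)]
X = np.array([features(p) for p, _ in train])
y = np.array([label for _, label in train])

model = SVC(kernel='rbf').fit(X, y)
print(model.predict(np.array([features('AGLQFPVGR')])))   # 0 or 1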
IEEE Transactions on Parallel and Distributed Systems | 2012
Changjun Wu; Anantharaman Kalyanaraman; William R. Cannon
Detecting sequence homology between protein sequences is a fundamental problem in computational molecular biology, with a pervasive application in nearly all analyses that aim to structurally and functionally characterize protein molecules. While detecting the homology between two protein sequences is relatively inexpensive, detecting pairwise homology for a large number of protein sequences can become computationally prohibitive for modern inputs, often requiring millions of CPU hours. Yet there is currently no robust support for parallelizing this kernel. In this paper, we identify the key characteristics that make this problem particularly hard to parallelize, and then propose a new parallel algorithm that is suited for detecting homology on large data sets using distributed-memory parallel computers. Our method, called pGraph, is a novel hybrid between the hierarchical multiple-master/worker model and the producer-consumer model, and is designed to break the irregularities imposed by alignment computation and work generation. Experimental results show that pGraph achieves linear scaling on a 2,048-processor distributed-memory cluster for inputs ranging from as small as 20,000 sequences to 2,560,000 sequences. In addition to demonstrating strong scaling, we present an extensive report on the performance of the various system components and related parametric studies.
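The producer-consumer decomposition at the heart of pGraph can be illustrated with a shared-memory stand-in: one producer enqueues candidate sequence pairs while workers consume and score them. The sketch below uses only Python's standard library; the placeholder score is a trivial set overlap rather than a real alignment, and the actual system is an MPI-based hierarchical multiple-master/worker design for distributed-memory clusters.

# A shared-memory stand-in for the producer-consumer pattern described
# above. The "alignment" is a trivial set-overlap placeholder.
from multiprocessing import Process, Queue
from itertools import combinations

def worker(seqs, tasks, results):
    """Consume pairs until a poison pill (None) arrives."""
    while (pair := tasks.get()) is not None:
        i, j = pair
        results.put((i, j, len(set(seqs[i]) & set(seqs[j]))))

if __name__ == '__main__':
    seqs = ['MKVLAQT', 'MKVIAQT', 'GGGSSAP', 'MKWLAQS']
    tasks, results = Queue(), Queue()
    n_workers = 2
    procs = [Process(target=worker, args=(seqs, tasks, results))
             for _ in range(n_workers)]
    for p in procs:
        p.start()
    for pair in combinations(range(len(seqs)), 2):   # producer role
        tasks.put(pair)
    for _ in range(n_workers):                       # one pill per worker
        tasks.put(None)
    n_pairs = len(seqs) * (len(seqs) - 1) // 2
    for _ in range(n_pairs):                         # drain before join
        print(results.get())
    for p in procs:
        p.join()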
BMC Genomics | 2012
Elena S. Peterson; Lee Ann McCue; Alexandra C. Schrimpe-Rutledge; Jeffrey L. Jensen; Hyunjoo Walker; Markus A. Kobold; Samantha R. Webb; Samuel H. Payne; Charles Ansong; Joshua N. Adkins; William R. Cannon; Bobbie-Jo M. Webb-Robertson
Background: The procedural aspects of genome sequencing and assembly have become relatively inexpensive, yet full, accurate structural annotation of these genomes remains a challenge. Next-generation sequencing transcriptomics (RNA-Seq), global microarrays, and tandem mass spectrometry (MS/MS)-based proteomics have demonstrated immense value to genome curators as individual sources of information; however, integrating these data types to validate and improve structural annotation remains a major challenge. Current visual and statistical analytic tools are focused on a single data type, or existing software tools are retrofitted to analyze new data forms. We present Visual Exploration and Statistics to Promote Annotation (VESPA), a new interactive visual analysis software tool focused on assisting scientists with the annotation of prokaryotic genomes through the integration of proteomics and transcriptomics data with current genome location coordinates. Results: VESPA is a desktop Java™ application that integrates high-throughput peptide-centric proteomics data and transcriptomics (probe or RNA-Seq) data into a genomic context, all of which can be visualized at three levels of genomic resolution. Data are interrogated via searches linked to the genome visualizations to find regions with a high likelihood of mis-annotation. Search results are linked to exports for further validation outside of VESPA, or potential coding regions can be analyzed within the software through interaction with BLAST. VESPA is demonstrated on two use cases (Yersinia pestis Pestoides F and Synechococcus sp. PCC 7002), showing how rapidly mis-annotations can be found and explored using either proteomics data alone or in combination with transcriptomics data. Conclusions: VESPA is an interactive visual analytics tool that integrates high-throughput data into a genomic context to facilitate the discovery of structural mis-annotations in prokaryotic genomes. Data are evaluated via visual analysis across multiple levels of genomic resolution, linked searches, and interaction with existing bioinformatics tools. We highlight the novel functionality of VESPA and the core programming requirements for visualization of these large, heterogeneous datasets in a client-side application. The software is freely available at https://www.biopilot.org/docs/Software/Vespa.php.
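The kind of consistency check VESPA makes visual can be expressed programmatically: map peptides to genomic coordinates and flag those that fall outside every annotated coding region. The sketch below is hypothetical, with invented names and coordinates; VESPA itself is a Java desktop application and exposes no such Python API.

# A hypothetical sketch of the core check VESPA supports visually:
# flag peptides whose mapped coordinates fall outside every annotated
# gene, suggesting a missed gene or a wrong start site.
def orphan_peptides(peptides, genes):
    """Return peptide names not contained in any annotated gene interval."""
    return [name for name, start, end in peptides
            if not any(gs <= start and end <= ge for _, gs, ge in genes)]

genes = [('gene1', 100, 400), ('gene2', 900, 1500)]
peptides = [('pepA', 150, 180),   # inside gene1: consistent
            ('pepB', 600, 630)]   # intergenic: candidate mis-annotation
print(orphan_peptides(peptides, genes))   # ['pepB']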
Bioinformatics | 2004
Alejandro Heredia-Langner; William R. Cannon; Kenneth D. Jarman; Kristin H. Jarman
Motivation: Peptide identification following tandem mass spectrometry (MS/MS) is usually achieved by searching for the best match between the mass spectrum of an unidentified peptide and model spectra generated from peptides in a sequence database. This methodology will be successful only if the peptide under investigation belongs to an available database. Our objective is to develop and test the performance of a heuristic optimization algorithm capable of dealing with features commonly found in actual MS/MS spectra that tend to stall simpler deterministic solution approaches. Results: We present the implementation of a genetic algorithm (GA) for the reconstruction of amino acid sequences using only spectral features, discuss some of the problems associated with this approach, and compare its performance to a de novo sequencing method. The GA can potentially overcome some of the most problematic aspects of de novo analysis of real MS/MS data, such as missing or unclearly defined peaks, and may prove to be a valuable tool in the proteomics field. We assess the performance of our algorithm under conditions of perfect spectral information, in situations where key spectral features are missing, and on real MS/MS spectral data.
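A minimal genetic-algorithm loop in the spirit of this approach is sketched below, using only the standard library. The fitness simply counts theoretical b-ion masses that land near observed peaks; the spectrum is fabricated, and the paper's scoring, crossover, and constraint handling are far richer than this mutation-only loop.

# A minimal GA sketch for de novo sequencing against a fabricated
# spectrum. Fitness counts b-ions near observed peaks.
import random

MASS = {'G': 57.02, 'A': 71.04, 'S': 87.03, 'P': 97.05, 'V': 99.07,
        'L': 113.08, 'K': 128.09, 'E': 129.04}   # residue masses (Da)
ALPHABET = list(MASS)

def b_ions(seq):
    """Theoretical singly charged b-ion m/z values for a peptide."""
    total, ions = 1.008, []          # start from a proton
    for aa in seq:
        total += MASS[aa]
        ions.append(total)
    return ions

def fitness(seq, peaks, tol=0.05):
    return sum(any(abs(ion - p) < tol for p in peaks) for ion in b_ions(seq))

def mutate(seq):
    i = random.randrange(len(seq))
    return seq[:i] + random.choice(ALPHABET) + seq[i + 1:]

peaks = b_ions('PEKAV')              # pretend these peaks were observed
pop = [''.join(random.choices(ALPHABET, k=5)) for _ in range(50)]
for _ in range(200):                 # keep the 10 fittest, refill by mutation
    pop.sort(key=lambda s: fitness(s, peaks), reverse=True)
    pop = pop[:10] + [mutate(random.choice(pop[:10])) for _ in range(40)]
best = max(pop, key=lambda s: fitness(s, peaks))
print(best, fitness(best, peaks))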
Journal of Proteome Research | 2011
William R. Cannon; Mitchell M. Rawlins; Douglas J. Baxter; Stephen J. Callister; Mary S. Lipton; Donald A. Bryant
We report a hybrid search method combining database and spectral library searches that allows for a straightforward approach to characterizing the error rates from the combined data. Using these methods, we demonstrate significantly increased sensitivity and specificity in matching peptides to tandem mass spectra. The hybrid search method increased the number of spectra that can be assigned to a peptide in a global proteomics study by 57-147% at an estimated false discovery rate of 5%, with clear room for even greater improvements. The approach combines the general utility of using consensus model spectra typical of database search methods with the accuracy of the intensity information contained in spectral libraries. A common scoring metric based on recent developments linking data analysis and statistical thermodynamics is used, which allows the use of a conservative estimate of error rates for the combined data. We applied this approach to proteomics analysis of Synechococcus sp. PCC 7002, a cyanobacterium that is a model organism for studies of photosynthetic carbon fixation and biofuels development. The increased specificity and sensitivity of this approach allowed us to identify many more peptides involved in the processes important for photoautotrophic growth.
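The 5% error-rate threshold quoted above can be made concrete with the generic target-decoy recipe sketched below. The function and scores are synthetic and for illustration only; the paper's actual error-rate estimate rests on a statistical-thermodynamics scoring metric rather than this simple scheme.

# A hedged sketch of estimating a score cutoff at a target false
# discovery rate using generic target-decoy competition.
def fdr_threshold(target_scores, decoy_scores, fdr=0.05):
    """Return the lowest score cutoff whose estimated FDR stays <= fdr."""
    best = None
    for cutoff in sorted(target_scores, reverse=True):
        n_target = sum(s >= cutoff for s in target_scores)
        n_decoy = sum(s >= cutoff for s in decoy_scores)
        if n_decoy / n_target > fdr:   # decoys estimate false matches
            break
        best = cutoff
    return best

targets = [9.1, 8.7, 8.2, 7.9, 5.5, 5.1, 4.8, 4.2, 3.9, 3.1]
decoys = [5.0, 4.6, 4.1, 3.8, 3.5, 3.2, 2.9, 2.7, 2.2, 1.9]
print(fdr_threshold(targets, decoys))   # 5.1 with these synthetic scores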
International Parallel and Distributed Processing Symposium | 2004
Joël M. Malard; Alejandro Heredia-Langner; Douglas J. Baxter; Kristin H. Jarman; William R. Cannon
Automatic de novo peptide identification from collision-induced dissociation tandem mass spectrometry data is made difficult by large plateaus in the fitness landscapes of scoring functions and by the fuzzy nature of the constraints, which is due to noise in the data. A framework is presented for combining different peptide identification methods within a parallel genetic algorithm. The distinctive feature of our approach, based on Pareto ranking, is that it can accommodate constraints and possibly conflicting scoring functions. We also show how population structure can significantly improve the wall-clock time of a parallel peptide identification genetic algorithm while maintaining some exchange of information across local populations.
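The Pareto-ranking idea for reconciling conflicting scoring functions is easy to sketch: a candidate sits on the first Pareto front if no other candidate is at least as good on every objective and strictly better on one. The candidates and scores below are invented; the paper embeds this ranking inside a parallel genetic algorithm.

# A small sketch of Pareto ranking over conflicting scoring functions,
# assuming higher is better for every objective.
def dominates(a, b):
    """True if score vector a dominates b."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(scored):
    """Return candidates not dominated by any other candidate."""
    return [c for c, s in scored
            if not any(dominates(s2, s) for _, s2 in scored)]

# (candidate, (score_1, score_2)) pairs from two conflicting scorers
scored = [('PEPTIDEA', (0.9, 0.4)), ('PEPTIDEB', (0.6, 0.8)),
          ('PEPTIDEC', (0.5, 0.3))]
print(pareto_front(scored))   # A and B survive; C is dominated by both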
Annals of Biomedical Engineering | 2016
Colleen E. Clancy; Gary An; William R. Cannon; Yaling Liu; Elebeoba E. May; P. Ortoleva; Aleksander S. Popel; James P. Sluka; Jing Su; Paolo Vicini; Xiaobo Zhou; David M. Eckmann
A wide range of length and time scales are relevant to pharmacology, especially in drug development, drug design, and drug delivery. Therefore, multiscale computational modeling and simulation methods and paradigms that advance the linkage of phenomena occurring at these multiple scales have become increasingly important. Multiscale approaches present in silico opportunities to advance laboratory research to bedside clinical applications in pharmaceutical research. This is achievable through the capability of modeling to reveal phenomena occurring across multiple spatial and temporal scales that are not otherwise readily accessible to experimentation. The resultant models, when validated, are capable of making testable predictions to guide drug design and delivery. In this review, we describe the goals, methods, and opportunities of multiscale modeling in drug design and development. We demonstrate the impact of multiple scales of modeling in this field. We indicate the common mathematical and computational techniques employed for multiscale modeling approaches used in pharmacometric and systems pharmacology models in drug development, and present several examples illustrating the current state-of-the-art models for (1) excitable systems and applications in cardiac disease; (2) stem-cell-driven complex biosystems; (3) nanoparticle delivery, with applications to angiogenesis and cancer therapy; (4) host-pathogen interactions and their use in metabolic disorders, inflammation, and sepsis; and (5) computer-aided design of nanomedical systems. We conclude with a focus on barriers to successful clinical translation of drug development, drug design, and drug delivery multiscale models.
PLOS ONE | 2014
William R. Cannon
New methods are needed for large-scale modeling of metabolism that predict metabolite levels and characterize the thermodynamics of individual reactions and pathways. Current approaches use either kinetic simulations, which are difficult to extend to large networks of reactions because of the need for rate constants, or flux-based methods, which have a large number of feasible solutions because they are unconstrained by the law of mass action. This report presents an alternative modeling approach based on statistical thermodynamics. The principles of the approach are demonstrated using a simple set of coupled reactions, and the system is then characterized with respect to changes in energy, entropy, free energy, and entropy production. The physical and biochemical insights that this approach can provide for metabolism are demonstrated by application to the tricarboxylic acid (TCA) cycle of Escherichia coli. The reaction and pathway thermodynamics are evaluated, and predictions are made regarding changes in the concentrations of TCA cycle intermediates due to 10- and 100-fold changes in the NAD+:NADH concentration ratio. Finally, the assumptions and caveats regarding the use of statistical thermodynamics to model non-equilibrium reactions are discussed.
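A back-of-the-envelope version of the kind of prediction described above is sketched below: how the free energy of an NAD+-consuming oxidation shifts as the NAD+:NADH ratio changes 10- and 100-fold. The standard free energy is a made-up placeholder (other concentration terms are folded into it), not a value from the paper.

# How a reaction's free energy shifts with the NAD+:NADH ratio,
# holding the other concentration terms fixed inside dg0.
import math

R = 8.314e-3    # gas constant, kJ/(mol*K)
T = 298.15      # temperature, K

def delta_g(dg0, nad_ratio):
    """dG = dG0 - RT ln([NAD+]/[NADH]) for a reaction consuming NAD+."""
    return dg0 - R * T * math.log(nad_ratio)

dg0 = 20.0      # hypothetical lumped standard free energy, kJ/mol
for ratio in (1, 10, 100):
    print(f'NAD+:NADH = {ratio:>3}: dG = {delta_g(dg0, ratio):6.2f} kJ/mol')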
Bioinformatics and Bioengineering | 2003
Kenneth D. Jarman; William R. Cannon; Kristin H. Jarman; Alejandro Heredia-Langner
We present a model for the probability of random sequences appearing in product ion spectra obtained from tandem mass spectrometry experiments using collision-induced dissociation. We demonstrate the use of these probabilities for ranking candidate peptide sequences obtained using a de novo algorithm. Sequence candidates are obtained from a spectrum graph that is greatly reduced in size from those in previous graph-theoretical de novo approaches. Evidence of multiple instances of subsequences of each candidate, due to different fragment ion type series as well as isotopic peaks, is incorporated in a hierarchical scoring scheme. This approach is shown to be useful for confirming results from database search and as a first step towards a statistically rigorous de novo algorithm.
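A toy version of the null-model idea follows: how surprising is it that a candidate sequence explains k of n spectral peaks if matches occur at random with per-peak probability p? The binomial tail below is a stand-in for the paper's more detailed fragment-ion and isotope model, and the numbers are illustrative only.

# P(at least k of n peaks matched by chance) under a binomial null,
# a crude stand-in for the paper's random-sequence probability model.
from math import comb

def p_random_match(n_peaks, k_matched, p_single=0.05):
    """Binomial tail probability of matching k or more of n peaks by chance."""
    return sum(comb(n_peaks, k) * p_single**k * (1 - p_single)**(n_peaks - k)
               for k in range(k_matched, n_peaks + 1))

# Explaining 12 of 20 peaks is vanishingly unlikely by chance alone,
# so such a candidate ranks far above random sequences.
print(p_random_match(20, 12))   # ~2e-11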
Frontiers in Bioengineering and Biotechnology | 2014
William R. Cannon
The modeling of the chemical reactions involved in metabolism is a daunting task. Ideally, the modeling of metabolism would use kinetic simulations, but these simulations require knowledge of the thousands of rate constants involved in the reactions. The measurement of rate constants is very labor intensive, and hence rate constants for most enzymatic reactions are not available. Consequently, constraint-based flux modeling has been the method of choice because it does not require the rate constants of the law of mass action. However, this convenience also limits the predictive power of constraint-based approaches, in that the law of mass action is used only as a constraint, making it difficult to predict metabolite levels or the energy requirements of pathways. An alternative to both of these approaches is to model metabolism using simulations of states rather than simulations of reactions, in which a state is defined as the set of all metabolite counts or concentrations. While kinetic simulations model reactions based on the likelihood of the reaction derived from the law of mass action, states are modeled based on likelihood ratios of mass action. Both approaches provide information on the energy requirements of metabolic reactions and pathways. However, modeling states rather than reactions has the advantage that the parameters needed to model states (chemical potentials) are much easier to determine than the parameters needed to model reactions (rate constants). Herein, we discuss recent results, assumptions, and issues in using simulations of states to model metabolism.
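The contrast between simulating states and simulating reactions can be made concrete with a schematic sketch: for a closed A <-> B system with a fixed total count, weight each state (n_A, n_B) by a Boltzmann factor built from chemical potentials instead of stepping through reaction events. The potentials below are invented placeholders, and combinatorial degeneracy factors are omitted for brevity.

# State probabilities for a closed A <-> B system from chemical
# potentials, rather than a kinetic simulation of reaction events.
import math

RT = 2.479                     # RT at 298 K, kJ/mol
MU = {'A': 0.0, 'B': -2.0}     # hypothetical chemical potentials, kJ/mol

def weight(n_a, n_b):
    """Unnormalized Boltzmann weight of the state (n_A, n_B)."""
    return math.exp(-(n_a * MU['A'] + n_b * MU['B']) / RT)

total = 10
states = {(n, total - n): weight(n, total - n) for n in range(total + 1)}
z = sum(states.values())       # partition function over all states
for state, w in sorted(states.items()):
    print(state, round(w / z, 4))   # probability of each composition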