Publication


Featured research published by Alan R. Dabney.


Bioinformatics | 2006

EDGE: extraction and analysis of differential gene expression

Jeffrey T. Leek; Eva Monsen; Alan R. Dabney; John D. Storey

Summary: EDGE (Extraction of Differential Gene Expression) is an open source, point-and-click software program for the significance analysis of DNA microarray experiments. EDGE can perform both standard and time course differential expression analysis. The functions are based on newly developed statistical theory and methods. This document introduces the EDGE software package. Availability: EDGE is freely available for non-commercial users. EDGE can be downloaded for Windows, Macintosh and Linux/UNIX from http://faculty.washington.edu/jstorey/edge Contact: [email protected]


Bioinformatics | 2009

A statistical framework for protein quantitation in bottom-up MS-based proteomics

Yuliya V. Karpievitch; Jeffrey R. Stanley; Thomas Taverner; Jianhua Huang; Joshua N. Adkins; Charles Ansong; Fred Heffron; Thomas O. Metz; Wei Jun Qian; Hyunjin Yoon; Richard D. Smith; Alan R. Dabney

Motivation: Quantitative mass spectrometry-based proteomics requires protein-level estimates and associated confidence measures. Challenges include the presence of low-quality or incorrectly identified peptides and informative missingness. Furthermore, models are required for rolling peptide-level information up to the protein level. Results: We present a statistical model that carefully accounts for informative missingness in peak intensities and allows unbiased, model-based, protein-level estimation and inference. The model is applicable to both label-based and label-free quantitation experiments. We also provide automated, model-based algorithms for filtering of proteins and peptides as well as imputation of missing values. Two LC/MS datasets are used to illustrate the methods. In simulation studies, our methods are shown to achieve substantially more discoveries than standard alternatives. Availability: The software has been made available in the open-source proteomics platform DAnTE (http://omics.pnl.gov/software/).


BMC Bioinformatics | 2012

Normalization and missing value imputation for label-free LC-MS analysis

Yuliya V. Karpievitch; Alan R. Dabney; Richard D. Smith

Shotgun proteomic data are affected by a variety of known and unknown systematic biases as well as high proportions of missing values. Typically, normalization is performed in an attempt to remove systematic biases from the data before statistical inference, sometimes followed by missing value imputation to obtain a complete matrix of intensities. Here we discuss several approaches to normalization and dealing with missing values, some initially developed for microarray data and some developed specifically for mass spectrometry-based data.


Bioinformatics | 2012

DanteR: an extensible R-based tool for quantitative analysis of -omics data.

Thomas Taverner; Yuliya V. Karpievitch; Ashoka D. Polpitiya; Joseph N. Brown; Alan R. Dabney; Gordon A. Anderson; Richard D. Smith

Motivation: The size and complex nature of mass spectrometry-based proteomics datasets motivate development of specialized software for statistical data analysis and exploration. We present DanteR, a graphical R package that features extensive statistical and diagnostic functions for quantitative proteomics data analysis, including normalization, imputation, hypothesis testing, interactive visualization and peptide-to-protein rollup. More importantly, users can easily extend the existing functionality by including their own algorithms under the Add-On tab. Availability: DanteR and its associated user guide are available for download free of charge at http://omics.pnl.gov/software/. An updated binary of the DanteR package is available on our website together with a vignettes document. For Windows, a single click automatically installs DanteR along with the R programming environment. For Linux and Mac OS X, users must install R and then follow instructions on the DanteR website for package installation. Contact: [email protected].


Genome Biology | 2006

A reanalysis of a published Affymetrix GeneChip control dataset

Alan R. Dabney; John D. Storey

A response to "Preferred analysis methods for Affymetrix GeneChips revealed by a wholly defined control dataset" by SE Choe, M Boutros, AM Michelson, GM Church and MS Halfon. Genome Biology 2005, 6:R16.


Bioinformatics | 2009

Normalization of peak intensities in bottom-up MS-based proteomics using singular value decomposition

Yuliya V. Karpievitch; Thomas Taverner; Joshua N. Adkins; Stephen J. Callister; Gordon A. Anderson; Richard D. Smith; Alan R. Dabney

Motivation: LC-MS allows for the identification and quantification of proteins from biological samples. As with any high-throughput technology, systematic biases are often observed in LC-MS data, making normalization an important preprocessing step. Normalization models need to be flexible enough to capture biases of arbitrary complexity, while avoiding overfitting that would invalidate downstream statistical inference. Careful normalization of MS peak intensities would enable greater accuracy and precision in quantitative comparisons of protein abundance levels. Results: We propose an algorithm, called EigenMS, that uses singular value decomposition to capture and remove biases from LC-MS peak intensity measurements. EigenMS is an adaptation of the surrogate variable analysis (SVA) algorithm of Leek and Storey, with the adaptations including (i) the handling of the widespread missing measurements that are typical in LC-MS, and (ii) a novel approach to preventing overfitting that facilitates the incorporation of EigenMS into an existing proteomics analysis pipeline. EigenMS is demonstrated using both large-scale calibration measurements and simulations to perform well relative to existing alternatives. Availability: The software has been made available in the open source proteomics platform DAnTE (Polpitiya et al., 2008) (http://omics.pnl.gov/software/), as well as in standalone software available at SourceForge (http://sourceforge.net).
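
The central idea in the abstract above, using singular value decomposition to capture and remove bias trends from peak intensities, can be sketched in a few lines. This is a simplified illustration and not the EigenMS implementation: it assumes a complete matrix with no missing values, it removes a hand-picked number of components `k` rather than selecting them statistically, and the function name `remove_bias_svd` and its parameters are hypothetical.

```python
import numpy as np

def remove_bias_svd(y, groups, k=1):
    """Remove the top-k residual singular components from a complete matrix.

    y: (n_peptides, n_samples) array of log intensities, no missing values.
    groups: length n_samples array of treatment-group labels.
    k: number of bias trends to remove (chosen by hand in this sketch).
    """
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)

    # Fitted values: the per-peptide mean within each treatment group.
    fitted = np.empty_like(y)
    for g in np.unique(groups):
        cols = groups == g
        fitted[:, cols] = y[:, cols].mean(axis=1, keepdims=True)

    # Residuals hold the variation not explained by the treatment groups.
    residuals = y - fitted

    # Right singular vectors of the residuals are sample-level trends;
    # the leading ones are treated as systematic bias.
    u, s, vt = np.linalg.svd(residuals, full_matrices=False)
    bias = (u[:, :k] * s[:k]) @ vt[:k, :]

    return y - bias
```

Because the trends are estimated from residuals around the group means, the subtraction leaves each peptide's group means unchanged; only variation orthogonal to the treatment design is removed.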


Analytical Chemistry | 2011

A statistical method for assessing peptide identification confidence in accurate mass and time tag proteomics.

Jeffrey R. Stanley; Joshua N. Adkins; Gordon W. Slysz; Matthew E. Monroe; Samuel O. Purvine; Yuliya V. Karpievitch; Gordon A. Anderson; Richard D. Smith; Alan R. Dabney

Current algorithms for quantifying peptide identification confidence in the accurate mass and time (AMT) tag approach assume that the AMT tags themselves have been correctly identified. However, there is uncertainty in the identification of AMT tags, because this is based on matching LC-MS/MS fragmentation spectra to peptide sequences. In this paper, we incorporate confidence measures for the AMT tag identifications into the calculation of probabilities for correct matches to an AMT tag database, resulting in a more accurate overall measure of identification confidence for the AMT tag approach. The method is referenced as Statistical Tools for AMT Tag Confidence (STAC). STAC additionally provides a uniqueness probability (UP) to help distinguish between multiple matches to an AMT tag and a method to calculate an overall false discovery rate (FDR). STAC is freely available for download, as both a command line and a Windows graphical application.


Applied and Environmental Microbiology | 2013

Diet Complexity and Estrogen Receptor β Status Affect the Composition of the Murine Intestinal Microbiota

Rani Menon; Sara E. Watson; Laura N. Thomas; Clinton D. Allred; Alan R. Dabney; M. Andrea Azcarate-Peril; Joseph M. Sturino

Intestinal microbial dysbiosis contributes to the dysmetabolism of luminal factors, including steroid hormones (sterones) that affect the development of chronic gastrointestinal inflammation and the incidence of sterone-responsive cancers of the breast, prostate, and colon. Little is known, however, about the role of specific host sterone nucleoreceptors, including estrogen receptor β (ERβ), in microbiota maintenance. Herein, we test the hypothesis that ERβ status affects microbiota composition and determine if such compositionally distinct microbiota respond differently to changes in diet complexity that favor Proteobacteria enrichment. To this end, conventionally raised female ERβ+/+ and ERβ−/− C57BL/6J mice (mean age of 27 weeks) were initially reared on 8604, a complex diet containing estrogenic isoflavones, and then fed AIN-76, an isoflavone-free semisynthetic diet, for 2 weeks. 16S rRNA gene surveys revealed that the fecal microbiota of 8604-fed mice and AIN-76-fed mice differed, as expected. The relative diversity of Proteobacteria, especially the Alphaproteobacteria and Gammaproteobacteria, increased significantly following the transition to AIN-76. Distinct patterns for beneficial Lactobacillales were exclusive to and highly abundant among 8604-fed mice, whereas several Proteobacteria were exclusive to AIN-76-fed mice. Interestingly, representative orders of the phyla Proteobacteria, Bacteroidetes, and Firmicutes, including the Lactobacillales, also differed as a function of murine ERβ status. Overall, these interactions suggest that sterone nucleoreceptor status and diet complexity may play important roles in microbiota maintenance. Furthermore, we envision that this model for gastrointestinal dysbiosis may be used to identify novel probiotics, prebiotics, nutritional strategies, and pharmaceuticals for the prevention and resolution of Proteobacteria-rich dysbiosis.


PLOS ONE | 2009

An Introspective Comparison of Random Forest-Based Classifiers for the Analysis of Cluster-Correlated Data by Way of RF++

Yuliya V. Karpievitch; Elizabeth G. Hill; Anthony P. Leclerc; Alan R. Dabney; Jonas S. Almeida

Many mass spectrometry-based studies, as well as other biological experiments, produce cluster-correlated data. Failure to account for correlation among observations may result in a classification algorithm overfitting the training data and producing overoptimistic estimated error rates and may make subsequent classifications unreliable. Current common practice for dealing with replicated data is to average each subject replicate sample set, reducing the dataset size and incurring loss of information. In this manuscript we compare three approaches to dealing with cluster-correlated data: unmodified Breiman's Random Forest (URF), forest grown using subject-level averages (SLA), and RF++ with subject-level bootstrapping (SLB). RF++, a novel Random Forest-based algorithm implemented in C++, handles cluster-correlated data through a modification of the original resampling algorithm and accommodates subject-level classification. Subject-level bootstrapping is an alternative sampling method that obviates the need to average or otherwise reduce each set of replicates to a single independent sample. Our experiments show nearly identical median classification and variable selection accuracy for SLB forests and URF forests when applied to both simulated and real datasets. However, the run-time estimated error rate was severely underestimated for URF forests. Predictably, SLA forests were found to be more severely affected by the reduction in sample size, which led to poorer classification and variable selection accuracy. Perhaps most importantly, our results suggest that it is reasonable to utilize URF for the analysis of cluster-correlated data. Two caveats should be noted: first, correct classification error rates must be obtained using a separate test dataset, and second, an additional post-processing step is required to obtain subject-level classifications. RF++ is shown to be an effective alternative for classifying both clustered and non-clustered data.
Source code and stand-alone compiled versions of command-line and easy-to-use graphical user interface (GUI) versions of RF++ for Windows and Linux, as well as a user manual (Supplementary File S2), are available for download at: http://sourceforge.org/projects/rfpp/ under the GNU public license.
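
The sampling idea behind subject-level bootstrapping can be illustrated with a short sketch. This is not the RF++ code (which is written in C++); it only shows one plausible way to resample whole subjects so that all replicates of a subject stay together. The function name `subject_level_bootstrap` and its arguments are hypothetical.

```python
import numpy as np

def subject_level_bootstrap(subject_ids, rng):
    """Return in-bag and out-of-bag row indices for one bootstrap draw.

    Subjects (not rows) are sampled with replacement, and every replicate
    of a drawn subject enters the in-bag sample together.
    """
    subject_ids = np.asarray(subject_ids)
    subjects = np.unique(subject_ids)

    # Draw as many subjects as there are distinct subjects, with replacement.
    drawn = rng.choice(subjects, size=subjects.size, replace=True)

    # In-bag rows: all replicates of each drawn subject (repeats included).
    in_bag = np.concatenate([np.flatnonzero(subject_ids == s) for s in drawn])

    # Out-of-bag rows: replicates of subjects that were never drawn.
    out_of_bag = np.flatnonzero(~np.isin(subject_ids, drawn))
    return in_bag, out_of_bag
```

Because no subject contributes rows to both sets, an error estimate computed on the out-of-bag rows is not contaminated by replicates of subjects the tree was trained on, which is the failure mode the abstract attributes to row-level resampling.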


Statistical Methods in Medical Research | 2005

Issues in the mapping of two diseases.

Alan R. Dabney; Jon Wakefield

Recently, there has been increased interest in the geographical modelling of two or more diseases. In this article, we consider a number of issues relating to such an endeavour including the standardization process and the comparison of univariate and bivariate disease mapping models. A principal motivation for the examination of two or more diseases is to discover similarities or dissimilarities in the geographical distribution of risk. In this article, we propose a proportional mortality approach to give clues to areas of similarity and dissimilarity. A secondary aim of bivariate modelling is to ‘borrow strength’ between diseases in order to provide better estimates of risk in each area. We will illustrate various modelling strategies using incidence data from 1996 to 2000 on lung and bladder cancer in Washington state.

Collaboration


Top Co-Authors

Richard D. Smith
Pacific Northwest National Laboratory

Yuliya V. Karpievitch
University of Western Australia

Gordon A. Anderson
Pacific Northwest National Laboratory

Joshua N. Adkins
Pacific Northwest National Laboratory

Thomas Taverner
Pacific Northwest National Laboratory

Ashoka D. Polpitiya
Pacific Northwest National Laboratory

Eva Monsen
University of Washington

Jeffrey R. Stanley
Pacific Northwest National Laboratory