Peter C. Andrews
Dartmouth College
Publications
Featured research published by Peter C. Andrews.
PLOS ONE | 2013
Colin P. De Souza; Shahr B. Hashmi; Aysha H. Osmani; Peter C. Andrews; Carol S. Ringelberg; Jay C. Dunlap; Stephen A. Osmani
The filamentous fungi are an ecologically important group of organisms which also have important industrial applications but devastating effects as pathogens and agents of food spoilage. Protein kinases have been implicated in the regulation of virtually all biological processes but how they regulate filamentous fungal specific processes is not understood. The filamentous fungus Aspergillus nidulans has long been utilized as a powerful molecular genetic system and recent technical advances have made systematic approaches to study large gene sets possible. To enhance A. nidulans functional genomics we have created gene deletion constructs for 9851 genes representing 93.3% of the encoding genome. To illustrate the utility of these constructs, and advance the understanding of fungal kinases, we have systematically generated deletion strains for 128 A. nidulans kinases including expanded groups of 15 histidine kinases, 7 SRPK (serine-arginine protein kinases) kinases and an interesting group of 11 filamentous fungal specific kinases. We defined the terminal phenotype of 23 of the 25 essential kinases by heterokaryon rescue and identified phenotypes for 43 of the 103 non-essential kinases. Uncovered phenotypes ranged from almost no growth for a small number of essential kinases implicated in processes such as ribosomal biosynthesis, to conditional defects in response to cellular stresses. The data provide experimental evidence that previously uncharacterized kinases function in the septation initiation network, the cell wall integrity and the morphogenesis Orb6 kinase signaling pathways, as well as in pathways regulating vesicular trafficking, sexual development and secondary metabolism. Finally, we identify ChkC as a third effector kinase functioning in the cellular response to genotoxic stress. The identification of many previously unknown functions for kinases through the functional analysis of the A. nidulans kinome illustrates the utility of the A. nidulans gene deletion constructs.
Annals of Human Genetics | 2011
Jiang Gui; Angeline S. Andrew; Peter C. Andrews; Heather M. Nelson; Karl T. Kelsey; Margaret R. Karagas; Jason H. Moore
A central goal of human genetics is to identify susceptibility genes for common human diseases. An important challenge is modelling gene–gene interaction or epistasis that can result in nonadditivity of genetic effects. The multifactor dimensionality reduction (MDR) method was developed as a machine learning alternative to parametric logistic regression for detecting interactions in the absence of significant marginal effects. The goal of MDR is to reduce the dimensionality inherent in modelling combinations of polymorphisms using a computational approach called constructive induction. Here, we propose a Robust Multifactor Dimensionality Reduction (RMDR) method that performs constructive induction using Fisher's exact test rather than a predetermined threshold. The advantage of this approach is that only statistically significant genotype combinations are considered in the MDR analysis. We use simulation studies to demonstrate that this approach will increase the success rate of MDR when there are only a few genotype combinations that are significantly associated with case-control status. We show that there is no loss of success rate when this is not the case. We then apply the RMDR method to the detection of gene–gene interactions in genotype data from a population-based study of bladder cancer in New Hampshire.
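The constructive induction step described in this abstract can be illustrated with a minimal Python sketch. It assumes that each genotype combination is tested with a two-sided Fisher's exact test on the 2x2 table "this combination vs. all others" by "case vs. control", and that non-significant combinations are labelled "unknown" and set aside. The function names, the table layout, and the `alpha` default are illustrative assumptions, not the published RMDR implementation.

```python
from collections import Counter
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


def rmdr_labels(genotypes, status, alpha=0.05):
    """Label each genotype combination as 'high', 'low', or 'unknown' risk.

    genotypes: one hashable genotype combination per subject (assumed encoding)
    status:    1 for case, 0 for control
    """
    cases = Counter(g for g, s in zip(genotypes, status) if s == 1)
    controls = Counter(g for g, s in zip(genotypes, status) if s == 0)
    n_cases, n_controls = sum(cases.values()), sum(controls.values())
    labels = {}
    for g in set(cases) | set(controls):
        a, b = cases[g], controls[g]
        p = fisher_exact_two_sided(a, b, n_cases - a, n_controls - b)
        if p > alpha:
            labels[g] = "unknown"  # not significant: excluded from the model
        elif a * n_controls > b * n_cases:
            labels[g] = "high"     # case:control ratio above the overall ratio
        else:
            labels[g] = "low"
    return labels
```

In this sketch, only combinations labelled "high" or "low" would go on to the usual MDR accuracy calculation, which is one way to read "only statistically significant genotype combinations are considered".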
PLOS ONE | 2013
Jiang Gui; Jason H. Moore; Scott M. Williams; Peter C. Andrews; Hans L. Hillege; Pim van der Harst; Gerjan Navis; Wiek H. van Gilst; Folkert W. Asselbergs; Diane Gilbert-Diamond
We present an extension of the two-class multifactor dimensionality reduction (MDR) algorithm that enables detection and characterization of epistatic SNP-SNP interactions in the context of a quantitative trait. The proposed Quantitative MDR (QMDR) method handles continuous data by modifying MDR’s constructive induction algorithm to use a t-test. QMDR replaces the balanced accuracy metric with a t-test statistic as the score to determine the best interaction model. We used a simulation to identify the empirical distribution of QMDR’s testing score. We then applied QMDR to genetic data from the ongoing prospective Prevention of Renal and Vascular End-Stage Disease (PREVEND) study.
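A rough Python sketch of the scoring idea for a quantitative trait follows. The abstract does not spell out how genotype combinations are grouped, so the sketch assumes a combination is "high-level" when its mean trait value exceeds the overall mean, and it uses Welch's unequal-variance t statistic as the score. Both choices, and the name `qmdr_score`, are assumptions for illustration.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, variance


def qmdr_score(genotypes, trait):
    """Score one candidate model: t statistic between high- and low-level groups.

    genotypes: one hashable genotype combination per subject (assumed encoding)
    trait:     the continuous trait value for each subject
    """
    overall = mean(trait)
    cells = defaultdict(list)
    for g, y in zip(genotypes, trait):
        cells[g].append(y)

    # Constructive induction: pool combinations by comparing cell mean to overall mean.
    high_cells = {g for g, ys in cells.items() if mean(ys) > overall}
    high = [y for g, y in zip(genotypes, trait) if g in high_cells]
    low = [y for g, y in zip(genotypes, trait) if g not in high_cells]

    if len(high) < 2 or len(low) < 2:
        return 0.0  # too few observations to estimate a variance
    se = sqrt(variance(high) / len(high) + variance(low) / len(low))
    return (mean(high) - mean(low)) / se if se > 0 else float("inf")
```

Under these assumptions, candidate SNP combinations would be ranked by this statistic in place of balanced accuracy.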
Human Heredity | 2010
Jiang Gui; Angeline S. Andrew; Peter C. Andrews; Heather M. Nelson; Karl T. Kelsey; Margaret R. Karagas; Jason H. Moore
Epistasis or gene-gene interaction is a fundamental component of the genetic architecture of complex traits such as disease susceptibility. Multifactor dimensionality reduction (MDR) was developed as a nonparametric and model-free method to detect epistasis when there are no significant marginal genetic effects. However, in many studies of complex disease, other covariates like age of onset and smoking status could have a strong main effect and may potentially interfere with MDR’s ability to achieve its goal. In this paper, we present a simple and computationally efficient sampling method to adjust for covariate effects in MDR. We use simulation to show that after adjustment, MDR has sufficient power to detect true gene-gene interactions. We also compare our method with the state-of-the-art technique in covariate adjustment. The results suggest that our proposed method performs similarly, but is more computationally efficient. We then apply this new method to an analysis of a population-based bladder cancer study in New Hampshire.
RNA | 2009
Heidi W. Trask; Richard Cowper-Sallari; Maureen A. Sartor; Jiang Gui; Catherine V. Heath; Janhavi Renuka; Azara Jane Higgins; Peter C. Andrews; Murray Korc; Jason H. Moore; Craig R. Tomlinson
With no known exceptions, every published microarray study to determine differential mRNA levels in eukaryotes used RNA extracted from whole cells. It is assumed that the use of whole cell RNA in microarray gene expression analysis provides a legitimate profile of steady-state mRNA. Standard labeling methods and the prevailing dogma that mRNA resides almost exclusively in the cytoplasm have led to the long-standing belief that the nuclear RNA contribution is negligible. We report that unadulterated cytoplasmic RNA uncovers differentially expressed mRNAs that otherwise would not have been detected when using whole cell RNA and that the inclusion of nuclear RNA has a large impact on whole cell gene expression microarray results by distorting the mRNA profile to the extent that a substantial number of false positives are generated. We conclude that to produce a valid profile of the steady-state mRNA population, the nuclear component must be excluded, and to arrive at a more realistic view of a cell's gene expression profile, the nuclear and cytoplasmic RNA fractions should be analyzed separately.
European Conference on Applications of Evolutionary Computation | 2016
Randal S. Olson; Ryan J. Urbanowicz; Peter C. Andrews; Nicole A. Lavender; La Creis R. Kidd; Jason H. Moore
Over the past decade, data science and machine learning have grown from a mysterious art form to a staple tool across a variety of fields in academia, business, and government. In this paper, we introduce the concept of tree-based pipeline optimization for automating one of the most tedious parts of machine learning: pipeline design. We implement a Tree-based Pipeline Optimization Tool (TPOT) and demonstrate its effectiveness on a series of simulated and real-world genetic data sets. In particular, we show that TPOT can build machine learning pipelines that achieve competitive classification accuracy and discover novel pipeline operators, such as synthetic feature constructors, that significantly improve classification accuracy on these data sets. We also highlight the current challenges to pipeline optimization, such as the tendency to produce pipelines that overfit the data, and suggest future research paths to overcome these challenges. As such, this work represents an early step toward fully automating machine learning pipeline design.
Methods in Molecular Biology | 2015
Jason H. Moore; Peter C. Andrews
Here we introduce the multifactor dimensionality reduction (MDR) methodology and software package for detecting and characterizing epistasis in genetic association studies. We provide a general overview of the method and then highlight some of the key functions of the open-source MDR software package that is freely distributed. We end with a few examples of published studies of complex human diseases that have used MDR.
Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics | 2009
Casey S. Greene; Jason M. Gilmore; Jeff Kiralis; Peter C. Andrews; Jason H. Moore
The availability of chip-based technology has transformed human genetics and made routine the measurement of thousands of DNA sequence variations giving rise to an informatics challenge. This challenge is the identification of combinations of interacting DNA sequence variations predictive of common diseases. We have previously developed Multifactor Dimensionality Reduction (MDR), a method capable of detecting these interactions, but an exhaustive MDR analysis is exponential in time complexity and thus unsuitable for an interaction analysis of genome-wide datasets. Therefore we look to stochastic search approaches to find a suitable wrapper for the analysis of these data. We have previously shown that an ant colony optimization (ACO) framework can be successfully applied to human genetics when expert knowledge is included. We have integrated an ACO stochastic search wrapper into the open source MDR software package. In this wrapper we also introduce a scaling method based on an exponential distribution function with a single user-adjustable parameter. Here we obtain expert knowledge from Tuned ReliefF (TuRF), a method capable of detecting attribute interactions in the absence of main effects, and perform a power analysis at different parameter settings. We show that the expert knowledge distribution parameter, the retention factor, and the weighting of expert knowledge significantly affect the power of the method.
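To make the roles of the retention factor and the expert-knowledge weighting more concrete, here is a generic ant colony optimization sketch in Python. It is not the wrapper in the MDR software package, and it omits the exponential scaling step because the abstract does not give its formula. The names `selection_weights`, `pick_attributes`, and `update_pheromone`, the exponent `beta`, and the additive deposit rule are all textbook-style assumptions.

```python
import random


def selection_weights(pheromone, knowledge, beta=1.0):
    """Combine pheromone with expert knowledge; beta weights the expert knowledge."""
    return [p * (k ** beta) for p, k in zip(pheromone, knowledge)]


def pick_attributes(weights, k, rng):
    """Sample k distinct attribute indices with probability proportional to weight.

    Assumes all weights are positive.
    """
    chosen = []
    available = list(range(len(weights)))
    for _ in range(k):
        w = [weights[i] for i in available]
        idx = rng.choices(available, weights=w, k=1)[0]
        chosen.append(idx)
        available.remove(idx)
    return sorted(chosen)


def update_pheromone(pheromone, chosen, score, retention=0.9):
    """Evaporate all pheromone by the retention factor, then reward chosen attributes."""
    return [
        retention * p + (score if i in chosen else 0.0)
        for i, p in enumerate(pheromone)
    ]
```

In a full search, each "ant" would call `pick_attributes`, score the chosen SNP combination with MDR, and feed that score back through `update_pheromone`. Expert knowledge scores such as those from TuRF would be supplied as `knowledge`.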
BioData Mining | 2012
Nora Chung Kim; Peter C. Andrews; Folkert W. Asselbergs; H. Robert Frost; Scott M. Williams; Brent T. Harris; Cynthia Read; Kathleen D. Askland; Jason H. Moore
Background: It is increasingly clear that common human diseases have a complex genetic architecture characterized by both additive and nonadditive genetic effects. The goal of the present study was to determine whether patterns of both additive and nonadditive genetic associations aggregate in specific functional groups as defined by the Gene Ontology (GO). Results: We first estimated all pairwise additive and nonadditive genetic effects using the multifactor dimensionality reduction (MDR) method that makes few assumptions about the underlying genetic model. Statistical significance was evaluated using permutation testing in two genome-wide association studies of ALS. The detection data consisted of 276 subjects with ALS and 271 healthy controls while the replication data consisted of 221 subjects with ALS and 211 healthy controls. Both studies included genotypes from approximately 550,000 single-nucleotide polymorphisms (SNPs). Each SNP was mapped to a gene if it was within 500 kb of the start or end. Each SNP was assigned a p-value based on its strongest joint effect with the other SNPs. We then used the Exploratory Visual Analysis (EVA) method and software to assign a p-value to each gene based on the overabundance of significant SNPs at the α = 0.05 level in the gene. We also used EVA to assign p-values to each GO group based on the overabundance of significant genes at the α = 0.05 level. A GO category was determined to replicate if that category was significant at the α = 0.05 level in both studies. We found two GO categories that replicated in both studies. The first, ‘Regulation of Cellular Component Organization and Biogenesis’, a GO Biological Process, had p-values of 0.010 and 0.014 in the detection and replication studies, respectively. The second, ‘Actin Cytoskeleton’, a GO Cellular Component, had p-values of 0.040 and 0.046 in the detection and replication studies, respectively. Conclusions: Pathway analysis of pairwise genetic associations in two GWAS of sporadic ALS revealed a set of genes involved in cellular component organization and actin cytoskeleton, more specifically, that were not reported by prior GWAS. However, prior biological studies have implicated actin cytoskeleton in ALS and other motor neuron diseases. This study supports the idea that pathway-level analysis of GWAS data may discover important associations not revealed using conventional one-SNP-at-a-time approaches.
BioData Mining | 2017
Jason H. Moore; Peter C. Andrews; Randal S. Olson; Sarah E. Carlson; Curt R. Larock; Mario J. Bulhoes; James P. O’Connor; Ellen M. Greytak; Steven L. Armentrout
Background: Large-scale genetic studies of common human diseases have focused almost exclusively on the independent main effects of single-nucleotide polymorphisms (SNPs) on disease susceptibility. These studies have had some success, but much of the genetic architecture of common disease remains unexplained. Attention is now turning to detecting SNPs that impact disease susceptibility in the context of other genetic factors and environmental exposures. These context-dependent genetic effects can manifest themselves as non-additive interactions, which are more challenging to model using parametric statistical approaches. The dimensionality that results from a multitude of genotype combinations, which results from considering many SNPs simultaneously, renders these approaches underpowered. We previously developed the multifactor dimensionality reduction (MDR) approach as a nonparametric and genetic model-free machine learning alternative. Approaches such as MDR can improve the power to detect gene-gene interactions but are limited in their ability to exhaustively consider SNP combinations in genome-wide association studies (GWAS), due to the combinatorial explosion of the search space. We introduce here a stochastic search algorithm called Crush for the application of MDR to modeling high-order gene-gene interactions in genome-wide data. The Crush-MDR approach uses expert knowledge to guide probabilistic searches within a framework that capitalizes on the use of biological knowledge to filter gene sets prior to analysis. Here we evaluated the ability of Crush-MDR to detect hierarchical sets of interacting SNPs using a biology-based simulation strategy that assumes non-additive interactions within genes and additivity in genetic effects between sets of genes within a biochemical pathway. Results: We show that Crush-MDR is able to identify genetic effects at the gene or pathway level significantly better than a baseline random search with the same number of model evaluations. We then applied the same methodology to a GWAS for Alzheimer’s disease and showed base level validation that Crush-MDR was able to identify a set of interacting genes with biological ties to Alzheimer’s disease. Conclusions: We discuss the role of stochastic search and cloud computing for detecting complex genetic effects in genome-wide data.