Amanda Clare | Researchain

Publication

Featured research published by Amanda Clare.

European Conference on Principles of Data Mining and Knowledge Discovery | 2001

Knowledge Discovery in Multi-label Phenotype Data

Amanda Clare; Ross D. King

The biological sciences are undergoing an explosion in the amount of available data. New data analysis methods are needed to deal with the data. We present work using KDD to analyse data from mutant phenotype growth experiments with the yeast S. cerevisiae to predict novel gene functions. The analysis of the data presented a number of challenges: multi-class labels, a large number of sparsely populated classes, the need to learn a set of accurate rules (not a complete classification), and a very large amount of missing values. We developed resampling strategies and modified the algorithm C4.5 to deal with these problems. Rules were learnt which are accurate and biologically meaningful. The rules predict function of 83 putative genes of currently unknown function at an estimated accuracy of ≥ 80%.
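Standard C4.5 scores candidate splits by information gain, which assumes each example has exactly one class. The abstract does not spell out how the algorithm was modified. One common adaptation for multi-label data, assumed here, treats each class as a separate yes/no label and sums the per-class binary entropies. The function names and example ORFs below are hypothetical; this is a sketch of the idea, not the authors' code.

```python
import math
from typing import FrozenSet, List, Sequence

LabelSet = FrozenSet[str]  # the functional classes assigned to one ORF


def binary_entropy(p: float) -> float:
    """Entropy in bits of a yes/no class that is present with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    q = 1.0 - p
    return -(p * math.log2(p) + q * math.log2(q))


def multilabel_entropy(examples: Sequence[LabelSet], classes: Sequence[str]) -> float:
    """Sum of per-class binary entropies (an assumed multi-label adaptation)."""
    n = len(examples)
    if n == 0:
        return 0.0
    total = 0.0
    for c in classes:
        p = sum(1 for labels in examples if c in labels) / n
        total += binary_entropy(p)
    return total


def information_gain(
    parent: Sequence[LabelSet],
    partitions: List[Sequence[LabelSet]],
    classes: Sequence[str],
) -> float:
    """Parent entropy minus the size-weighted entropy of the child partitions."""
    n = len(parent)
    children = sum(
        len(part) / n * multilabel_entropy(part, classes) for part in partitions
    )
    return multilabel_entropy(parent, classes) - children


if __name__ == "__main__":
    # Hypothetical ORFs, each annotated with one or more functional classes.
    orfs = [
        frozenset({"metabolism"}),
        frozenset({"metabolism", "transport"}),
        frozenset({"transport"}),
        frozenset({"transport"}),
    ]
    classes = ["metabolism", "transport"]
    # Gain of a split that separates the first two ORFs from the last two.
    print(information_gain(orfs, [orfs[:2], orfs[2:]], classes))
```

Because each class contributes its own term, an ORF annotated with two classes counts toward both, which single-label entropy cannot express.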

Science | 2009

The Automation of Science

Ross D. King; Jeremy John Rowland; Stephen G. Oliver; Michael Young; Wayne Aubrey; Emma Louise Byrne; Maria Liakata; Magdalena Markham; Pınar Pir; Larisa N. Soldatova; Andrew Sparkes; Kenneth Edward Whelan; Amanda Clare

The basis of science is the hypothetico-deductive method and the recording of experiments in sufficient detail to enable reproducibility. We report the development of Robot Scientist “Adam,” which advances the automation of both. Adam has autonomously generated functional genomics hypotheses about the yeast Saccharomyces cerevisiae and experimentally tested these hypotheses by using laboratory automation. We have confirmed Adam's conclusions through manual experiments. To describe Adam's research, we have developed an ontology and a logical language. The resulting formalization involves over 10,000 different research units in a nested treelike structure, 10 levels deep, that relates the 6.6 million biomass measurements to their logical description. This formalization describes how a machine contributed to scientific knowledge.

European Conference on Principles of Data Mining and Knowledge Discovery | 2006

Decision trees for hierarchical multilabel classification: a case study in functional genomics

Hendrik Blockeel; Leander Schietgat; Jan Struyf; Sašo Džeroski; Amanda Clare

Hierarchical multilabel classification (HMC) is a variant of classification where instances may belong to multiple classes organized in a hierarchy. The task is relevant for several application domains. This paper presents an empirical study of decision tree approaches to HMC in the area of functional genomics. We compare learning a single HMC tree (which makes predictions for all classes together) to learning a set of regular classification trees (one for each class). Interestingly, on all 12 datasets we use, the HMC tree wins on all fronts: it is faster to learn and to apply, easier to interpret, and has similar or better predictive performance than the set of regular trees. It turns out that HMC tree learning is more robust to overfitting than regular tree learning.
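A single HMC tree can predict all classes together because each leaf stores one vector of class frequencies rather than a single label. The sketch below illustrates two assumed ingredients: labels are closed upward under the hierarchy (an instance in a class also belongs to all its ancestors), and candidate splits are scored by the reduction in a weighted variance of the label vectors, with deeper classes weighted less via `w0 ** depth`. The hierarchy, the class codes and the default `w0 = 0.75` are hypothetical choices for illustration.

```python
from typing import Dict, List, Optional, Set

# Hypothetical functional-class hierarchy: class -> parent (None = top level).
PARENT: Dict[str, Optional[str]] = {
    "01": None,
    "01.01": "01",
    "01.02": "01",
    "02": None,
}
CLASSES: List[str] = sorted(PARENT)  # fixed order of positions in a label vector


def depth(c: str) -> int:
    """Depth of class c in the hierarchy (top-level classes have depth 1)."""
    d = 1
    parent = PARENT[c]
    while parent is not None:
        d += 1
        parent = PARENT[parent]
    return d


def close_upward(labels: Set[str]) -> Set[str]:
    """Add every ancestor of each label, enforcing the hierarchy constraint."""
    closed: Set[str] = set()
    for c in labels:
        node: Optional[str] = c
        while node is not None:
            closed.add(node)
            node = PARENT[node]
    return closed


def label_vector(labels: Set[str]) -> List[float]:
    """0/1 vector over CLASSES after upward closure."""
    closed = close_upward(labels)
    return [1.0 if c in closed else 0.0 for c in CLASSES]


def mean_vector(vectors: List[List[float]]) -> List[float]:
    """Per-class frequency over the examples (the prototype stored in a leaf)."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(CLASSES))]


def weighted_variance(vectors: List[List[float]], w0: float = 0.75) -> float:
    """Mean weighted squared distance of label vectors to their prototype."""
    if not vectors:
        return 0.0
    mean = mean_vector(vectors)
    weights = [w0 ** depth(c) for c in CLASSES]
    total = 0.0
    for v in vectors:
        total += sum(w * (x - m) ** 2 for w, x, m in zip(weights, v, mean))
    return total / len(vectors)


def predict(vectors: List[List[float]], threshold: float = 0.5) -> Set[str]:
    """Leaf prediction: every class whose frequency reaches the threshold."""
    mean = mean_vector(vectors)
    return {c for c, freq in zip(CLASSES, mean) if freq >= threshold}
```

Because labels are closed upward, a parent's frequency in a leaf is never lower than any child's, so thresholded predictions automatically respect the hierarchy. A set of regular trees, one per class, offers no such guarantee.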

Bioinformatics | 2001

The utility of different representations of protein sequence for predicting functional class

Ross D. King; Andreas Karwath; Amanda Clare; Luc Dehaspe

MOTIVATION Data Mining Prediction (DMP) is a novel approach to predicting protein functional class from sequence. DMP works even in the absence of a homologous protein of known function. We investigate the utility of different ways of representing protein sequence in DMP (residue frequencies, phylogeny, predicted structure) using the Escherichia coli genome as a model. RESULTS Using the different representations, DMP learnt prediction rules that were more accurate than the default at every level of function, for every type of representation. The most effective way to represent sequence was using phylogeny (75% accuracy and 13% coverage of unassigned ORFs at the most general level of function; 69% accuracy and 7% coverage at the most detailed). We tested different methods for combining predictions from the different types of representation. These improved both the accuracy and coverage of predictions: e.g. 40% of all unassigned ORFs could be predicted at an estimated accuracy of 60%, and 5% of unassigned ORFs at an estimated accuracy of 86%.
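Of the three representations, residue frequencies are the simplest to compute: each sequence becomes a 20-dimensional vector of amino acid proportions that a rule learner can test with conditions such as "frequency of lysine > 0.2". A minimal sketch, assuming the 20 standard one-letter codes and skipping any other symbol (such as `X` for an unknown residue); it is not the authors' feature-extraction code.

```python
from collections import Counter
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter residue codes


def residue_frequencies(sequence: str) -> Dict[str, float]:
    """Proportion of each standard amino acid in a protein sequence.

    Non-standard symbols are ignored (an assumption of this sketch).
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


if __name__ == "__main__":
    freqs = residue_frequencies("MKKLLAX")  # hypothetical short sequence
    print(freqs["K"], freqs["W"])
```

Every sequence maps to a vector of the same length regardless of its size, which is what lets a single set of rules be applied across a whole genome.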

Intelligent Systems in Molecular Biology | 2006

An ontology for a Robot Scientist

Larisa N. Soldatova; Amanda Clare; Andrew Charles Sparkes; Ross D. King

MOTIVATION A Robot Scientist is a physically implemented robotic system that can automatically carry out cycles of scientific experimentation. We are commissioning a new Robot Scientist designed to investigate gene function in S. cerevisiae. This Robot Scientist will be capable of initiating >1,000 experiments, and making >200,000 observations a day. Robot Scientists provide a unique test bed for the development of methodologies for the curation and annotation of scientific experiments: because the experiments are conceived and executed automatically by computer, it is possible to completely capture and digitally curate all aspects of the scientific process. This new ability brings with it significant technical challenges. To meet these we apply an ontology driven approach to the representation of all the Robot Scientist's data and metadata. RESULTS We demonstrate the utility of developing an ontology for our new Robot Scientist. This ontology is based on a general ontology of experiments. The ontology aids the curation and annotation of the experimental data and metadata, and the equipment metadata, and supports the design of database systems to hold the data and metadata. AVAILABILITY EXPO in XML and OWL formats is at: http://sourceforge.net/projects/expo/. All materials about the Robot Scientist project are available at: http://www.aber.ac.uk/compsci/Research/bio/robotsci/.

Bioinformatics | 2002

Machine learning of functional class from phenotype data

Amanda Clare; Ross D. King

MOTIVATION Mutant phenotype growth experiments are an important novel source of functional genomics data which have received little attention in bioinformatics. We applied supervised machine learning to the problem of using phenotype data to predict the functional class of Open Reading Frames (ORFs) in Saccharomyces cerevisiae. Three sources of data were used: TRansposon-Insertion Phenotypes, Localization and Expression in Saccharomyces (TRIPLES), European Functional Analysis Network (EUROFAN) and Munich Information Center for Protein Sequences (MIPS). The analysis of the data presented a number of challenges to machine learning: multi-class labels, a large number of sparsely populated classes, the need to learn a set of accurate rules (not a complete classification), and a very large amount of missing values. We modified the algorithm C4.5 to deal with these problems. RESULTS Rules were learnt which are accurate and biologically meaningful. The rules predict function of 83 ORFs of unknown function at an estimated accuracy of ≥ 80%.

Automated Experimentation | 2010

Towards Robot Scientists for autonomous scientific discovery

Andrew Charles Sparkes; Wayne Aubrey; Emma Louise Byrne; Amanda Clare; Muhammed N Khan; Maria Liakata; Magdalena Markham; Jem J. Rowland; Larisa N. Soldatova; Kenneth Edward Whelan; Michael Young; Ross D. King

We review the main components of autonomous scientific discovery, and how they lead to the concept of a Robot Scientist. This is a system which uses techniques from artificial intelligence to automate all aspects of the scientific discovery process: it generates hypotheses from a computer model of the domain, designs experiments to test these hypotheses, runs the physical experiments using robotic systems, analyses and interprets the resulting data, and repeats the cycle. We describe our two prototype Robot Scientists: Adam and Eve. Adam has recently proven the potential of such systems by identifying twelve genes responsible for catalysing specific reactions in the metabolic pathways of the yeast Saccharomyces cerevisiae. This work has been formally recorded in great detail using logic. We argue that the reporting of science needs to become fully formalised and that Robot Scientists can help achieve this. This will make scientific information more reproducible and reusable, and promote the integration of computers in scientific reasoning. We believe the greater automation of both the physical and intellectual aspects of scientific investigations to be essential to the future of science. Greater automation improves the accuracy and reliability of experiments, increases the pace of discovery and, in common with conventional laboratory automation, removes tedious and repetitive tasks from the human scientist.
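The cycle described above (generate hypotheses, design an experiment, run it, analyse the result, repeat) can be sketched as a loop. This is a toy simulation, not Adam's software: the hypothesis form ("gene g encodes the enzyme for a target reaction"), the `run_experiment` callback standing in for the robotic assay, the gene names, and the strategy of simply testing hypotheses in order are all hypothetical. A real system would choose experiments by cost and expected information.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class LabBook:
    """Machine-readable record of every experiment and its outcome."""
    experiments: List[str] = field(default_factory=list)
    outcomes: List[bool] = field(default_factory=list)


def discovery_cycle(
    candidate_genes: List[str],
    run_experiment: Callable[[str], bool],
    max_cycles: int = 100,
) -> Tuple[Optional[str], LabBook]:
    """Toy hypothesise-test-analyse loop.

    run_experiment(g) stands in for the physical assay and returns True when
    the result supports "gene g encodes the enzyme" (hypothetical interface).
    """
    hypotheses = list(candidate_genes)    # generate hypotheses from the domain model
    book = LabBook()
    for _ in range(max_cycles):
        if not hypotheses:
            break                          # every hypothesis has been refuted
        gene = hypotheses.pop(0)           # design: pick the next hypothesis to test
        supported = run_experiment(gene)   # run the (simulated) experiment
        book.experiments.append(gene)      # record everything for reproducibility
        book.outcomes.append(supported)
        if supported:                      # analyse and interpret the result
            return gene, book
    return None, book                      # no hypothesis confirmed


if __name__ == "__main__":
    found, book = discovery_cycle(["geneA", "geneB", "geneC"], lambda g: g == "geneB")
    print(found, book.experiments, book.outcomes)
```

Keeping the `LabBook` as structured data, rather than free text, reflects the abstract's point that recording every step formally makes the work reproducible.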

Knowledge Discovery and Data Mining | 2000

Genome scale prediction of protein functional class from sequence using data mining

Ross D. King; Andreas Karwath; Amanda Clare; Luc Dehaspe

The ability to predict protein function from amino acid sequence is a central research goal of molecular biology. Such a capability would greatly aid the biological interpretation of genomic data and accelerate its medical exploitation. For the existing sequenced genomes, function can typically be assigned to only 40-60% of the genes [4,8,12,7]. The new science of functional genomics is dedicated to discovering the function of these genes, and to further detailing gene function [10,27,17,6]. Here we present a novel data-mining [24,18] approach to predicting protein functional class from sequence. We demonstrate the effectiveness of this approach on the Mycobacterium tuberculosis [8] genome. Biologically interpretable rules are identified that can predict protein function even in the absence of identifiable sequence homology. These rules predict 65% of the genes with no previously assigned function in Mycobacterium tuberculosis (the bacterium that causes TB) with an estimated accuracy of 60-80% (depending on the level of functional assignment). The rules give insight into the evolutionary history of the organism. Categories and Subject Descriptors: Database Applications; Learning; Life and Medical Sciences. General Terms: Data mining, Concept learning, Biology and genetics.

Yeast | 2000

Accurate prediction of protein functional class from sequence in the Mycobacterium tuberculosis and Escherichia coli genomes using data mining

Ross D. King; Andreas Karwath; Amanda Clare; Luc Dehaspe

The analysis of genomics data needs to become as automated as its generation. Here we present a novel data‐mining approach to predicting protein functional class from sequence. This method is based on a combination of inductive logic programming, clustering and rule learning. We demonstrate the effectiveness of this approach on the M. tuberculosis and E. coli genomes, and identify biologically interpretable rules which predict protein functional class from information only available from the sequence. These rules predict 65% of the ORFs with no assigned function in M. tuberculosis and 24% of those in E. coli, with an estimated accuracy of 60–80% (depending on the level of functional assignment). The rules are founded on a combination of detection of remote homology, convergent evolution and horizontal gene transfer. We identify rules that predict protein functional class even in the absence of detectable sequence or structural homology. These rules give insight into the evolutionary history of M. tuberculosis and E. coli.
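These results are stated as an estimated accuracy together with a coverage: the share of ORFs with no assigned function that receive any prediction. A minimal sketch of applying a learnt rule set and measuring coverage follows. The `Rule` structure, the feature names, the accuracy values and the conflict-resolution strategy (take the most accurate rule that fires) are all assumptions for illustration, not the published method.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Features = Dict[str, float]  # hypothetical per-ORF features


@dataclass
class Rule:
    condition: Callable[[Features], bool]
    functional_class: str
    est_accuracy: float  # accuracy estimated on held-out data


def predict(
    rules: List[Rule], orf: Features, min_accuracy: float = 0.0
) -> Optional[Tuple[str, float]]:
    """Class and accuracy of the most accurate rule that fires, else None."""
    firing = [r for r in rules if r.est_accuracy >= min_accuracy and r.condition(orf)]
    if not firing:
        return None
    best = max(firing, key=lambda r: r.est_accuracy)
    return best.functional_class, best.est_accuracy


def coverage(rules: List[Rule], orfs: List[Features], min_accuracy: float = 0.0) -> float:
    """Fraction of ORFs that receive a prediction from at least one rule."""
    if not orfs:
        return 0.0
    hits = sum(1 for orf in orfs if predict(rules, orf, min_accuracy) is not None)
    return hits / len(orfs)


if __name__ == "__main__":
    rules = [
        Rule(lambda f: f["length"] > 300 and f["frac_K"] > 0.1, "transport", 0.86),
        Rule(lambda f: f["frac_K"] > 0.05, "metabolism", 0.60),
    ]
    unassigned = [
        {"length": 400, "frac_K": 0.12},
        {"length": 100, "frac_K": 0.07},
        {"length": 100, "frac_K": 0.01},
    ]
    print(coverage(rules, unassigned), coverage(rules, unassigned, min_accuracy=0.8))
```

Raising `min_accuracy` keeps only the more reliable rules, so coverage falls as estimated accuracy rises; this is the same trade-off as predicting 40% of unassigned ORFs at 60% accuracy versus 5% at 86%.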

Intelligent Systems in Molecular Biology | 2008

The EXACT description of biomedical protocols

Larisa N. Soldatova; Wayne Aubrey; Ross D. King; Amanda Clare

Motivation: Many published manuscripts contain experiment protocols which are poorly described or deficient in information. This means that the published results are very hard or impossible to repeat. This problem is being made worse by the increasing complexity of high-throughput/automated methods. There is therefore a growing need to represent experiment protocols in an efficient and unambiguous way. Results: We have developed the Experiment ACTions (EXACT) ontology as the basis of a method of representing biological laboratory protocols. We provide example protocols that have been formalized using EXACT, and demonstrate the advantages and opportunities created by using this formalization. We argue that the use of EXACT will result in the publication of protocols with increased clarity and usefulness to the scientific community. Availability: The ontology, examples and code can be downloaded from http://www.aber.ac.uk/compsci/Research/bio/dss/EXACT/ Contact: Larisa Soldatova
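Representing a protocol as a sequence of typed actions with explicit parameters, as EXACT does, makes missing information machine-detectable. The sketch below illustrates that idea only: the `Action` class, the verbs and the `REQUIRED` table of mandatory parameters are hypothetical and do not reproduce the EXACT ontology's actual terms.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Action:
    """One protocol step: an action verb applied to objects, with parameters."""
    verb: str
    objects: List[str]
    parameters: Dict[str, str] = field(default_factory=dict)


# Parameters each verb must state explicitly (hypothetical requirements).
REQUIRED: Dict[str, List[str]] = {
    "add": ["volume"],
    "incubate": ["temperature", "duration"],
    "centrifuge": ["speed", "duration"],
}


def missing_parameters(protocol: List[Action]) -> List[str]:
    """Report every required parameter that a protocol step leaves unspecified."""
    problems: List[str] = []
    for i, step in enumerate(protocol, start=1):
        for p in REQUIRED.get(step.verb, []):
            if p not in step.parameters:
                problems.append(f"step {i} ({step.verb}): missing {p}")
    return problems


if __name__ == "__main__":
    protocol = [
        Action("add", ["YPD medium", "flask"], {"volume": "50 ml"}),
        Action("incubate", ["flask"], {"temperature": "30 C"}),
    ]
    print(missing_parameters(protocol))
```

A free-text protocol that says "incubate at 30 C" looks complete to a reader, but once it is structured, the missing incubation time is flagged automatically.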