Michael Stout | Researchain

Publication

Featured research published by Michael Stout.

BMC Bioinformatics | 2009

Automated Alphabet Reduction for Protein Datasets

Jaume Bacardit; Michael Stout; Jonathan D. Hirst; Alfonso Valencia; Robert E. Smith; Natalio Krasnogor

Background: We investigate automated and generic alphabet reduction techniques for protein structure prediction datasets. Reducing alphabet cardinality without losing key biochemical information opens the door to potentially faster machine learning, data mining and optimization applications in structural bioinformatics. Furthermore, reduced but informative alphabets often result in, e.g., more compact and human-friendly classification/clustering rules. In this paper we propose a robust and sophisticated alphabet reduction protocol based on mutual information and state-of-the-art optimization techniques.
Results: We applied this protocol to the prediction of two protein structural features: contact number and relative solvent accessibility. For both features we generated alphabets of two, three, four and five letters. The five-letter alphabets gave prediction accuracies statistically similar to those obtained using the full amino acid alphabet. Moreover, the automatically designed alphabets outperformed other reduced alphabets, both human-designed and taken from the literature. The differences between our alphabets and the alphabets taken from the literature were quantitatively analyzed. All of the above was performed using a primary sequence representation of proteins. As a final experiment, we extrapolated the obtained five-letter alphabet to reduce a much richer protein representation based on evolutionary information for the prediction of the same two features. Again, the performance gap between the full representation and the reduced representation was small, showing that the results of our automated alphabet reduction protocol, even though they were obtained using a simple representation, also capture the crucial information needed for state-of-the-art protein representations.
Conclusion: Our automated alphabet reduction protocol generates competent reduced alphabets tailored specifically for a variety of protein datasets. This process is done without any domain knowledge, using information theory metrics instead. The reduced alphabets contain some unexpected (but sound) groups of amino acids, thus suggesting new ways of interpreting the data.

Nature Methods | 2016

Real-time selective sequencing using nanopore technology

Matthew Loose; Sunir Malla; Michael Stout

The Oxford Nanopore Technologies MinION sequencer enables the selection of specific DNA molecules for sequencing by reversing the driving voltage across individual nanopores. To directly select molecules for sequencing, we used dynamic time warping to match reads to reference sequences. We demonstrate our open-source Read Until software in real-time selective sequencing of regions within small genomes, individual amplicon enrichment and normalization of an amplicon set.
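The matching step mentioned in this abstract can be illustrated with textbook dynamic time warping. The sketch below is not the Read Until implementation: the function names `dtw_distance` and `best_match` are hypothetical, the inputs are plain lists of numbers standing in for signal traces, and it aligns whole sequences rather than searching for a read inside a longer reference.

```python
import math


def dtw_distance(query, reference):
    """Cumulative cost of the best monotonic alignment of two numeric sequences."""
    n, m = len(query), len(reference)
    # cost[i][j] = best cumulative cost aligning query[:i] with reference[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(query[i - 1] - reference[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def best_match(read, references):
    """Return the name of the reference trace with the lowest DTW cost (hypothetical helper)."""
    return min(references, key=lambda name: dtw_distance(read, references[name]))
```

Because the warping path may repeat samples, a read that is a stretched copy of a reference still scores a cost of zero, which is why this kind of alignment tolerates variable translocation speed better than a point-by-point comparison.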

Genetic and Evolutionary Computation Conference | 2007

Automated alphabet reduction method with evolutionary algorithms for protein structure prediction

Jaume Bacardit; Michael Stout; Jonathan D. Hirst; Kumara Sastry; Xavier Llorà; Natalio Krasnogor

This paper focuses on automated procedures to reduce the dimensionality of protein structure prediction datasets by simplifying the way in which the primary sequence of a protein is represented. The potential benefits of this procedure are a faster and easier learning process as well as the generation of more compact and human-readable classifiers. The dimensionality reduction procedure we propose consists of reducing the 20-letter amino acid (AA) alphabet, which is normally used to specify a protein sequence, to a lower-cardinality alphabet. This reduction comes about by clustering AA types according to their physical and chemical similarity. Our automated reduction procedure is guided by a fitness function based on the Mutual Information between the AA-based input attributes of the dataset and the protein structure feature being predicted. To search for the optimal reduction, the Extended Compact Genetic Algorithm (ECGA) was used, and afterwards the results of this process were fed into (and validated by) BioHEL, a genetics-based machine learning technique. BioHEL used the reduced alphabet to induce rules for protein structure prediction features. BioHEL results are compared to two standard machine learning systems. Our results show that it is possible to reduce the size of the alphabet used for prediction from twenty to just three letters, resulting in more compact, i.e. interpretable, rules. Also, a protein-wise accuracy performance measure suggests that the loss of accuracy accrued by this substantial alphabet reduction is not statistically significant when compared to the full alphabet.
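A minimal sketch of how a mutual-information score for a candidate reduced alphabet might be computed. It is a simplification under stated assumptions: it scores a single residue attribute rather than the windowed input attributes of a real dataset, the names `mutual_information` and `reduce_sequence` are hypothetical, and it omits the ECGA search that the paper uses to explore groupings.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information, in bits, between two equal-length label lists."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # I(X;Y) = sum over observed pairs of p(x,y) * log2(p(x,y) / (p(x) * p(y)))
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def reduce_sequence(residues, grouping):
    """Map each amino acid letter to the id of its group in a candidate reduced alphabet."""
    return [grouping[r] for r in residues]


# Toy data (invented for illustration): residue letters and a two-state structural label.
residues = list("AAVVKKDD")
labels = ["buried"] * 4 + ["exposed"] * 4
grouping = {"A": 0, "V": 0, "K": 1, "D": 1}  # hypothetical two-letter alphabet

fitness = mutual_information(reduce_sequence(residues, grouping), labels)
```

A grouping that merges residues with the same label distribution keeps the score unchanged, whereas one that merges residues with different distributions lowers it, which is what makes the score usable as a fitness signal for a search over groupings.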

Soft Computing | 2008

Prediction of topological contacts in proteins using learning classifier systems

Michael Stout; Jaume Bacardit; Jonathan D. Hirst; Robert E. Smith; Natalio Krasnogor

Evolutionary based data mining techniques are increasingly applied to problems in the bioinformatics domain. We investigate an important aspect of predicting the folded 3D structure of proteins from their unfolded residue sequence using evolutionary based machine learning techniques. Our approach is to predict specific features of residues in folded protein chains, in particular features derived from the Delaunay tessellations, Gabriel graphs and relative neighborhood graphs as well as minimum spanning trees. Several standard machine learning algorithms were compared to a state-of-the-art learning method, a learning classifier system (LCS), that is capable of generating compact and interpretable rule sets. Predictions were performed for various degrees of precision using a range of experimental parameters. Examples of the rules obtained are presented. The LCS produces results with good predictive performance and generates competent yet simple and interpretable classification rules.

Genetic and Evolutionary Computation Conference | 2006

Coordination number prediction using learning classifier systems: performance and interpretability

Jaume Bacardit; Michael Stout; Natalio Krasnogor; Jonathan D. Hirst; Jacek Blazewicz

The prediction of the coordination number (CN) of an amino acid in a protein structure has recently received renewed attention. In a recent paper, Kinjo et al. proposed a real-valued definition of CN and a criterion to map it onto a finite set of classes, in order to predict it using classification approaches. The literature reports several kinds of input information used for CN prediction. The aim of this paper is to assess the performance of a state-of-the-art learning method, Learning Classifier Systems (LCS), on this CN definition, with various degrees of precision, based on several combinations of input attributes. Moreover, we compare the LCS performance to other well-known learning techniques. Our experiments are also intended to determine the minimum set of input information needed to achieve good predictive performance, so as to generate competent yet simple and interpretable classification rules. Thus, the generated predictors (rule sets) are analyzed for their interpretability.

New Phytologist | 2014

Mechanical modelling quantifies the functional importance of outer tissue layers during root elongation and bending

Rosemary J. Dyson; Gema Vizcay-Barrena; Leah R. Band; Anwesha N. Fernandes; Andrew P. French; John A. Fozard; T. Charlie Hodgman; Kim Kenobi; Tony P. Pridmore; Michael Stout; Darren M. Wells; Michael Wilson; Malcolm J. Bennett; Oliver E. Jensen

Root elongation and bending require the coordinated expansion of multiple cells of different types. These processes are regulated by the action of hormones that can target distinct cell layers. We use a mathematical model to characterise the influence of the biomechanical properties of individual cell walls on the properties of the whole tissue. Taking a simple constitutive model at the cell scale which characterises cell walls via yield and extensibility parameters, we derive the analogous tissue-level model to describe elongation and bending. To accurately parameterise the model, we take detailed measurements of cell turgor, cell geometries and wall thicknesses. The model demonstrates how cell properties and shapes contribute to tissue-level extensibility and yield. Exploiting the highly organised structure of the elongation zone (EZ) of the Arabidopsis root, we quantify the contributions of different cell layers, using the measured parameters. We show how distributions of material and geometric properties across the root cross-section contribute to the generation of curvature, and relate the angle of a gravitropic bend to the magnitude and duration of asymmetric wall softening. We quantify the geometric factors which lead to the predominant contribution of the outer cell files in driving root elongation and bending.

Lecture Notes in Computer Science | 2006

From HP lattice models to real proteins: coordination number prediction using learning classifier systems

Michael Stout; Jaume Bacardit; Jonathan D. Hirst; Natalio Krasnogor; Jacek Blazewicz

Prediction of the coordination number (CN) of residues in proteins based solely on protein sequence has recently received renewed attention. At the same time, simplified protein models such as the HP model have been used to understand protein folding and protein structure prediction. These models represent the sequence of a protein using two residue types, hydrophobic and polar, and restrict the residue locations to those of a lattice. The aim of this paper is to compare CN prediction at three levels of abstraction: a) 3D cubic lattice HP model proteins, b) real proteins represented by their HP sequence and c) real proteins using residue sequence alone. For the 3D HP lattice model proteins, the CN of each residue is simply the number of neighboring residues on the lattice. For the real proteins, we use a recent real-valued definition of CN proposed by Kinjo et al. To perform the predictions we use GAssist, a recent evolutionary-computation-based machine learning method belonging to the Learning Classifier System (LCS) family. Its performance was compared against some alternative learning techniques. Predictions using the HP sequence representation with only two residue types were only a little worse than those using a full 20-letter amino acid alphabet (64% vs 68% for two-state prediction, 45% vs 50% for three-state prediction and 30% vs 33% for five-state prediction). That HP sequence information alone can result in prediction accuracies that are within 5% of those obtained using full residue type information indicates that hydrophobicity is a key determinant of CN and further justifies studies of simplified models.
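The two simplifications described in this abstract can be sketched in a few lines. Everything named here is an assumption for illustration: the `HYDROPHOBIC` set is one possible hydrophobic/polar split (the abstract does not state the mapping used), and `lattice_cn` counts every occupied adjacent site on a cubic lattice, including chain neighbours, which may differ from the paper's exact convention.

```python
# Assumed hydrophobic residue set; published HP mappings differ in detail.
HYDROPHOBIC = set("AVLIMFWC")

# The six unit steps to adjacent sites on a 3D cubic lattice.
STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def to_hp(sequence):
    """Collapse a 20-letter amino acid sequence to a two-letter H/P string."""
    return "".join("H" if aa in HYDROPHOBIC else "P" for aa in sequence)


def lattice_cn(coords):
    """Coordination number per residue: the count of occupied adjacent lattice sites."""
    occupied = set(coords)
    return [
        sum((x + dx, y + dy, z + dz) in occupied for dx, dy, dz in STEPS)
        for x, y, z in coords
    ]


# A four-residue chain folded into a square in the z = 0 plane.
square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
cn_values = lattice_cn(square)
```

On a lattice the count is an integer bounded by six, so mapping it onto two, three or five classes is a matter of choosing thresholds.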

Evolutionary Intelligence | 2010

A learning classifier system with mutual-information-based fitness

Robert E. Smith; Max Kun Jiang; Jaume Bacardit; Michael Stout; Natalio Krasnogor; Jonathan D. Hirst

This paper introduces a new variety of learning classifier system (LCS), called MILCS, which utilizes mutual information as fitness feedback. Unlike most LCSs, MILCS is specifically designed for supervised learning. We present experimental results, and contrast them to results from XCS, UCS, GAssist, BioHEL, C4.5 and Naïve Bayes. We discuss the explanatory power of the resulting rule sets. MILCS is also shown to promote the discovery of default hierarchies, an important advantage of LCSs. Final comments include future directions for this research, including investigations in neural networks and other systems.

Learning Classifier Systems in Data Mining | 2008

Data Mining in Proteomics with Learning Classifier Systems

Jaume Bacardit; Michael Stout; Jonathan D. Hirst; Natalio Krasnogor

The era of data mining has renewed research effort in certain areas of biology that, because of their difficulty and the lack of knowledge about them, were and still are considered unsolved problems. One such problem, which is one of the fundamental open problems in computational biology, is the prediction of the 3D structure of proteins, or protein structure prediction (PSP). Human experts, with the crucial help of data mining tools, are learning how proteins fold to form their structure, but are still far from providing perfect models for all kinds of proteins. Data mining and knowledge discovery are essential to advance the understanding of the folding process. In this context, Learning Classifier Systems (LCS) are very competitive tools. They have shown their competence in many different data mining tasks. Moreover, they provide human-readable solutions that can help experts understand the PSP problem. In this chapter we describe our recent efforts in applying LCS to PSP-related domains. Specifically, we focus on a relevant PSP subproblem called Coordination Number (CN) prediction. CN is a kind of simplified profile of the 3D structure of a protein. Two kinds of experiments are described: the first analyzes different ways to represent the basic composition of proteins, their primary sequence, and the second assesses different data sources and problem definition methods for performing competent CN prediction. In all the experiments LCS show their competence in terms of both accurate predictions and explanatory power.

Proceedings of the 7th International FLINS Conference | 2006

Prediction of residue exposure and contact number for simplified HP lattice model proteins using learning classifier systems

Michael Stout; Jaume Bacardit; Jonathan D. Hirst; Jacek Blazewicz; Natalio Krasnogor

Automated Scheduling, Optimisation and Planning Research Group, School of Computer Science and IT, University of Nottingham, Jubilee Campus, Wollaton Road, Nottingham, NG8 1BB, UK. Email: {jqb,mqs,nxk}@cs.nott.ac.uk
School of Chemistry, University of Nottingham, University Park, Nottingham NG7 2RD, UK. Email: [email protected]
Poznan University of Technology, Institute of Computing Science, ul. Piotrowo 3a, 60-965 Poznan, Poland. Email: [email protected]
