Thomas M. Keane | Researchain

Publication

Featured research published by Thomas M. Keane.

BMC Evolutionary Biology | 2006

Assessment of methods for amino acid matrix selection and their use on empirical data shows that ad hoc assumptions for choice of matrix are not justified

Thomas M. Keane; Christopher J. Creevey; Melissa M. Pentony; Thomas J. Naughton; James O. McInerney

Background: In recent years, model-based approaches such as maximum likelihood have become the methods of choice for constructing phylogenies. A number of authors have shown the importance of using adequate substitution models in order to produce accurate phylogenies. In the past, many empirical models of amino acid substitution have been derived using a variety of different methods and protein datasets. These matrices are normally used as surrogates, rather than deriving the maximum likelihood model from the dataset being examined. With few exceptions, selection between alternative matrices has been carried out in an ad hoc manner.

Results: We start by highlighting the potential dangers of arbitrarily choosing protein models, presenting an empirical example in which a single alignment can produce two topologically different and strongly supported phylogenies under two different, arbitrarily chosen amino acid substitution models. We demonstrate that, in simple simulations, statistical methods of model selection are indeed robust and likely to be useful for protein model selection. We have investigated patterns of amino acid substitution among homologous sequences from the three domains of life, and our results show that no single amino acid matrix is optimal for any of the datasets. Perhaps most interestingly, we demonstrate that for two large datasets derived from the proteobacteria and archaea, one of the most favored models in both datasets is a model that was originally derived from retroviral Pol proteins.

Conclusion: This demonstrates that choosing protein models based on their source or method of construction may not be appropriate.
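The abstract does not name the statistical criteria it compares. As a minimal sketch, assuming the widely used Akaike (AIC) and Bayesian (BIC) information criteria, choosing between substitution matrices reduces to penalising each model's maximised log-likelihood by its number of free parameters and keeping the lowest score. The model labels, log-likelihoods, parameter counts and alignment length below are hypothetical placeholders, not results from the paper.

```python
import math
from dataclasses import dataclass

@dataclass
class FittedModel:
    name: str              # e.g. "JTT+G" (labels below are illustrative)
    log_likelihood: float  # maximised ln L of the alignment under this model
    n_params: int          # free parameters not shared by every candidate

def aic(m: FittedModel) -> float:
    # Akaike information criterion: 2k - 2 ln L
    return 2 * m.n_params - 2 * m.log_likelihood

def bic(m: FittedModel, n_sites: int) -> float:
    # Bayesian information criterion: k ln(n) - 2 ln L, taking n = alignment length
    return m.n_params * math.log(n_sites) - 2 * m.log_likelihood

def rank(models, score):
    # Both criteria are "lower is better"
    return sorted(models, key=score)

if __name__ == "__main__":
    # Hypothetical fits: +G adds a gamma shape parameter, +F adds 19 free frequencies
    fits = [
        FittedModel("JTT+G", -10234.7, 1),
        FittedModel("WAG+G+F", -10210.0, 20),
        FittedModel("rtREV+G", -10241.2, 1),
    ]
    n_sites = 450
    print("AIC choice:", rank(fits, aic)[0].name)
    print("BIC choice:", rank(fits, lambda m: bic(m, n_sites))[0].name)
```

With these made-up numbers AIC prefers WAG+G+F, while BIC's heavier per-parameter penalty (ln 450 ≈ 6.1 instead of 2) prefers JTT+G, so the criterion itself can change which matrix is selected.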

Nucleic Acids Research | 2007

MultiPhyl: a high-throughput phylogenomics webserver using distributed computing

Thomas M. Keane; Thomas J. Naughton; James O. McInerney

With the number of fully sequenced genomes increasing steadily, there is greater interest in performing large-scale phylogenomic analyses from large numbers of individual gene families. Maximum likelihood (ML) has been shown repeatedly to be one of the most accurate methods for phylogenetic construction. Recently, there have been a number of algorithmic improvements in maximum-likelihood-based tree search methods. However, it can still take a long time to analyse the evolutionary history of many gene families using a single computer. Distributed computing refers to a method of combining the computing power of multiple computers in order to perform some larger overall calculation. In this article, we present the first high-throughput implementation of a distributed phylogenetics platform, MultiPhyl, capable of using the idle computational resources of many heterogeneous non-dedicated machines to form a phylogenetics supercomputer. MultiPhyl allows a user to upload hundreds or thousands of amino acid or nucleotide alignments simultaneously and perform computationally intensive tasks such as model selection, tree searching and bootstrapping of each of the alignments using many desktop machines. The program implements a set of 88 amino acid models and 56 nucleotide maximum likelihood models and a variety of statistical methods for choosing between alternative models. A MultiPhyl webserver is available for public use at: http://www.cs.nuim.ie/distributed/multiphyl.php.

Bioinformatics | 2005

DPRml: distributed phylogeny reconstruction by maximum likelihood

Thomas M. Keane; Thomas J. Naughton; Simon A. A. Travers; James O. McInerney; Grace P. McCormack

Motivation: In recent years there has been increased interest in producing large and accurate phylogenetic trees using statistical approaches. However, for a large number of taxa, it is not feasible to construct large and accurate trees using only a single processor. A number of specialized parallel programs have been produced in an attempt to address the huge computational requirements of maximum likelihood. We express a number of concerns about the current set of parallel phylogenetic programs, which are severely limiting the widespread availability and use of parallel computing in maximum likelihood-based phylogenetic analysis.

Results: We have identified the suitability of phylogenetic analysis for large-scale heterogeneous distributed computing. We have completed a distributed and fully cross-platform phylogenetic tree building program called distributed phylogeny reconstruction by maximum likelihood (DPRml). It uses an already proven maximum likelihood-based tree building algorithm and a popular phylogenetic analysis library for all its likelihood calculations. It offers one of the most extensive sets of DNA substitution models currently available. We are the first, to our knowledge, to report the completion of a distributed phylogenetic tree building program that can achieve near-linear speedup while using only the idle clock cycles of machines. For those in an academic or corporate environment with hundreds of idle desktop machines, we have shown how distributed computing can deliver a free ML supercomputer.
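For reference, "near-linear speedup" is usually read against the standard definitions below (general textbook definitions, not formulas taken from the paper), where T_1 is the run time on one processor and T_p the run time on p processors:

```latex
S_p = \frac{T_1}{T_p}, \qquad
E_p = \frac{S_p}{p} = \frac{T_1}{p\,T_p}, \qquad
\text{near-linear speedup: } S_p \approx p \iff E_p \approx 1
```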

International Parallel and Distributed Processing Symposium | 2006

Distributed Monte Carlo simulation of light transportation in tissue

Andrew J. Page; Shirley Coyle; Thomas M. Keane; Thomas J. Naughton; Charles Markham; Tomas E. Ward

A distributed Monte Carlo simulation that models the propagation of light through tissue has been developed. It allows for improved calibration of medical imaging devices for investigating tissue oxygenation in the white matter of the cerebral cortex. The application can distribute the simulation over an unbounded number of processors in parallel. We have found that this application is highly parallelisable, achieving up to 91% efficiency at 60 processors on a homogeneous Java distributed system. A distributed system with 150 heterogeneous processors was used to simulate the paths of photons in a brain tissue model. We found that the source illumination footprint has an effect on the distribution of photons in the head and that lasers do produce a small beam in a highly scattering medium. This application will help researchers to improve the accuracy of their experiments.
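The sketch below shows why this workload parallelises so well. It assumes much simpler physics than the paper's brain-tissue model: a homogeneous semi-infinite medium with isotropic scattering and hypothetical coefficients MU_A and MU_S. Each photon packet performs a random walk with exponentially distributed step lengths and loses a fixed fraction of its weight at every interaction. Because batches of photons are independent and need only their own random seed, they can run on separate processors and their tallies can simply be added. For scale, parallel efficiency is speedup divided by processor count, so the reported 91% efficiency at 60 processors corresponds to a speedup of roughly 0.91 × 60 ≈ 54.6.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical optical properties in 1/mm (illustrative, not from the paper)
MU_A = 0.02                 # absorption coefficient
MU_S = 1.0                  # scattering coefficient
MU_T = MU_A + MU_S          # total interaction coefficient
ALBEDO = MU_S / MU_T        # fraction of weight surviving each interaction
W_MIN = 1e-4                # terminate packets below this weight (no Russian roulette)

def run_batch(n_photons: int, seed: int) -> tuple:
    """Trace photon packets through a semi-infinite medium occupying z >= 0.

    With isotropic scattering and a laterally infinite medium, only the depth z
    and the direction cosine uz matter. Returns (absorbed, reflected) weight.
    """
    rng = random.Random(seed)        # independent random stream per batch
    absorbed = reflected = 0.0
    for _ in range(n_photons):
        z, uz, w = 0.0, 1.0, 1.0     # launched at the surface, pointing inward
        while True:
            # Exponentially distributed free path length
            step = -math.log(1.0 - rng.random()) / MU_T
            z += uz * step
            if z < 0.0:              # crossed back out through the surface
                reflected += w
                break
            absorbed += w * (1.0 - ALBEDO)   # deposit the absorbed fraction
            w *= ALBEDO
            if w < W_MIN:            # dump the remainder so weight is conserved
                absorbed += w
                break
            uz = 2.0 * rng.random() - 1.0    # isotropic: cos(theta) uniform on [-1, 1)
    return absorbed, reflected

def simulate(total_photons: int, n_workers: int = 4, base_seed: int = 12345):
    """Split photons into independent batches and sum their tallies."""
    counts = [total_photons // n_workers] * n_workers
    counts[-1] += total_photons - sum(counts)
    seeds = [base_seed + i for i in range(n_workers)]
    # Threads keep the sketch portable; CPython's GIL means real speedup needs
    # processes or separate machines, which this batch design permits unchanged.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(run_batch, counts, seeds))
    absorbed = sum(a for a, _ in results)
    reflected = sum(r for _, r in results)
    return absorbed / total_photons, reflected / total_photons

if __name__ == "__main__":
    frac_abs, frac_refl = simulate(10_000)
    print(f"absorbed: {frac_abs:.3f}  reflected: {frac_refl:.3f}")
```

Fixing one seed per batch makes a run reproducible regardless of which machine finishes first, since the final sums are taken in batch order.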

FIDJI '01 Revised Papers from the International Workshop on Scientific Engineering for Distributed Java Applications | 2002

Distributed Java Platform with Programmable MIMD Capabilities

Thomas M. Keane; Richard Allen; Thomas J. Naughton; James O. McInerney; John Waldron

A distributed Java platform has been designed and built for the simplified implementation of distributed Java applications. Its programmable nature means that code as well as data is distributed over a network. The platform is largely based on the Java Distributed Computation Library of Fritsche, Power, and Waldron. The generality of our system is demonstrated through the emulation of a MIMD (multiple instruction, multiple data) architecture. This is achieved by augmenting the server with a virtual pipeline processor. We explain the design of the system, its deployment over a university network, and its evaluation through a sample application.

Bioinformatics | 2005

DSEARCH: sensitive database searching using distributed computing

Thomas M. Keane; Thomas J. Naughton

We present a distributed and fully cross-platform database search program that allows the user to utilize the idle clock cycles of machines to perform large searches using the most sensitive algorithms. For those in an academic or corporate environment with hundreds of idle desktop machines, DSEARCH can deliver a free database search supercomputer.

Availability: The software is publicly available under the GNU General Public Licence from http://www.cs.may.ie/

Supplementary information: Full documentation and a user manual are available from http://www.cs.may.ie/distributed.
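The abstract does not name its algorithms beyond "the most sensitive". The following minimal sketch assumes Smith-Waterman local alignment (the classic exhaustive, sensitive method) with a hypothetical match/mismatch/linear-gap scoring scheme. It shows a common way to distribute such a search: partition the database into work units, have each machine score its unit against the query and return only its k best hits, then merge the partial lists. Function names are illustrative and are not DSEARCH's interface.

```python
import heapq
from typing import List, Tuple

# Hypothetical scoring scheme (real searches use substitution matrices and affine gaps)
MATCH, MISMATCH, GAP = 2, -1, -2

Hit = Tuple[int, str]  # (score, sequence name)

def smith_waterman(a: str, b: str) -> int:
    """Best local-alignment score with a linear gap penalty, O(len(a) * len(b))."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            cur[j] = max(0, diag, prev[j] + GAP, cur[j - 1] + GAP)
            best = max(best, cur[j])
        prev = cur
    return best

def search_chunk(query: str, chunk: List[Tuple[str, str]], k: int) -> List[Hit]:
    """One work unit: score every sequence in the chunk and keep the k best."""
    return heapq.nlargest(k, ((smith_waterman(query, seq), name) for name, seq in chunk))

def split(db: List[Tuple[str, str]], n_chunks: int) -> List[List[Tuple[str, str]]]:
    """Round-robin partition of the database into n_chunks work units."""
    return [db[i::n_chunks] for i in range(n_chunks)]

def distributed_search(query: str, db, n_chunks: int = 3, k: int = 2) -> List[Hit]:
    # Each search_chunk call could run on a different idle machine;
    # merging the per-chunk top-k lists yields the global top-k.
    partials = [search_chunk(query, chunk, k) for chunk in split(db, n_chunks)]
    return heapq.nlargest(k, (hit for part in partials for hit in part))

if __name__ == "__main__":
    db = [("seq1", "ACGTACGTGA"), ("seq2", "TTTTTTTT"),
          ("seq3", "ACGTTCGTGA"), ("seq4", "GGGACGTACG")]
    print(distributed_search("ACGTACGT", db))
```

Merging per-chunk top-k lists is exact: any sequence in the global top k must also be in the top k of its own chunk, so only k hits per work unit need to travel back over the network.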

International Symposium on Parallel and Distributed Computing | 2004

Adaptive scheduling across a distributed computation platform

Andrew J. Page; Thomas M. Keane; Thomas J. Naughton

A programmable Java distributed system, which adapts to available resources, has been developed to minimise the overall processing time of computationally intensive problems. The system exploits the free resources of a heterogeneous set of computers linked together by a network, communicating using Sun Microsystems' Remote Method Invocation (RMI) and Java sockets. It uses a multi-tiered distributed system model, which in principle allows for a system of unbounded size. The system consists of an n-ary tree of nodes in which the internal nodes perform the scheduling and the leaves do the processing. The scheduler nodes communicate in a peer-to-peer manner, and the processing nodes operate in a strictly client-server manner with their respective scheduler. The independent schedulers on each tier of the tree dynamically allocate resources between problems based on the constantly changing characteristics of the underlying network. The system has been evaluated over a network of 86 PCs with a bioinformatics application and the travelling salesman optimisation problem.
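The abstract does not spell out the allocation rule. This minimal sketch assumes that each scheduler divides its share of work units among its children in proportion to their measured throughput, which is a hypothetical policy chosen for illustration. It models the n-ary tree directly: leaves carry a smoothed rate estimate, internal nodes sum their subtree's rates, and allocation recurses from the root. The Node class, the observe method and the smoothing factor alpha are assumptions, not the system's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Node:
    """A scheduler (internal node) or a processing machine (leaf)."""
    name: str
    rate: float = 0.0                      # leaves: estimated work units per second
    children: List["Node"] = field(default_factory=list)

    def capacity(self) -> float:
        # An internal node's capacity is the total rate of the leaves beneath it
        if not self.children:
            return self.rate
        return sum(child.capacity() for child in self.children)

    def observe(self, measured_rate: float, alpha: float = 0.5) -> None:
        # Leaves: blend a new throughput measurement into the estimate
        self.rate = alpha * measured_rate + (1.0 - alpha) * self.rate

    def allocate(self, units: int) -> Dict[str, int]:
        """Recursively split work units among leaves in proportion to capacity."""
        if not self.children:
            return {self.name: units}
        caps = [child.capacity() for child in self.children]
        total = sum(caps)
        if total <= 0:
            raise ValueError(f"scheduler {self.name} has no available capacity")
        exact = [units * c / total for c in caps]
        shares = [int(x) for x in exact]
        # Largest-remainder rounding so the shares add up to `units` exactly
        leftover = units - sum(shares)
        by_remainder = sorted(range(len(caps)), key=lambda i: exact[i] - shares[i], reverse=True)
        for i in by_remainder[:leftover]:
            shares[i] += 1
        plan: Dict[str, int] = {}
        for child, share in zip(self.children, shares):
            plan.update(child.allocate(share))
        return plan

if __name__ == "__main__":
    pc_a, pc_b, pc_c = Node("pc-a", 3.0), Node("pc-b", 1.0), Node("pc-c", 4.0)
    root = Node("root", children=[Node("sched-1", children=[pc_a, pc_b]), pc_c])
    print(root.allocate(80))     # work split by measured rates
    pc_c.observe(0.0)            # pc-c becomes busy: its smoothed rate halves to 2.0
    print(root.allocate(80))     # work shifts toward the other subtree
```

Re-running allocate after each observe is what makes the split adaptive: when pc-c's measured rate drops, the root shifts work toward the sched-1 subtree, and no leaf needs to know about the others.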

Algorithmica | 2006

Building Large Phylogenetic Trees on Coarse-Grained Parallel Machines

Thomas M. Keane; Andrew J. Page; Thomas J. Naughton; Simon A. A. Travers; James O. McInerney

Phylogenetic analysis is an area of computational biology concerned with the reconstruction of evolutionary relationships between organisms, genes, and gene families. Maximum likelihood evaluation has proven to be one of the most reliable methods for constructing phylogenetic trees. The huge computational requirements associated with maximum likelihood analysis mean that it is not feasible to produce large phylogenetic trees using a single processor. We have completed a fully cross-platform, coarse-grained distributed application, DPRml, which overcomes many of the limitations imposed by the current set of parallel phylogenetic programs. We have completed a set of efficiency tests that show how to maximise efficiency while using the program to build large phylogenetic trees. The software is publicly available under the terms of the GNU General Public Licence from the system webpage at http://www.cs.nuim.ie/distributed.

International Parallel and Distributed Processing Symposium | 2005

Bioinformatics on a heterogeneous Java distributed system

Andrew J. Page; Thomas M. Keane; Thomas J. Naughton

A programmable Java distributed system, which utilises the free resources of a heterogeneous set of computers linked together by a network, has been developed. The system has been deployed on over 200 computers distributed over a number of locations, and has been successfully used to process bioinformatics, biomedical engineering, and cryptography applications. We present two bioinformatics applications: DSEARCH, which performs sensitive database searching, and DPRml, which performs distributed phylogeny reconstruction by maximum likelihood.

Computer-Based Medical Systems | 2005

A high-throughput bioinformatics distributed computing platform

Thomas M. Keane; Andrew J. Page; James O. McInerney; Thomas J. Naughton

In recent years, the demand for high-performance computing in bioinformatics has greatly increased. The huge increase in the size of many genomic databases has meant that many common bioinformatics tasks cannot be completed in a reasonable amount of time on a single processor. Recently, distributed computing has emerged as an inexpensive alternative to dedicated parallel computing. We have developed a general-purpose distributed computing platform that is capable of using semi-idle computing resources to simulate a dedicated computing cluster. We have identified the suitability of a number of bioinformatics tasks for distributed computing. We briefly outline and evaluate two distributed bioinformatics programs, DSEARCH and DPRml, which have been developed for our system.
