Benjamin J. Stapley | Researchain

Publication

Featured research published by Benjamin J. Stapley.

Current Medicinal Chemistry | 2004

Prediction of Protein Function in the Absence of Significant Sequence Similarity

Paul D. Dobson; Yu-Dong Cai; Benjamin J. Stapley; Andrew J. Doig

Tremendous progress in DNA sequencing has yielded the genomes of a host of important organisms. The utilisation of these resources requires understanding of the function of each gene. Standard methods of functional assignment involve sequence alignment to a gene of known function; however such methods often fail to find any significant matches. Here we discuss a number of recent alternative methods that may be of use when sequence alignment fails. Function can be defined in a number of ways including E.C. number and MIPS and KEGG functional classes. Phylogenetic profiles show the pattern of presence or absence of a protein between genomes. Protein-protein interactions can be identified by searching for interacting pairs of proteins that are fused to a single protein chain in another organism. The gene neighbour method uses the observation that if the genes that encode two proteins are close on a chromosome, the proteins tend to be functionally related. More general methods use sequence properties such as amino acid composition, mean hydrophobicity, predicted secondary structure and post-translational modification sites. Data mining methods devise rules in the form of IF... THEN statements that make predictions of function using sequence based attributes, predicted secondary structure and sequence similarity. Finally, structural features can be used, after modelling the structure of a protein from its sequence or solving its structure. Protein fold class can be strongly indicative of function, while other structural features, such as secondary structure content, cleft size and 3D structural motifs are also useful.
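As a rough illustration of the phylogenetic profile idea mentioned in the abstract, the sketch below represents each protein as a presence/absence vector across genomes and ranks other proteins by how closely their profiles match. The genome names, protein names, profile values and the `max_distance` threshold are all hypothetical, and Hamming distance is only one possible similarity measure; this is not the method of any particular paper.

```python
# Hypothetical presence/absence data: 1 means a homologue of the protein
# is present in that genome, 0 means it is absent.
GENOMES = ["genome_A", "genome_B", "genome_C", "genome_D", "genome_E"]

PROFILES = {
    "protein_1": (1, 0, 1, 1, 0),
    "protein_2": (1, 0, 1, 1, 0),
    "protein_3": (0, 1, 0, 0, 1),
    "protein_4": (1, 0, 1, 0, 0),
}


def hamming_distance(a, b):
    """Count the genomes in which two profiles disagree."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same set of genomes")
    return sum(x != y for x, y in zip(a, b))


def closest_profiles(query, profiles, max_distance=1):
    """Return (name, distance) pairs for proteins whose profile is within
    max_distance of the query profile, nearest first."""
    query_profile = profiles[query]
    hits = []
    for name, profile in profiles.items():
        if name == query:
            continue
        distance = hamming_distance(query_profile, profile)
        if distance <= max_distance:
            hits.append((name, distance))
    # Sort by distance, then by name so that ties are ordered deterministically.
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


if __name__ == "__main__":
    for name, distance in closest_profiles("protein_1", PROFILES):
        print(name, distance)
```

Proteins with identical or near-identical profiles are candidates for a functional link, on the assumption that proteins acting together tend to be gained and lost together.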

BMC Bioinformatics | 2005

Mining protein function from text using term-based support vector machines

Simon B. Rice; Goran Nenadic; Benjamin J. Stapley

Background: Text mining has spurred huge interest in the domain of biology. The goal of the BioCreAtIvE exercise was to evaluate the performance of current text mining systems. We participated in Task 2, which addressed assigning Gene Ontology terms to human proteins and selecting relevant evidence from full-text documents. We approached it as a modified form of the document classification task. We used a supervised machine-learning approach (based on support vector machines) to assign protein function and select passages that support the assignments. As classification features, we used a protein's co-occurring terms that were automatically extracted from documents.

Results: The results evaluated by curators were modest, and quite variable for different problems: in many cases we have relatively good assignment of GO terms to proteins, but the selected supporting text was typically non-relevant (precision spanning from 3% to 50%). The method appears to work best when a substantial set of relevant documents is obtained, while it works poorly on single documents and/or short passages. The initial results suggest that our approach can also mine annotations from text even when an explicit statement relating a protein to a GO term is absent.

Conclusion: A machine learning approach to mining protein function predictions from text can yield good performance only if sufficient training data is available, and a significant amount of supporting data is used for prediction. The most promising results are for combined document retrieval and GO term assignment, which calls for the integration of methods developed in BioCreAtIvE Task 1 and Task 2.

Meeting of the Association for Computational Linguistics | 2003

Selecting Text Features for Gene Name Classification: from Documents to Terms

Goran Nenadic; Simon B. Rice; Irena Spasic; Sophia Ananiadou; Benjamin J. Stapley

In this paper we discuss the performance of a text-based classification approach by comparing different types of features. We consider the automatic classification of gene names from the molecular biology literature, by using a support-vector machine method. Classification features range from words, lemmas and stems, to automatically extracted terms. Also, simple co-occurrences of genes within documents are considered. The preliminary experiments performed on a set of 3,000 S. cerevisiae gene names and 53,000 Medline abstracts have shown that using domain-specific terms can improve the performance compared to the standard bag-of-words approach, in particular for genes classified with higher confidence, and for under-represented classes.
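To make the contrast between feature types concrete, here is a minimal sketch of two feature extractors: a standard bag-of-words counter and a counter over multi-word domain terms. The term list, the example sentence and the function names are assumptions made for illustration. The paper extracts terms automatically and trains a support vector machine on the resulting features; neither step is reproduced here.

```python
import re
from collections import Counter

# Hypothetical domain terms, hard-coded for illustration only.
TERMS = ["cell cycle", "dna repair", "protein kinase"]

# Made-up example sentence standing in for a Medline abstract.
ABSTRACT = "CDC28 encodes a protein kinase required for cell cycle progression."


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bag_of_words(text):
    """Count every individual word, ignoring word order."""
    return Counter(tokenize(text))


def term_features(text, terms=TERMS):
    """Count occurrences of known multi-word terms in the text."""
    normalised = " ".join(tokenize(text))
    counts = Counter()
    for term in terms:
        # Word boundaries prevent matches inside longer words.
        pattern = r"\b" + re.escape(term) + r"\b"
        n = len(re.findall(pattern, normalised))
        if n:
            counts[term] = n
    return counts


if __name__ == "__main__":
    print(bag_of_words(ABSTRACT))
    print(term_features(ABSTRACT))
```

In the bag-of-words view, "protein" and "kinase" are separate, fairly generic features, whereas the term view keeps "protein kinase" as a single domain-specific feature. Either set of counts could then be turned into a numeric vector for a classifier.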

Protein Science | 2008