Michael Matthews | Researchain

Publication

Featured research published by Michael Matthews.

Pacific Symposium on Biocomputing | 2007

Assisted Curation: Does Text Mining Really Help?

Beatrice Alex; Claire Grover; Barry Haddow; Mijail Kabadjov; Ewan Klein; Michael Matthews; Stuart Roebuck; Richard Tobin; Xinglong Wang

Although text mining shows considerable promise as a tool for supporting the curation of biomedical text, there is little concrete evidence as to its effectiveness. We report on three experiments measuring the extent to which curation can be speeded up with assistance from Natural Language Processing (NLP), together with subjective feedback from curators on the usability of a curation tool that integrates NLP hypotheses for protein-protein interactions (PPIs). In our curation scenario, we found that a maximum speed-up of 1/3 in curation time can be expected if NLP output is perfectly accurate. The preference of one curator for consistent NLP output and output with high recall needs to be confirmed in a larger study with several curators.

BMC Bioinformatics | 2008

Distinguishing the species of biomedical named entities for term identification

Xinglong Wang; Michael Matthews

Background: Term identification is the task of grounding ambiguous mentions of biomedical named entities in text to unique database identifiers. Previous work on term identification has focused on studying species-specific documents. However, full-length articles often describe entities across a number of species, in which case resolving the ambiguity of model organisms in entities is critical to achieving accurate term identification. Results: We developed and compared a number of rule-based and machine-learning based approaches to resolving species ambiguity in mentions of biomedical named entities, and demonstrated that a hybrid method achieved the best overall accuracy at 71.7%, as tested on the gold-standard ITI-TXM corpora. By utilising the species information predicted by the hybrid tagger, our rule-based term identification system was improved significantly by up to 11.6%. Conclusion: This paper shows that, in the context of identifying terms involving multiple model organisms, integration of an accurate species disambiguation system can significantly improve the performance of term identification systems.
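The abstract names rule-based approaches to species disambiguation without spelling out the rules. As a rough illustration only, a rule-based tagger might look for a species cue word just before the entity mention and otherwise back off to the species mentioned most often in the document. The cue list, the `window` size, the back-off rule and the function names `species_mentions` and `tag_species` are all hypothetical assumptions; this is not the rule-based, machine-learning or hybrid tagger evaluated in the paper.

```python
"""Toy rule-based species tagger for entity mentions.

The cue words, window size and back-off rule are hypothetical illustrations,
not the rule-based, machine-learning or hybrid taggers described in the paper.
"""
from collections import Counter

# Hypothetical mapping from species cue words to species labels.
SPECIES_CUES = {
    "human": "Homo sapiens",
    "mouse": "Mus musculus",
    "murine": "Mus musculus",
    "rat": "Rattus norvegicus",
    "yeast": "Saccharomyces cerevisiae",
}


def species_mentions(tokens):
    """Return (position, species) for every token that is a species cue word."""
    return [(i, SPECIES_CUES[tok.lower()])
            for i, tok in enumerate(tokens) if tok.lower() in SPECIES_CUES]


def tag_species(tokens, entity_index, window=3):
    """Guess the species of the entity mention at tokens[entity_index].

    Rule 1: if a cue word occurs within `window` tokens before the entity,
            use the nearest one.
    Rule 2: otherwise back off to the most frequent species in the document.
    Returns None when the document contains no species cue at all.
    """
    mentions = species_mentions(tokens)
    preceding = [(i, sp) for i, sp in mentions
                 if entity_index - window <= i < entity_index]
    if preceding:
        return max(preceding)[1]  # highest position = nearest preceding cue
    if mentions:
        return Counter(sp for _, sp in mentions).most_common(1)[0][0]
    return None


doc = ("We compared mouse Trp53 with human TP53 and human MDM2 "
       "in cells where BRCA1 was also measured").split()
print(tag_species(doc, doc.index("Trp53")))  # → Mus musculus (cue "mouse" just before)
print(tag_species(doc, doc.index("BRCA1")))  # → Homo sapiens (no nearby cue; document majority)
```

The back-off step is what lets the sketch label entities that have no local cue at all, which is exactly the situation in full-length articles that discuss several organisms.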

Genome Biology | 2008

Automating curation using a natural language processing pipeline

Beatrice Alex; Claire Grover; Barry Haddow; Mijail Kabadjov; Ewan Klein; Michael Matthews; Richard Tobin; Xinglong Wang

Background: The tasks in BioCreative II were designed to approximate some of the laborious work involved in curating biomedical research papers. The approach to these tasks taken by the University of Edinburgh team was to adapt and extend the existing natural language processing (NLP) system that we have developed as part of a commercial curation assistant. Although this paper concentrates on using NLP to assist with curation, the system can be equally employed to extract types of information from the literature that is immediately relevant to biologists in general. Results: Our system was among the highest performing on the interaction subtasks, and competitive performance on the gene mention task was achieved with minimal development effort. For the gene normalization task, a string matching technique that can be quickly applied to new domains was shown to perform close to average. Conclusion: The technologies being developed were shown to be readily adapted to the BioCreative II tasks. Although high performance may be obtained on individual tasks such as gene mention recognition and normalization, and document classification, tasks in which a number of components must be combined, such as detection and normalization of interacting protein pairs, are still challenging for NLP systems.

Meeting of the Association for Computational Linguistics | 2007

The Extraction of Enriched Protein-Protein Interactions from Biomedical Text

Barry Haddow; Michael Matthews

There has been much recent interest in the extraction of PPIs (protein-protein interactions) from biomedical texts, but in order to assist with curation efforts, the PPIs must be enriched with further information of biological interest. This paper describes the implementation of a system to extract and enrich PPIs, developed and tested using an annotated corpus of biomedical texts, and employing both machine-learning and rule-based techniques.

Pacific Symposium on Biocomputing | 2007

Comparing usability of matching techniques for normalising biomedical named entities

Xinglong Wang; Michael Matthews

String matching plays an important role in biomedical Term Normalisation, the task of linking mentions of biomedical entities to identifiers in reference databases. This paper evaluates exact, rule-based and various string-similarity-based matching techniques. The matchers are compared in two ways: first, we measure precision and recall against a gold-standard dataset and second, we integrate the matchers into a curation tool and measure gains in curation speed when they are used to assist a curator in normalising protein and tissue entities. The evaluation shows that a rule-based matcher works better on the gold-standard data, while a string-similarity based system and exact string matcher win out on improving curation efficiency.
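To make the three matcher families concrete, here is a minimal sketch under stated assumptions: a tiny hypothetical lexicon with placeholder identifiers, an example normalisation rule set (lowercasing, hyphens to spaces, whitespace collapsing), and Python's standard-library `difflib.SequenceMatcher` ratio as a stand-in similarity measure with a hypothetical threshold of 0.8. None of these are the specific rules, similarity metrics or data used in the paper; the toy gold set only shows how precision and recall are computed.

```python
"""Toy comparison of exact, rule-based and similarity-based matching for term normalisation.

The lexicon, identifiers, normalisation rules and threshold are hypothetical
illustrations; they are not the matchers or data evaluated in the paper.
"""
import re
from difflib import SequenceMatcher

# Hypothetical reference lexicon: surface name -> database identifier.
LEXICON = {
    "p53": "ID:0001",
    "tumor necrosis factor alpha": "ID:0002",
    "interleukin-2": "ID:0003",
}


def normalise(text):
    """Example rules: lowercase, treat hyphens as spaces, collapse whitespace."""
    text = text.lower().replace("-", " ")
    return re.sub(r"\s+", " ", text).strip()


# Normalised form of every lexicon entry, used by the rule-based and similarity matchers.
NORMALISED_LEXICON = {normalise(name): ident for name, ident in LEXICON.items()}


def exact_match(mention):
    """Return an identifier only if the mention equals a lexicon entry verbatim."""
    return LEXICON.get(mention)


def rule_based_match(mention):
    """Apply the same normalisation rules to mention and lexicon, then look up."""
    return NORMALISED_LEXICON.get(normalise(mention))


def similarity_match(mention, threshold=0.8):
    """Return the identifier of the most similar entry if its score reaches the threshold."""
    query = normalise(mention)
    best_id, best_score = None, 0.0
    for name, ident in NORMALISED_LEXICON.items():
        score = SequenceMatcher(None, query, name).ratio()  # 2 * matches / total length
        if score > best_score:
            best_id, best_score = ident, score
    return best_id if best_score >= threshold else None


def precision_recall(matcher, gold):
    """Score a matcher on (mention, correct_id) pairs.

    Precision counts only mentions for which the matcher returned an identifier;
    recall is over all gold mentions. Precision is 0.0 if nothing is returned.
    """
    predictions = [(matcher(m), g) for m, g in gold]
    returned = [(p, g) for p, g in predictions if p is not None]
    correct = sum(1 for p, g in returned if p == g)
    precision = correct / len(returned) if returned else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical gold-standard mentions with case, hyphen and spelling variation.
gold = [
    ("P53", "ID:0001"),
    ("Interleukin 2", "ID:0003"),
    ("tumour necrosis factor alpha", "ID:0002"),
]
for matcher in (exact_match, rule_based_match, similarity_match):
    print(matcher.__name__, precision_recall(matcher, gold))
```

On this toy set the exact matcher finds nothing, the rule-based matcher recovers the case and hyphen variants, and only the similarity matcher also catches the British spelling "tumour". A lower threshold would raise recall at the risk of linking unrelated names, which is the usual trade-off between these matcher styles.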

Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing | 2008

Species Disambiguation for Biomedical Term Identification

Xinglong Wang; Michael Matthews

An important task in information extraction (IE) from biomedical articles is term identification (TI), which concerns linking entity mentions (e.g., terms denoting proteins) in text to unambiguous identifiers in standard databases (e.g., RefSeq). Previous work on TI has focused on species-specific documents. However, biomedical documents, especially full-length articles, often talk about entities across a number of species, in which case resolving species ambiguity becomes an indispensable part of TI. This paper describes our rule-based and machine-learning based approaches to species disambiguation and demonstrates that performance of TI can be improved by over 20% if the correct species are known. We also show that using the species predicted by the automatic species taggers can improve TI by a large margin.
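One way to see why knowing the species can help TI: a protein name shared by several model organisms maps to several database entries, and a species label rules out most of them. The sketch below assumes a hypothetical lexicon with placeholder identifiers (not real RefSeq records) and a hypothetical `identify` function; it illustrates the filtering idea only, not the TI system used in the paper.

```python
"""Toy illustration of how a species label narrows term-identification candidates.

The lexicon and identifiers are hypothetical placeholders, not RefSeq records.
"""

# Hypothetical lexicon: one protein name shared by several model organisms.
CANDIDATES = {
    "p53": [
        ("ID:HUMAN-1", "Homo sapiens"),
        ("ID:MOUSE-1", "Mus musculus"),
        ("ID:RAT-1", "Rattus norvegicus"),
    ],
}


def identify(mention, species=None):
    """Return candidate identifiers for a mention, keeping only those of `species` if given."""
    candidates = CANDIDATES.get(mention.lower(), [])
    if species is not None:
        candidates = [(ident, sp) for ident, sp in candidates if sp == species]
    return [ident for ident, _ in candidates]


print(identify("p53"))                  # ambiguous: three candidate identifiers
print(identify("p53", "Mus musculus"))  # species known: a single identifier
```

When the species label comes from an automatic tagger rather than a gold annotation, a wrong label filters out the correct identifier, so TI quality depends directly on tagger accuracy.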

QJM: An International Journal of Medicine | 1959