Mark Greenwood
University of Manchester
Publication
Featured research published by Mark Greenwood.
In: Workflows for e-Science: Scientific Workflows for Grids. Springer-Verlag London Ltd; 2007. | 2007
Tom Oinn; Peter Li; Douglas B. Kell; Carole A. Goble; Antoon Goderis; Mark Greenwood; Duncan Hull; Robert Stevens; Daniele Turi; Jun Zhao
Bioinformatics is a discipline that uses computational and mathematical techniques to store, manage, and analyze biological data in order to answer biological questions. Bioinformatics has over 850 databases [154] and numerous tools that operate over those databases and over local data, themselves producing still more data. To perform an analysis, a bioinformatician uses one or more of these resources to gather, filter, and transform data to answer a question. Thus, bioinformatics is an in silico science.
Drug Discovery Today: Biosilico | 2004
Robert Stevens; Robin McEntire; Carole A. Goble; Mark Greenwood; Jun Zhao; Anil Wipat; Peter Li
In its early development, Grid computing focused on providing the computational power needed to solve computationally intensive scientific problems. The scientific process in the life sciences, however, is less demanding of raw computational power but exhibits a high degree of inherent heterogeneity as well as semantic and task complexity. The myGrid project has developed a Grid-enabled middleware framework to manage the complexity associated with the scientific process in the bioinformatics domain. The drug discovery process is an example of a complex scientific problem that involves managing vast amounts of information. The technology developed by the myGrid project is applicable to managing many aspects of drug discovery and development, leveraging its support for data storage, workflow enactment, change-event notification, resource discovery, and provenance management.
international conference on management of data | 2007
Paolo Missier; Suzanne M. Embury; Mark Greenwood; Alun David Preece; Binling Jin
Data-intensive e-science applications often rely on third-party data found in public repositories, whose quality is largely unknown. Although scientists are aware that this uncertainty may lead to incorrect scientific conclusions, in the absence of a quantitative characterization of data quality properties they find it difficult to formulate precise data acceptability criteria. We present an Information Quality management workbench, called Qurator, that supports data experts in the specification of personal quality models, and lets them derive effective criteria for data acceptability. The demo of our working prototype will illustrate our approach on a real e-science workflow for a bioinformatics application.
Journal of Biomedical Semantics | 2013
Irena Spasic; Mark Greenwood; Alun David Preece; Nicholas Andrew Francis; Glyn Elwyn
Background: The increasing amount of textual information in biomedicine requires effective term recognition methods to identify textual representations of domain-specific concepts as the first step toward automating their semantic interpretation. Dictionary look-up approaches may not always be suitable for dynamic domains such as biomedicine, or for newly emerging types of media such as patient blogs, the main obstacles being the use of non-standardised terminology and a high degree of term variation.
Results: In this paper, we describe FlexiTerm, a method for automatic term recognition from a domain-specific corpus, and evaluate its performance against five manually annotated corpora. FlexiTerm performs term recognition in two steps: linguistic filtering is used to select term candidates, followed by calculation of termhood, a frequency-based measure used as evidence to qualify a candidate as a term. To improve the quality of termhood calculation, which may be affected by term variation phenomena, FlexiTerm uses a range of methods to neutralise the main sources of variation in biomedical terms. It manages syntactic variation by processing candidates using a bag-of-words approach. Orthographic and morphological variations are dealt with using stemming in combination with lexical and phonetic similarity measures. The method was evaluated on five biomedical corpora. The highest values for precision (94.56%), recall (71.31%) and F-measure (81.31%) were achieved on a corpus of clinical notes.
Conclusions: FlexiTerm is an open-source software tool for automatic term recognition. It incorporates a simple term variant normalisation method, which proved to be more robust than the baseline against less formally structured texts, such as those found in patient blogs or medical notes. The software can be downloaded freely at http://www.cs.cf.ac.uk/flexiterm.
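The two-step process is straightforward to illustrate. What follows is a minimal Python sketch of the idea, not FlexiTerm itself: an n-gram enumerator stands in for the linguistic filter, a toy suffix-stripper stands in for proper stemming, and termhood is reduced to the frequency of the normalised candidate. The real tool selects noun-phrase-like candidates via part-of-speech patterns and adds lexical and phonetic similarity measures; all names and data below are invented for the example.

    import re
    from collections import Counter

    STOPWORDS = {"a", "an", "the", "in", "of", "at", "on", "for", "to", "and", "after"}

    def stem(token):
        # toy suffix-stripper; FlexiTerm combines real stemming with
        # lexical and phonetic similarity measures to neutralise
        # orthographic and morphological variation
        for suffix in ("ing", "es", "s"):
            if token.endswith(suffix) and len(token) > len(suffix) + 2:
                return token[:-len(suffix)]
        return token

    def candidates(tokens, max_len=3):
        # stand-in for the linguistic filter: enumerate word n-grams
        # (FlexiTerm keeps noun-phrase-like spans via POS patterns)
        for n in range(2, max_len + 1):
            for i in range(len(tokens) - n + 1):
                yield tokens[i:i + n]

    def normalise(cand):
        # bag-of-words normalisation neutralises syntactic variation:
        # "mucus in lungs" and "lung mucus" share one key
        return tuple(sorted(stem(t) for t in cand if t not in STOPWORDS))

    def termhood(corpus):
        # frequency of the normalised candidate as a simple termhood score
        counts = Counter()
        for text in corpus:
            tokens = re.findall(r"[a-z-]+", text.lower())
            for cand in candidates(tokens):
                key = normalise(cand)
                if len(key) > 1:  # keep multi-word terms only
                    counts[key] += 1
        return counts

    notes = ["Patient reports thick mucus in lungs after exertion.",
             "Clearing lung mucus remains difficult at night."]
    print(termhood(notes).most_common(3))
    # ('lung', 'mucu') scores 2 despite the word-order variation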
In: Proceedings of the 4th UK e-Science All Hands Meeting (AHM 2005); 19-22 Sep 2005; Nottingham, England. John Wiley & Sons Ltd; 2005. p. 253-264. | 2005
Alun David Preece; Paolo Missier; Suzanne M. Embury; Binling Jin; Mark Greenwood
In this paper we outline a framework for managing information quality (IQ) in an e-Science context. In contrast to previous approaches that take a very abstract view of IQ properties, we allow scientists to define the quality characteristics that are of importance to them in their particular domain. For example, 'accuracy' may be defined in terms of the conformance of experimental data to a particular standard. User-scientists specify their IQ preferences against a formal ontology, so that the definitions are machine-manipulable, allowing the environment to classify and organize domain-specific quality characteristics within an overall quality management framework. As an illustration of our approach, we present an example Web service that computes IQ annotations for experiment datasets in transcriptomics.
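As a rough, hypothetical illustration of user-defined quality characteristics, the Python sketch below leaves out the ontology machinery the paper describes and models each criterion as a named predicate over a data record, with 'accuracy' cast as conformance to a minimal reporting standard. The field names and criteria are invented for the example.

    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class QualityCriterion:
        # a domain-specific IQ property with a user-defined test; in the
        # paper such definitions live in a formal ontology, here they are
        # plain Python for illustration
        name: str
        test: Callable[[dict], bool]

    # hypothetical: 'accuracy' as conformance of a transcriptomics record
    # to a minimal reporting standard (required fields present)
    REQUIRED_FIELDS = {"sample_id", "platform", "normalisation"}

    criteria = [
        QualityCriterion("accuracy",
                         lambda rec: REQUIRED_FIELDS <= rec.keys()),
        QualityCriterion("completeness",
                         lambda rec: all(v is not None for v in rec.values())),
    ]

    def annotate(record):
        # IQ annotation: report which user-defined criteria a record meets
        return {c.name: c.test(record) for c in criteria}

    print(annotate({"sample_id": "S1", "platform": "Affy", "normalisation": None}))
    # {'accuracy': True, 'completeness': False}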
Biomedical Informatics Insights | 2012
Irena Spasic; Peter Burnap; Mark Greenwood; Michael Arribas-Ayllon
The authors present a system developed for the 2011 i2b2 Challenge on Sentiment Classification, whose aim was to automatically classify sentences in suicide notes using a scheme of 15 topics, mostly emotions. The system combines machine learning with a rule-based methodology. The features used to represent a problem were based on lexico-semantic properties of individual words, in addition to regular expressions used to represent patterns of word usage across different topics. A naïve Bayes classifier was trained using the features extracted from the training data, which consisted of 600 manually annotated suicide notes. Classification was then performed using the naïve Bayes classifier together with a set of pattern-matching rules. The classification performance was evaluated against a manually prepared gold standard consisting of 300 suicide notes, in which 1,091 out of a total of 2,037 sentences were associated with a total of 1,272 annotations. The competing systems were ranked using the micro-averaged F-measure as the primary evaluation metric. Our system achieved an F-measure of 53% (with 55% precision and 52% recall), significantly better than the average performance of 48.75% achieved by the 26 participating teams.
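A minimal sketch of such a hybrid classifier, using scikit-learn, appears below; the training sentences, topic labels and regular expressions are all invented for illustration, and the paper's lexico-semantic feature set and exact rule/classifier combination are richer than shown here.

    import re
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    # hypothetical training sentences and topic labels (the real scheme
    # has 15 topics and a training set of 600 annotated notes)
    sentences = ["I am so sorry for all the pain I caused",
                 "Please make sure the dog is looked after",
                 "Thank you for standing by me all these years",
                 "Forgive me for what I am about to do"]
    topics = ["guilt", "instructions", "thankfulness", "forgiveness"]

    nb = make_pipeline(CountVectorizer(ngram_range=(1, 2)), MultinomialNB())
    nb.fit(sentences, topics)

    # hand-written pattern-matching rules applied alongside the classifier;
    # these patterns are illustrative, not the ones used in the system
    RULES = [(re.compile(r"\bforgive me\b", re.I), "forgiveness"),
             (re.compile(r"\bthank(s| you)\b", re.I), "thankfulness")]

    def classify(sentence):
        # union of rule firings and the naive Bayes prediction
        labels = {topic for rx, topic in RULES if rx.search(sentence)}
        labels.add(nb.predict([sentence])[0])
        return labels

    print(classify("Thank you, and forgive me."))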
international conference on conceptual modeling | 2005
Paolo Missier; Alun David Preece; Suzanne M. Embury; Binling Jin; Mark Greenwood; David Stead; Al Brown
We describe a new approach to managing information quality (IQ) in an e-Science context, by allowing scientists to define the quality characteristics that are of importance in their particular domain. These preferences are specified and classified in relation to a formal IQ ontology, intended to support the discovery and reuse of scientists' quality descriptors and metrics. In this paper, we present a motivating scenario from the biological sub-domain of proteomics, and use it to illustrate how the generic quality model we have developed can be expanded incrementally without making unreasonable demands on the domain expert who maintains it.
international conference on cloud and green computing | 2013
Mark Greenwood; Glyn Elwyn; Nicholas Andrew Francis; Alun David Preece; Irena Spasic
People with a long-term illness such as chronic obstructive pulmonary disease (COPD) often use social media to document and share information, opinions and experiences with others. Analysing the self-reported experiences that patients share online has the potential to help medical researchers gain insight into some of the key issues affecting patients. However, the scale of health conversation taking place online poses considerable challenges to traditional content analysis. In this paper, we present a system that automates the extraction of patient statements referring to a personal experience. We applied a crowdsourcing methodology to create a set of 1,770 annotated sentences from blog posts written by COPD patients. Our machine learning approach, trained on lexical features, successfully extracted sentences about patient experience with 93% precision and 80% recall (F-measure: 86%). Automatic annotation of sentences about patient experience can facilitate subsequent content analysis by highlighting the sentences most relevant to this particular problem.
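The sentence-filtering step can be sketched as a standard supervised text-classification pipeline. The scikit-learn snippet below uses invented stand-in sentences and a TF-IDF/logistic-regression pipeline purely for illustration; the paper's classifier was trained on the 1,770 crowd-annotated sentences with its own lexical feature set, which is not reproduced here.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import precision_recall_fscore_support
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline

    # hypothetical stand-ins for crowd-annotated blog sentences:
    # 1 = describes a personal experience, 0 = does not
    sentences = ["I could not climb the stairs without stopping twice.",
                 "COPD is a progressive disease of the airways.",
                 "My inhaler stopped helping during the night.",
                 "Smoking is the leading cause of COPD."] * 50  # toy volume
    labels = [1, 0, 1, 0] * 50

    X_train, X_test, y_train, y_test = train_test_split(
        sentences, labels, test_size=0.25, random_state=0)

    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                          LogisticRegression())
    model.fit(X_train, y_train)

    # evaluate the held-out split, mirroring the paper's precision/recall reporting
    p, r, f, _ = precision_recall_fscore_support(
        y_test, model.predict(X_test), average="binary")
    print(f"precision={p:.2f} recall={r:.2f} F-measure={f:.2f}")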
In: Proceedings of the UK e-Science Programme All Hands Conference; EPSRC; 2003. p. 223-226. | 2003
Mark Greenwood; Carole Goble; Robert D. Stevens; Jun Zhao; Matthew Addis; Darren Marvin; Luc Moreau; Tom Oinn; Paul Watson
Concurrency and Computation: Practice and Experience | 2006
Tom Oinn; Mark Greenwood; Matthew Addis; M. Nedim Alpdemir; Justin Ferris; Kevin Glover; Carole A. Goble; Antoon Goderis; Duncan Hull; Darren Marvin; Peter Li; Phillip Lord; Matthew Pocock; Martin Senger; Robert Stevens; Anil Wipat; Chris Wroe