Publication


Featured research published by Mark A. Greenwood.


Meeting of the Association for Computational Linguistics | 2005

A Semantic Approach to IE Pattern Induction

Mark Stevenson; Mark A. Greenwood

This paper presents a novel algorithm for the acquisition of Information Extraction patterns. The approach makes the assumption that useful patterns will have similar meanings to those already identified as relevant. Patterns are compared using a variation of the standard vector space model in which information from an ontology is used to capture semantic similarity. Evaluation shows this algorithm performs well when compared with a previously reported document-centric approach.
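
The comparison described in this abstract can be illustrated with a small sketch. It assumes that each pattern is a term-weight vector and that the ontology supplies a symmetric term-to-term similarity matrix W. The function name, the toy vocabulary and the similarity values are hypothetical and are not taken from the paper.

```python
import math

def semantic_cosine(a, b, W):
    """Generalised cosine: (a W b^T) / (|a| |b|).

    With W equal to the identity matrix this reduces to the standard
    vector space cosine; off-diagonal entries let related but
    non-identical terms contribute to the score.
    """
    n = len(a)
    numerator = sum(a[i] * W[i][j] * b[j] for i in range(n) for j in range(n))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return numerator / (norm_a * norm_b)

# Hypothetical vocabulary: ["hire", "appoint", "resign"]
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Assumed similarity values, standing in for scores derived from an ontology.
ONTOLOGY_W = [[1.0, 0.8, 0.1], [0.8, 1.0, 0.1], [0.1, 0.1, 1.0]]

pattern_hire = [1.0, 0.0, 0.0]
pattern_appoint = [0.0, 1.0, 0.0]

# Standard cosine sees no overlap between the two patterns.
plain = semantic_cosine(pattern_hire, pattern_appoint, IDENTITY)
# The ontology-informed variant credits the related terms.
semantic = semantic_cosine(pattern_hire, pattern_appoint, ONTOLOGY_W)
```

In this toy setting the two patterns share no terms, so only the variant that uses W assigns them a non-zero similarity.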


Proceedings of the Workshop on Information Extraction Beyond The Document | 2006

Improving Semi-supervised Acquisition of Relation Extraction Patterns

Mark A. Greenwood; Mark Stevenson

This paper presents a novel approach to the semi-supervised learning of Information Extraction patterns. The method makes use of more complex patterns than previous approaches and determines their similarity using a measure inspired by recent work using kernel methods (Culotta and Sorensen, 2004). Experiments show that the proposed similarity measure outperforms a previously reported measure based on cosine similarity when used to perform binary relation extraction.


Proceedings of the Workshop on Information Extraction Beyond The Document | 2006

Comparing Information Extraction Pattern Models

Mark Stevenson; Mark A. Greenwood

Several recently reported techniques for the automatic acquisition of Information Extraction (IE) systems have used dependency trees as the basis of their extraction pattern representation. These approaches have used a variety of pattern models (schemes for representing IE patterns based on particular parts of the dependency analysis). An appropriate model should be expressive enough to represent the information which is to be extracted from text without being overly complicated. Four previously reported pattern models are evaluated using existing IE evaluation corpora and three dependency parsers. It was found that one model, linked chains, could represent around 95% of the information of interest without generating an unwieldy number of possible patterns.


International Workshop/Conference on Parsing Technologies | 2005

SUPPLE: A Practical Parser for Natural Language Engineering Applications

Robert J. Gaizauskas; Mark Hepple; Horacio Saggion; Mark A. Greenwood; Kevin Humphreys

We describe SUPPLE, a freely-available, open source natural language parsing system, implemented in Prolog, and designed for practical use in language engineering (LE) applications. SUPPLE can be run as a stand-alone application, or as a component within the GATE General Architecture for Text Engineering. SUPPLE is distributed with an example grammar that has been developed over a number of years across several LE projects. This paper describes the key characteristics of the parser and the distributed grammar.


Patent Information Retrieval | 2011

Information Extraction and Semantic Annotation for Multi-Paradigm Information Management

Hamish Cunningham; Valentin Tablan; Ian Roberts; Mark A. Greenwood; Niraj Aswani

This chapter describes the development of GATE Mimir, a new tool for indexing documents according to multiple paradigms: full text, conceptual model, and annotation structures. We also present a usage example for patent searchers covering measurements and high-level structural information which was automatically extracted from a large patent corpus.


BMC Psychiatry | 2015

Extracting antipsychotic polypharmacy data from electronic health records: developing and evaluating a novel process

Giouliana Kadra; Robert Stewart; Hitesh Shetty; Richard Jackson; Mark A. Greenwood; Angus Roberts; Chin-Kuo Chang; James H. MacCabe; Richard D. Hayes

Background: Antipsychotic prescription information is commonly derived from structured fields in clinical health records. However, utilising diverse and comprehensive sources of information is especially important when investigating less frequent patterns of medication prescribing such as antipsychotic polypharmacy (APP). This study describes and evaluates a novel method of extracting APP data from both structured and free-text fields in electronic health records (EHRs), and its use for research purposes.
Methods: Using anonymised EHRs, we identified a cohort of patients with serious mental illness (SMI) who were treated in South London and Maudsley NHS Foundation Trust mental health care services between 1 January and 30 June 2012. Information about antipsychotic co-prescribing was extracted using a combination of natural language processing and a bespoke algorithm. The validity of the data derived through this process was assessed against a manually coded gold standard to establish precision and recall. Lastly, we estimated the prevalence and patterns of antipsychotic polypharmacy.
Results: Individual instances of antipsychotic prescribing were detected with high precision (0.94 to 0.97) and moderate recall (0.57 to 0.77). We detected baseline APP (two or more antipsychotics prescribed in any 6-week window) with 0.92 precision and 0.74 recall and long-term APP (antipsychotic co-prescribing for 6 months) with 0.94 precision and 0.60 recall. Of the 7,201 SMI patients receiving active care during the observation period, 338 (4.7%; 95% CI 4.2-5.2) were identified as receiving long-term APP. The most commonly co-prescribed combinations were two second-generation antipsychotics (64.8%) and a first-generation with a second-generation antipsychotic (32.5%).
Conclusions: These results suggest that this is a potentially practical tool for identifying polypharmacy from mental health EHRs on a large scale. Furthermore, extracted data can be used to allow researchers to characterize patterns of polypharmacy over time including different drug combinations, trends in polypharmacy prescribing, predictors of polypharmacy prescribing and the impact of polypharmacy on patient outcomes.
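
The evaluation measures named in this abstract can be sketched as follows. The code compares extracted items with a gold standard to obtain precision and recall, and computes a normal-approximation (Wald) interval for a proportion. The abstract does not state which interval method the authors used, so the Wald interval is an assumption; the toy prescription data and function names are hypothetical.

```python
import math

def precision_recall(extracted, gold):
    """Precision and recall of extracted items against a gold-standard set."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

def wald_interval(successes, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

# Hypothetical toy data: (patient id, drug) pairs.
gold = {(1, "olanzapine"), (1, "clozapine"), (2, "risperidone"), (3, "quetiapine")}
extracted = {(1, "olanzapine"), (2, "risperidone"), (3, "haloperidol")}
precision, recall = precision_recall(extracted, gold)

# Counts reported in the abstract: 338 of 7,201 patients on long-term APP.
low, high = wald_interval(338, 7201)
```

Applied to the reported counts, this approximation gives an interval that rounds to the 4.2 to 5.2 per cent range quoted above.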


PLOS ONE | 2012

Using Prior Information from the Medical Literature in GWAS of Oral Cancer Identifies Novel Susceptibility Variant on Chromosome 4 - the AdAPT Method

Mattias Johansson; Angus Roberts; Dan Chen; Yaoyong Li; Manon Delahaye-Sourdeix; Niraj Aswani; Mark A. Greenwood; Simone Benhamou; Pagona Lagiou; Ivana Holcatova; Lorenzo Richiardi; Kristina Kjaerheim; Antonio Agudo; Xavier Castellsagué; Tatiana V. Macfarlane; Luigi Barzan; Cristina Canova; Nalin Thakker; David I. Conway; Ariana Znaor; Claire M. Healy; Wolfgang Ahrens; David Zaridze; Neonilia Szeszenia-Dabrowska; Jolanta Lissowska; Eleonora Fabianova; Ioan Nicolae Mates; Vladimir Bencko; Lenka Foretova; Vladimir Janout

Background: Genome-wide association studies (GWAS) require large sample sizes to obtain adequate statistical power, but it may be possible to increase the power by incorporating complementary data. In this study we investigated the feasibility of automatically retrieving information from the medical literature and leveraging this information in GWAS.
Methods: We developed a method that searches through PubMed abstracts for pre-assigned keywords and key concepts, and uses this information to assign prior probabilities of association for each single nucleotide polymorphism (SNP) with the phenotype of interest - the Adjusting Association Priors with Text (AdAPT) method. Association results from a GWAS can subsequently be ranked in the context of these priors using the Bayes False Discovery Probability (BFDP) framework. We initially tested AdAPT by comparing rankings of known susceptibility alleles in a previous lung cancer GWAS, and subsequently applied it in a two-phase GWAS of oral cancer.
Results: Known lung cancer susceptibility SNPs were consistently ranked higher by AdAPT BFDPs than by p-values. In the oral cancer GWAS, we sought to replicate the top five SNPs as ranked by AdAPT BFDPs, of which rs991316, located in the ADH gene region of 4q23, displayed a statistically significant association with oral cancer risk in the replication phase (per-rare-allele log additive p-value [ptrend] = 2.5×10^-3). The combined OR for having one additional rare allele was 0.83 (95% CI: 0.76–0.90), and this association was independent of previously identified susceptibility SNPs that are associated with overall UADT cancer in this gene region. We also investigated if rs991316 was associated with other cancers of the upper aerodigestive tract (UADT), but no additional association signal was found.
Conclusion: This study highlights the potential utility of systematically incorporating prior knowledge from the medical literature in genome-wide analyses using the AdAPT methodology.
AdAPT is available online (url: http://services.gate.ac.uk/lld/gwas/service/config).
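
To illustrate how a prior probability of association can change a ranking, the following sketch computes a Bayes False Discovery Probability from an approximate Bayes factor in the style of Wakefield's formulation. Whether AdAPT uses exactly this form is an assumption, as are the prior variance, the prior probabilities and the input values; none of them come from the abstract.

```python
import math

def approximate_bayes_factor(beta_hat, std_err, prior_var):
    """Approximate Bayes factor for H0 (no association) versus H1.

    Assumes a normal prior N(0, prior_var) on the log odds ratio under H1.
    """
    v = std_err ** 2
    z = beta_hat / std_err
    shrinkage = prior_var / (v + prior_var)
    return math.sqrt((v + prior_var) / v) * math.exp(-0.5 * z * z * shrinkage)

def bfdp(beta_hat, std_err, prior_prob_assoc, prior_var):
    """Bayes False Discovery Probability for one SNP."""
    abf = approximate_bayes_factor(beta_hat, std_err, prior_var)
    prior_odds_null = (1.0 - prior_prob_assoc) / prior_prob_assoc
    return abf * prior_odds_null / (1.0 + abf * prior_odds_null)

# Assumed prior variance: 95% of prior mass on odds ratios between 1/1.5 and 1.5.
PRIOR_VAR = (math.log(1.5) / 1.96) ** 2

# Hypothetical SNP: estimated log odds ratio and its standard error.
beta_hat, std_err = math.log(0.83), 0.043

# The same evidence under a flat default prior and a literature-boosted prior.
default_prior = bfdp(beta_hat, std_err, 1e-4, PRIOR_VAR)
literature_prior = bfdp(beta_hat, std_err, 1e-2, PRIOR_VAR)
```

A SNP given a higher prior probability of association receives a lower BFDP for the same evidence, which is how literature-derived priors can move it up a ranking.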


International Conference on Computational Linguistics | 2008

A Data Driven Approach to Query Expansion in Question Answering

Leon Derczynski; Jun Wang; Robert J. Gaizauskas; Mark A. Greenwood

Automated answering of natural language questions is an interesting and useful problem to solve. Question answering (QA) systems often perform information retrieval at an initial stage. Information retrieval (IR) performance, provided by engines such as Lucene, places a bound on overall system performance. For example, no answer-bearing documents are retrieved at low ranks for almost 40% of questions. In this paper, answer texts from previous QA evaluations held as part of the Text REtrieval Conferences (TREC) are paired with queries and analysed in an attempt to identify performance-enhancing words. These words are then used to evaluate the performance of a query expansion method. Data-driven extension words were found to help in over 70% of difficult questions. These words can be used to improve and evaluate query expansion methods. Simple blind relevance feedback (RF) was correctly predicted as unlikely to help overall performance, and a possible explanation is provided for its low value in IR for QA.
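
For readers unfamiliar with the baseline mentioned in this abstract, the following is a minimal sketch of generic blind relevance feedback. It is not the paper's data-driven method. The stopword list, the toy documents and the parameter names are hypothetical.

```python
from collections import Counter

STOPWORDS = {"the", "is", "a", "of", "in", "how", "and", "was"}

def tokenize(text):
    """Lowercase, split on whitespace and drop stopwords."""
    return [t for t in text.lower().split() if t not in STOPWORDS]

def blind_relevance_feedback(query, ranked_docs, top_docs=2, new_terms=2):
    """Expand a query with frequent terms from the top-ranked documents.

    The top documents are assumed to be relevant without any check,
    which is why the feedback is called "blind".
    """
    query_terms = tokenize(query)
    counts = Counter()
    for doc in ranked_docs[:top_docs]:
        counts.update(t for t in tokenize(doc) if t not in query_terms)
    # Sort by descending frequency, then alphabetically for determinism.
    candidates = sorted(counts, key=lambda t: (-counts[t], t))
    return query_terms + candidates[:new_terms]

# Toy ranked list, as if returned by an IR engine for the question below.
ranked_docs = [
    "the eiffel tower in paris is a tall iron landmark",
    "paris landmark tower height",
    "the tower of london",
]
expanded = blind_relevance_feedback("how tall is the eiffel tower", ranked_docs)
```

The added terms depend entirely on what the initial retrieval returned: if the top-ranked documents contain no answer-bearing text, the expansion terms are unlikely to retrieve any.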


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2004

Information retrieval for question answering: a SIGIR 2004 workshop

Robert J. Gaizauskas; Mark Hepple; Mark A. Greenwood

Open domain question answering has become a very active research area over the past few years, due in large measure to the stimulus of the TREC Question Answering track. This track addresses the task of finding answers to natural language (NL) questions (e.g. How tall is the Eiffel Tower? Who is Aaron Copland?) from large text collections. This task stands in contrast to the more conventional IR task of retrieving documents relevant to a query, where the query may be simply a collection of keywords (e.g. Eiffel Tower, American composer, born Brooklyn NY 1900, ...).


Journal of Web Semantics | 2017

A Framework for Real-time Semantic Social Media Analysis

Diana Maynard; Ian Roberts; Mark A. Greenwood; Dominic Paul Rout; Kalina Bontcheva

This paper presents a framework for collecting and analysing large-volume social media content. The real-time analytics framework comprises semantic annotation, Linked Open Data, semantic search, and dynamic result aggregation components. In addition, exploratory search and sense-making are supported through information visualisation interfaces, such as co-occurrence matrices, term clouds, treemaps, and choropleths. There is also an interactive semantic search interface (Prospector), where users can save, refine, and analyse the results of semantic search queries over time. Practical use of the framework is exemplified through three case studies: a general scenario analysing tweets from UK politicians and the public’s response to them in the run-up to the 2015 UK general election, an investigation of attitudes towards climate change expressed by these politicians and the public, via their engagement with environmental topics, and an analysis of public tweets leading up to the UK’s referendum on leaving the EU (Brexit) in 2016. The paper also presents a brief evaluation and discussion of some of the key text analysis components, which are specifically adapted to the domain and task, and demonstrates the scalability and efficiency of our toolkit in the case studies.

Collaboration


Top Co-Authors

Ian Roberts

University of Sheffield

Niraj Aswani

University of Sheffield

José Iria

University of Sheffield
