Angus Roberts
University of Sheffield
Publication
Featured research published by Angus Roberts.
International Journal of Cooperative Information Systems | 2003
Chris Wroe; Robert Stevens; Carole A. Goble; Angus Roberts; R. Mark Greenwood
The growing quantity and distribution of bioinformatics resources means that finding and utilizing them requires a great deal of expert knowledge, especially as many resources need to be tied together into a workflow to accomplish a useful goal. We want to formally capture at least some of this knowledge within a virtual workbench and middleware framework to assist a wider range of biologists in utilizing these resources. Different activities require different representations of knowledge. Finding or substituting a service within a workflow is often best supported by a classification. Marshalling and configuring services is best accomplished using a formal description. Both representations are highly interdependent and maintaining consistency between the two by hand is difficult. We report on a description logic approach using the web ontology language DAML+OIL that uses property-based service descriptions. The ontology is founded on DAML-S to dynamically create service classifications. These classifications are then used to support semantic service matching and discovery in a large grid-based middleware project. We describe the extensions necessary to DAML-S in order to support bioinformatics service description; the utility of DAML+OIL in creating dynamic classifications based on formal descriptions; and the implementation of a DAML+OIL ontology service to support partial user-driven service matching and composition.
Journal of Biomedical Informatics | 2009
Angus Roberts; Robert J. Gaizauskas; Mark Hepple; George Demetriou; Yikun Guo; Ian Roberts; Andrea Setzer
In this paper, we describe the construction of a semantically annotated corpus of clinical texts for use in the development and evaluation of systems for automatically extracting clinically significant information from the textual component of patient records. The paper details the sampling of textual material from a collection of 20,000 cancer patient records, the development of a semantic annotation scheme, the annotation methodology, the distribution of annotations in the final corpus, and the use of the corpus for development of an adaptive information extraction system. The resulting corpus is the most richly semantically annotated resource for clinical text processing built to date, whose value has been demonstrated through its use in developing an effective information extraction system. The detailed presentation of our corpus construction and annotation methodology will be of value to others seeking to build high-quality semantically annotated corpora in biomedical domains.
BMC Medical Informatics and Decision Making | 2013
Andrea Fernandes; Danielle Cloete; Matthew Broadbent; Richard D. Hayes; Chin-Kuo Chang; Richard Jackson; Angus Roberts; Jason Tsang; Murat Soncul; Jennifer Liebscher; Robert Stewart; Felicity Callard
Background: Electronic health records (EHRs) provide enormous potential for health research but also present data governance challenges. Ensuring de-identification is a pre-requisite for use of EHR data without prior consent. The South London and Maudsley NHS Trust (SLaM), one of the largest secondary mental healthcare providers in Europe, has developed, from its EHRs, a de-identified psychiatric case register, the Clinical Record Interactive Search (CRIS), for secondary research. Methods: We describe development, implementation and evaluation of a bespoke de-identification algorithm used to create the register. It is designed to create dictionaries using patient identifiers (PIs) entered into dedicated source fields and then identify, match and mask them (with ZZZZZ) when they appear in medical texts. We deemed this approach would be effective, given high coverage of PI in the dedicated fields and the effectiveness of the masking combined with elements of a security model. We conducted two separate performance tests: i) to test performance of the algorithm in masking individual true PIs entered in dedicated fields and then found in text (using 500 patient notes) and ii) to compare the performance of the CRIS pattern matching algorithm with a machine learning algorithm, the MITRE Identification Scrubber Toolkit (MIST), using 70 patient notes (50 notes to train, 20 notes to test on). We also report any incidences of potential breaches, defined by occurrences of 3 or more true or apparent PIs in the same patient’s notes (and in an additional set of longitudinal notes for 50 patients); and we consider the possibility of inferring information despite de-identification. Results: True PIs were masked with 98.8% precision and 97.6% recall. As anticipated, potential PIs did appear, owing to misspellings entered within the EHRs. We found one potential breach. In a separate performance test, with a different set of notes, CRIS yielded 100% precision and 88.5% recall, while MIST yielded 95.1% and 78.1%, respectively. We discuss how we overcome the realistic possibility, albeit of low probability, of potential breaches through implementation of the security model. Conclusion: CRIS is a de-identified psychiatric database sourced from EHRs, which protects patient anonymity and maximises data available for research. CRIS demonstrates the advantage of combining an effective de-identification algorithm with a carefully designed security model. The paper advances much needed discussion of EHR de-identification, particularly in relation to criteria to assess de-identification, and considering the contexts of de-identified research databases when assessing the risk of breaches of confidential patient information.
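The dictionary-and-mask idea described in this abstract can be illustrated with a minimal Python sketch. It is not the CRIS implementation: the field names, the whitespace tokenisation, and the case-insensitive whole-word matching are all assumptions made for illustration. Only the mask string ZZZZZ comes from the abstract.

```python
import re

MASK = "ZZZZZ"  # mask string quoted in the abstract


def build_dictionary(structured_fields):
    """Collect identifier tokens from a patient's dedicated source fields.

    `structured_fields` is a hypothetical dict such as
    {"first_name": "Jane", "surname": "Smith"}.
    """
    terms = set()
    for value in structured_fields.values():
        for token in value.split():
            if len(token) > 1:  # assumption: skip single characters
                terms.add(token)
    return terms


def mask_text(text, terms):
    """Replace every whole-word, case-insensitive match of a term with MASK."""
    if not terms:
        return text
    # Longest terms first, so longer identifiers win over their prefixes.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(MASK, text)


record = {"first_name": "Jane", "surname": "Smith", "postcode": "SE5 8AZ"}
note = "Jane Smith of SE5 8AZ was seen today. jane reports improvement."
masked = mask_text(note, build_dictionary(record))
misspelt = mask_text("Jnae was seen today.", build_dictionary(record))
```

In this sketch a misspelt identifier such as "Jnae" is left unmasked because it does not appear in the dictionary, which mirrors the abstract's remark that potential PIs appeared owing to misspellings.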
Journal of Biomedical Informatics | 2012
Harsha Gurulingappa; Abdul Mateen Rajput; Angus Roberts; Juliane Fluck; Martin Hofmann-Apitius; Luca Toldo
A significant amount of information about drug-related safety issues such as adverse effects is published in medical case reports that can only be explored by human readers due to their unstructured nature. The work presented here aims at generating a systematically annotated corpus that can support the development and validation of methods for the automatic extraction of drug-related adverse effects from medical case reports. The documents are systematically double annotated in various rounds to ensure consistent annotations. The annotated documents are finally harmonized to generate representative consensus annotations. To demonstrate an example use case, the corpus was employed to train and validate models for classifying sentences as informative or non-informative. A Maximum Entropy classifier trained with simple features and evaluated by 10-fold cross-validation achieved an F₁ score of 0.70, indicating a potentially useful application of the corpus.
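For readers unfamiliar with the evaluation terms in this abstract, the following Python sketch shows how an F₁ score is computed from counts and how a 10-fold split partitions a data set. The function names and the round-robin fold assignment are assumptions for illustration. The abstract does not say how the folds were drawn.

```python
def f1_score(tp, fp, fn):
    """F1 is the harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n_items, k=10):
    """Yield (train, test) index lists; fold i tests items i, i+k, i+2k, ..."""
    for fold in range(k):
        test = [i for i in range(n_items) if i % k == fold]
        train = [i for i in range(n_items) if i % k != fold]
        yield train, test


example_f1 = f1_score(tp=7, fp=3, fn=3)  # precision 0.7, recall 0.7
folds = list(k_fold_indices(25, k=10))
```

Each item appears in exactly one test fold, so every sentence is scored once by a model that did not see it during training.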
Cluster Computing and the Grid | 2003
Luc Moreau; Simon Miles; Carole A. Goble; R. Mark Greenwood; Vijay Dialani; Matthew Addis; M. Nedim Alpdemir; Rich Cawley; David De Roure; Justin Ferris; Robert J. Gaizauskas; Kevin Glover; Chris Greenhalgh; Peter Li; Xiaojian Liu; Phillip Lord; Michael Luck; Darren Marvin; Tom Oinn; Norman W. Paton; Steve Pettifer; Milena Radenkovic; Angus Roberts; Alan Robinson; Tom Rodden; Martin Senger; Nick Sharman; Robert Stevens; Brian Warboys; Anil Wipat
myGrid is an e-Science Grid project that aims to help biologists and bioinformaticians to perform workflow-based in silico experiments, and help them to automate the management of such workflows through personalisation, notification of change and publication of experiments. In this paper, we describe the architecture of myGrid and how it will be used by the scientist. We then show how myGrid can benefit from agent technologies. We have identified three key uses of agent technologies in myGrid: user agents, able to customize and personalise data; agent communication languages, offering a generic and portable communication medium; and negotiation, allowing multiple distributed entities to reach service level agreements.
BMC Bioinformatics | 2008
Angus Roberts; Robert J. Gaizauskas; Mark Hepple; Yikun Guo
Background: The Clinical E-Science Framework (CLEF) project has built a system to extract clinically significant information from the textual component of medical records in order to support clinical research, evidence-based healthcare and genotype-meets-phenotype informatics. One part of this system is the identification of relationships between clinically important entities in the text. Typical approaches to relationship extraction in this domain have used full parses, domain-specific grammars, and large knowledge bases encoding domain knowledge. In other areas of biomedical NLP, statistical machine learning (ML) approaches are now routinely applied to relationship extraction. We report on the novel application of these statistical techniques to the extraction of clinical relationships. Results: We have designed and implemented an ML-based system for relation extraction, using support vector machines, and trained and tested it on a corpus of oncology narratives hand-annotated with clinically important relationships. Over a class of seven relation types, the system achieves an average F1 score of 72%, only slightly behind an indicative measure of human inter-annotator agreement on the same task. We investigate the effectiveness of different features for this task, how extraction performance varies between inter- and intra-sentential relationships, and examine the amount of training data needed to learn various relationships. Conclusion: We have shown that it is possible to extract important clinical relationships from text, using supervised statistical ML techniques, at levels of accuracy approaching those of human annotators. Given the importance of relation extraction as an enabling technology for text mining and given also the ready adaptability of systems based on our supervised learning approach to other clinical relationship extraction tasks, this result has significance for clinical text mining more generally, though further work to confirm our encouraging results should be carried out on a larger sample of narratives and relationship types.
Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing | 2008
Angus Roberts; Robert J. Gaizauskas; Mark Hepple
The Clinical E-Science Framework (CLEF) project has built a system to extract clinically significant information from the textual component of medical records, for clinical research, evidence-based healthcare and genotype-meets-phenotype informatics. One part of this system is the identification of relationships between clinically important entities in the text. Typical approaches to relationship extraction in this domain have used full parses, domain-specific grammars, and large knowledge bases encoding domain knowledge. In other areas of biomedical NLP, statistical machine learning approaches are now routinely applied to relationship extraction. We report on the novel application of these statistical techniques to clinical relationships. We describe a supervised machine learning system, trained with a corpus of oncology narratives hand-annotated with clinically important relationships. Various shallow features are extracted from these texts, and used to train statistical classifiers. We compare the suitability of these features for clinical relationship extraction, how extraction varies between inter- and intra-sentential relationships, and examine the amount of training data needed to learn various relationships.
Language Resources and Evaluation | 2013
Kalina Bontcheva; Hamish Cunningham; Ian Roberts; Angus Roberts; Valentin Tablan; Niraj Aswani; Genevieve Gorrell
This paper presents GATE Teamware, an open-source, web-based, collaborative text annotation framework. It enables users to carry out complex corpus annotation projects, involving distributed annotator teams. Different user roles are provided (annotator, manager, administrator) with customisable user interface functionalities, in order to support the complex workflows and user interactions that occur in corpus annotation projects. Documents may be pre-processed automatically, so that human annotators can begin with text that has already been pre-annotated, making them more efficient. The user interface is simple to learn, aimed at non-experts, and runs in an ordinary web browser, without need of additional software installation. GATE Teamware has been evaluated through the creation of several gold standard corpora and internal projects, as well as through external evaluation in commercial and EU text annotation projects. It is available as an on-demand service on GateCloud.net, as well as open-source for self-installation.
BMJ Open | 2015
Rashmi Patel; Nishamali Jayatilleke; Matthew Broadbent; Chin-Kuo Chang; Nadia Foskett; Genevieve Gorrell; Richard D. Hayes; Richard Jackson; Caroline Johnston; Hitesh Shetty; Angus Roberts; Philip McGuire; Robert Stewart
Objectives: To identify negative symptoms in the clinical records of a large sample of patients with schizophrenia using natural language processing and assess their relationship with clinical outcomes. Design: Observational study using an anonymised electronic health record case register. Setting: South London and Maudsley NHS Trust (SLaM), a large provider of inpatient and community mental healthcare in the UK. Participants: 7678 patients with schizophrenia receiving care during 2011. Main outcome measures: Hospital admission, readmission and duration of admission. Results: 10 different negative symptoms were ascertained with precision statistics above 0.80. 41% of patients had 2 or more negative symptoms. Negative symptoms were associated with younger age, male gender and single marital status, and with increased likelihood of hospital admission (OR 1.24, 95% CI 1.10 to 1.39), longer duration of admission (β-coefficient 20.5 days, 7.6–33.5), and increased likelihood of readmission following discharge (OR 1.58, 1.28 to 1.95). Conclusions: Negative symptoms were common and associated with adverse clinical outcomes, consistent with evidence that these symptoms account for much of the disability associated with schizophrenia. Natural language processing provides a means of conducting research in large representative samples of patients, using data recorded during routine clinical practice.
BMC Psychiatry | 2015
Giouliana Kadra; Robert Stewart; Hitesh Shetty; Richard Jackson; Mark A. Greenwood; Angus Roberts; Chin-Kuo Chang; James H. MacCabe; Richard D. Hayes
Background: Antipsychotic prescription information is commonly derived from structured fields in clinical health records. However, utilising diverse and comprehensive sources of information is especially important when investigating less frequent patterns of medication prescribing such as antipsychotic polypharmacy (APP). This study describes and evaluates a novel method of extracting APP data from both structured and free-text fields in electronic health records (EHRs), and its use for research purposes. Methods: Using anonymised EHRs, we identified a cohort of patients with serious mental illness (SMI) who were treated in South London and Maudsley NHS Foundation Trust mental health care services between 1 January and 30 June 2012. Information about antipsychotic co-prescribing was extracted using a combination of natural language processing and a bespoke algorithm. The validity of the data derived through this process was assessed against a manually coded gold standard to establish precision and recall. Lastly, we estimated the prevalence and patterns of antipsychotic polypharmacy. Results: Individual instances of antipsychotic prescribing were detected with high precision (0.94 to 0.97) and moderate recall (0.57 to 0.77). We detected baseline APP (two or more antipsychotics prescribed in any 6-week window) with 0.92 precision and 0.74 recall and long-term APP (antipsychotic co-prescribing for 6 months) with 0.94 precision and 0.60 recall. Of the 7,201 SMI patients receiving active care during the observation period, 338 (4.7 %; 95 % CI 4.2 to 5.2) were identified as receiving long-term APP. The most common co-prescriptions were two second-generation antipsychotics (64.8 %) and a first-generation with a second-generation antipsychotic (32.5 %). Conclusions: These results suggest that this is a potentially practical tool for identifying polypharmacy from mental health EHRs on a large scale. Furthermore, extracted data can be used to allow researchers to characterize patterns of polypharmacy over time including different drug combinations, trends in polypharmacy prescribing, predictors of polypharmacy prescribing and the impact of polypharmacy on patient outcomes.
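The baseline APP definition in this abstract (two or more antipsychotics prescribed in any 6-week window) can be sketched in Python as below. This is not the authors' bespoke algorithm: the input shape, the already-normalised drug names, and the inclusive 42-day boundary are assumptions for illustration.

```python
from datetime import date, timedelta

WINDOW = timedelta(weeks=6)  # 6-week window from the abstract


def has_baseline_app(prescriptions):
    """Return True if two different drugs fall within one 6-week window.

    `prescriptions` is a hypothetical list of (date, drug_name) pairs,
    with drug names assumed to be normalised beforehand.
    """
    events = sorted(prescriptions)
    for i, (start, drug) in enumerate(events):
        for later, other in events[i + 1:]:
            if later - start > WINDOW:
                break  # events are sorted, so the rest are further away
            if other != drug:
                return True
    return False


within = has_baseline_app([
    (date(2012, 1, 1), "olanzapine"),
    (date(2012, 2, 5), "haloperidol"),  # 35 days later
])
outside = has_baseline_app([
    (date(2012, 1, 1), "olanzapine"),
    (date(2012, 3, 1), "haloperidol"),  # 60 days later
])
same_drug = has_baseline_app([
    (date(2012, 1, 1), "olanzapine"),
    (date(2012, 1, 15), "olanzapine"),
])
```

Sorting the events first lets the inner loop stop as soon as a later prescription falls outside the window.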