David Milward
St John's Innovation Centre
Publication
Featured research published by David Milward.
Pacific Symposium on Biocomputing | 1999
James Thomas; David Milward; Christos A. Ouzounis; Stephen Pulman; Mark Carroll
This paper motivates the use of Information Extraction (IE) for gathering data on protein interactions, describes the customization of an existing IE system, SRI's Highlight, for this task, and presents the results of an experiment on unseen Medline abstracts, which show that customization to a new domain can be fast, reliable and cost-effective.
Journal of Biomedical Semantics | 2011
Dietrich Rebholz-Schuhmann; Antonio Jimeno Yepes; Chen Li; Senay Kafkas; Ian Lewin; Ning Kang; Peter Corbett; David Milward; Ekaterina Buyko; Elena Beisswanger; Kerstin Hornbostel; Alexandre Kouznetsov; René Witte; Jonas B. Laurila; Christopher J. O. Baker; Cheng-Ju Kuo; Simone Clematide; Fabio Rinaldi; Richárd Farkas; György Móra; Kazuo Hara; Laura I. Furlong; Michael Rautschka; Mariana Neves; Alberto Pascual-Montano; Qi Wei; Nigel Collier; Faisal Mahbub Chowdhury; Alberto Lavelli; Rafael Berlanga
Background: Competitions in text mining have been used to measure the performance of automatic text processing solutions against a manually annotated gold standard corpus (GSC). The preparation of the GSC is time-consuming and costly, and the final corpus consists of at most a few thousand documents annotated with a limited set of semantic groups. To overcome these shortcomings, the CALBC project partners (PPs) have produced a large-scale annotated biomedical corpus with four different semantic groups through the harmonisation of annotations from automatic text mining solutions, the first version of the Silver Standard Corpus (SSC-I). The four semantic groups are chemical entities and drugs (CHED), genes and proteins (PRGE), diseases and disorders (DISO) and species (SPE). This corpus has been used for the First CALBC Challenge, which asked participants to annotate the corpus with their text processing solutions.
Results: All four PPs from the CALBC project and, in addition, 12 challenge participants (CPs) contributed annotated data sets for an evaluation against the SSC-I. CPs could ignore the training data and deliver the annotations from their genuine annotation system, or could train a machine-learning approach on the provided pre-annotated data. In general, the performance of the annotation solutions was lower for entities from the categories CHED and PRGE than for entities categorized as DISO and SPE. The best performance over all semantic groups was achieved by two annotation solutions that had been trained on the SSC-I. Data sets from participants who did not use the annotated SSC-I data for training were used to generate the harmonised Silver Standard Corpus II (SSC-II). The performance of the participants’ solutions was then measured again against the SSC-II. Again, the annotation solutions performed better for DISO and SPE than for CHED and PRGE.
Conclusions: The SSC-I delivers a large set of annotations (1,121,705) for a large number of documents (100,000 Medline abstracts). The annotations cover four different semantic groups and are sufficiently homogeneous to be reproduced with a trained classifier, leading to an average F-measure of 85%. Benchmarking against the SSC-II leads to better performance for the CPs’ annotation solutions than benchmarking against the SSC-I.
Language Resources and Evaluation | 2010
Adam Z. Wyner; Raquel Mochales-Palau; Marie-Francine Moens; David Milward
This paper describes recent approaches using text-mining to automatically profile and extract arguments from legal cases. We outline some of the background context and motivations. We then turn to consider issues related to the construction and composition of corpora of legal cases. We show how a Context-Free Grammar can be used to extract arguments, and how ontologies and Natural Language Processing can identify complex information such as case factors and participant roles. Together the results bring us closer to automatic identification of legal arguments.
Comparative and Functional Genomics | 2005
David Milward; Marcus Bjäreland; William S. Hayes; Michelle Joanna Maxwell; Lisa Öberg; Nick Tilford; James Thomas; Roger Hale; Sylvia Knight; Julie Christine Barnes
Over recent years, there has been a growing interest in extracting information automatically or semi-automatically from the scientific literature. This paper describes a novel ontology-based interactive information extraction (OBIIE) framework and a specific OBIIE system. We describe how this system enables life scientists to make ad hoc queries similar to using a standard search engine, but where the results are obtained in a database format similar to a pre-programmed information extraction engine. We present a case study in which the system was evaluated for extracting co-factors from EMBASE and MEDLINE.
Meeting of the Association for Computational Linguistics | 2000
David Milward
A syntax tree or standard semantic representation can be represented as a set of indexed constraints. This paper describes how this idea can be used in task-oriented dialogue systems to provide interpretation rules which incorporate structural and contextual constraints where available, and degrade gracefully on ungrammatical input.
Cross-Language Evaluation Forum | 2013
Dietrich Rebholz-Schuhmann; Simon Clematide; Fabio Rinaldi; Senay Kafkas; Erik M. van Mulligen; Chinh Bui; Johannes Hellrich; Ian Lewin; David Milward; Michael Poprat; Antonio Jimeno-Yepes; Udo Hahn; Jan A. Kors
The identification and normalisation of biomedical entities from the scientific literature has a long tradition, and a number of challenges have contributed to the development of reliable solutions. Increasingly, patient records are processed to align their content with other biomedical data resources, but this approach requires analysing documents in different languages across Europe [1,2]. The CLEF-ER challenge has been organized by the Mantra project partners to improve entity recognition (ER) in multilingual documents. Several corpora in different languages, i.e. Medline titles, EMEA documents and patent claims, have been prepared to enable ER in parallel documents. The participants have been asked to annotate entity mentions with concept unique identifiers (CUIs) in the documents of their preferred non-English language. The evaluation determines the number of correctly identified entity mentions against a silver standard (Task A) and the performance measures for the identification of CUIs in the non-English corpora. The participants could make use of the prepared terminological resources for entity normalisation and of the English silver standard corpora (SSCs) as input for concept candidates in the non-English documents. The participants used different approaches, including translation techniques and word or phrase alignments, apart from lexical lookup and other text mining techniques. Performance on Tasks A and B was lower for the patent corpus than for Medline titles and EMEA documents. In the patent documents, chemical entities were identified at higher performance, whereas the other two document types cover a higher portion of medical terms. The number of novel terms provided from all corpora is currently under investigation. Altogether, the CLEF-ER challenge demonstrates the performance of annotation solutions in different languages against an SSC.
Recent Advances in Natural Language Processing | 2000
David Milward; James Thomas
This paper describes a system which enables users to create on-the-fly queries which involve not just keywords, but also sortal constraints and linguistic constraints. The user can specify how the results should be presented, e.g. in terms of links to documents, or as table entries. The aim is to bridge the gap between keyword-based Information Retrieval and pattern-based Information Extraction.
Text, Speech and Dialogue | 2003
Martin Beveridge; David Milward
This paper investigates the use of abstract task specifications for dialogue management in the medical domain. In most current dialogue systems, possible interactions with the system are hand-coded in the design. This is an expensive process, especially for complex dialogues. This paper motivates the use of a task description language for building flexible and adaptive dialogue systems in ontologically rich domains such as medicine. It describes the components of a task specification, and proposes an architecture for dialogue systems which allows integration of domain reasoning and dialogue. A high-level dialogue specification is used to support multimodal input and output, including generation of HTML pages, and generation of fragments of VoiceXML for spoken interaction.
International Conference on Computational Linguistics | 1996
Karsten Konrad; Holger Maier; David Milward; Manfred Pinkal
The CLEARS (Computational Linguistics Education and Research for Semantics) tool provides a graphical interface allowing interactive construction of semantic representations in a variety of different formalisms, using several construction methods. CLEARS was developed as part of the FraCaS project, which was designed to encourage convergence between different semantic formalisms, such as Montague Grammar, DRT, and Situation Semantics. The CLEARS system is freely available on the WWW from this http URL
Journal of Bioinformatics and Computational Biology | 2010
Dietrich Rebholz-Schuhmann; Antonio Jimeno Yepes; Erik M. van Mulligen; Ning Kang; Jan A. Kors; David Milward; Peter Corbett; Ekaterina Buyko; Elena Beisswanger; Udo Hahn