Publications


Featured research published by Claire Grover.


Conference of the European Chapter of the Association for Computational Linguistics | 1999

Named Entity Recognition without Gazetteers

Andrei Mikheev; Marc Moens; Claire Grover

It is often claimed that Named Entity recognition systems need extensive gazetteers: lists of names of people, organisations, locations, and other named entities. Indeed, the compilation of such gazetteers is sometimes mentioned as a bottleneck in the design of Named Entity recognition systems. We report on a Named Entity recognition system which combines rule-based grammars with statistical (maximum entropy) models, and on the system's performance with gazetteers of different types and sizes, using test material from the MUC-7 competition. We show that, for the text type and task of this competition, it is sufficient to use relatively small gazetteers of well-known names, rather than large gazetteers of low-frequency names. We conclude with observations about the domain independence of the competition and of our experiments.
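
As an illustration of the statistical half of such a system, the sketch below trains a maximum entropy classifier (logistic regression, the same model family) over simple per-token features, treating membership in a small gazetteer as just one feature among several. This is a toy reconstruction under assumed features and data, not the authors' system; it assumes scikit-learn is available.

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

SMALL_GAZETTEER = {"London", "Edinburgh", "IBM"}  # hypothetical well-known names

def token_features(tokens, i):
    w = tokens[i]
    return {
        "word": w.lower(),
        "is_capitalised": w[:1].isupper(),
        "in_gazetteer": w in SMALL_GAZETTEER,
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
    }

# Tiny toy training set: (sentence tokens, per-token BIO labels).
train = [
    (["Ms", "Smith", "visited", "London", "."],
     ["O", "B-PER", "O", "B-LOC", "O"]),
    (["IBM", "hired", "John", "Smith", "."],
     ["B-ORG", "O", "B-PER", "I-PER", "O"]),
]

X = [token_features(toks, i) for toks, _ in train for i in range(len(toks))]
y = [label for _, labels in train for label in labels]

vec = DictVectorizer()
clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(X), y)

test = ["Mary", "Smith", "works", "in", "Edinburgh", "."]
print(clf.predict(vec.transform([token_features(test, i) for i in range(len(test))])))

The paper's question about gazetteer size can then be probed by swapping SMALL_GAZETTEER for larger or noisier lists and comparing scores on held-out data.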


Pacific Symposium on Biocomputing | 2007

Assisted Curation: Does Text Mining Really Help?

Beatrice Alex; Claire Grover; Barry Haddow; Mijail Kabadjov; Ewan Klein; Michael Matthews; Stuart Roebuck; Richard Tobin; Xinglong Wang

Although text mining shows considerable promise as a tool for supporting the curation of biomedical text, there is little concrete evidence as to its effectiveness. We report on three experiments measuring the extent to which curation can be speeded up with assistance from Natural Language Processing (NLP), together with subjective feedback from curators on the usability of a curation tool that integrates NLP hypotheses for protein-protein interactions (PPIs). In our curation scenario, we found that a maximum speed-up of 1/3 in curation time can be expected if NLP output is perfectly accurate. The preference of one curator for consistent NLP output and output with high recall needs to be confirmed in a larger study with several curators.


Meeting of the Association for Computational Linguistics | 1987

The Derivation of a Grammatically Indexed Lexicon from the Longman Dictionary of Contemporary English

Branimir Boguraev; Ted Briscoe; John A. Carroll; David M. Carter; Claire Grover

We describe a methodology and associated software system for the construction of a large lexicon from an existing machine-readable (published) dictionary. The lexicon serves as a component of an English morphological and syntactic analyser and contains entries with grammatical definitions compatible with the word and sentence grammar employed by the analyser. We describe a software system with two integrated components. One of these is capable of extracting syntactically rich, theory-neutral lexical templates from a suitable machine-readable source. The second supports interactive and semi-automatic generation and testing of target lexical entries in order to derive a sizeable, accurate and consistent lexicon from the source dictionary which contains partial (and occasionally inaccurate) information. Finally, we evaluate the utility of the Longman Dictionary of Contemporary English as a suitable source dictionary for the target lexicon.


Conference of the European Chapter of the Association for Computational Linguistics | 1995

Algorithms for analysing the temporal structure of discourse

Janet Hitzeman; Marc Moens; Claire Grover

We describe a method for analysing the temporal structure of a discourse which takes into account the effects of tense, aspect, temporal adverbials and rhetorical structure, and which minimises unnecessary ambiguity in the temporal structure. It is part of a discourse grammar implemented in Carpenter's ALE formalism. The method for building up the temporal structure of the discourse combines constraints and preferences: we use constraints to reduce the number of possible structures, exploiting the HPSG type hierarchy and unification for this purpose, and we apply preferences to choose between the remaining options using a temporal centering mechanism. We end by recommending that an underspecified representation of the structure using these techniques be used to avoid generating the temporal/rhetorical structure until higher-level information can be used to disambiguate.
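
The constrain-then-prefer strategy can be sketched independently of the HPSG/ALE machinery: enumerate candidate temporal structures, discard those violating hard constraints, then rank the survivors with a soft preference score. The constraint and preference functions below are hypothetical stand-ins for the paper's linguistic rules, written in Python purely for illustration.

from itertools import permutations

events = ["e1", "e2", "e3"]  # discourse events in text order

def satisfies_constraints(order):
    # Hypothetical hard constraint standing in for tense/aspect rules:
    # suppose e3 is marked in a way that rules out its being final.
    return order[-1] != "e3"

def preference_score(order):
    # Hypothetical soft preference with a temporal-centering flavour:
    # reward orderings that preserve text order, maintaining the focus.
    return sum(1 for a, b in zip(order, order[1:])
               if events.index(a) < events.index(b))

candidates = [o for o in permutations(events) if satisfies_constraints(o)]
print(max(candidates, key=preference_score))  # preferred structure among survivors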


Artificial Intelligence and Law | 2006

Extractive summarisation of legal texts

Ben Hachey; Claire Grover

We describe research carried out as part of a text summarisation project for the legal domain for which we use a new XML corpus of judgments of the UK House of Lords. These judgments represent a particularly important part of public discourse due to the role that precedents play in English law. We present experimental results using a range of features and machine learning techniques for the task of predicting the rhetorical status of sentences and for the task of selecting the most summary-worthy sentences from a document. Results for these components are encouraging as they achieve state-of-the-art accuracy using robust, automatically generated cue phrase information. Sample output from the system illustrates the potential of summarisation technology for legal information management systems and highlights the utility of our rhetorical annotation scheme as a model of legal discourse, which provides a clear means for structuring summaries and tailoring them to different types of users.
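
The sentence-selection step can be pictured as scoring each sentence with cue-phrase and positional features and keeping the top scorers. The cue phrases and weights below are invented for illustration; in the paper such evidence is combined by machine-learning classifiers over automatically generated cue phrase information.

# Hypothetical cue phrases and hand-set weights; a real system learns these.
CUE_PHRASES = {"we conclude": 2.0, "in my judgment": 2.5, "the question is": 1.5}

def score(sentence, position, n_sentences):
    s = sum(w for cue, w in CUE_PHRASES.items() if cue in sentence.lower())
    # Hypothetical positional feature: opening and closing sentences of a
    # judgment often carry rhetorical weight.
    if position < 2 or position >= n_sentences - 2:
        s += 1.0
    return s

def summarise(sentences, k=2):
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i], i, len(sentences)),
                    reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]  # restore document order

doc = ["The appeal raises two issues.",
       "The facts are not in dispute.",
       "In my judgment the statute does not apply.",
       "We conclude that the appeal must be allowed."]
print(summarise(doc))  # keeps the two judgment-bearing sentences, in order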


Comparative and Functional Genomics | 2005

A system for identifying named entities in biomedical text: how results from two evaluations reflect on both the system and the evaluations.

Shipra Dingare; Malvina Nissim; Jenny Rose Finkel; Christopher D. Manning; Claire Grover

We present a maximum entropy-based system for identifying named entities (NEs) in biomedical abstracts and present its performance in the only two biomedical named entity recognition (NER) comparative evaluations that have been held to date, namely BioCreative and Coling BioNLP. Our system obtained an exact match F-score of 83.2% in the BioCreative evaluation and 70.1% in the BioNLP evaluation. We discuss our system in detail, including its rich use of local features, attention to correct boundary identification, innovative use of external knowledge resources, including parsing and web searches, and rapid adaptation to new NE sets. We also discuss in depth problems with data annotation in the evaluations which caused the final performance to be lower than optimal.
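
One family of local features such systems lean on heavily is word shape, which abstracts tokens like "IL-2" into patterns that generalise across unseen gene and protein names. The feature function below is an illustrative example of the idea, not the system's actual feature set.

import re

def word_shape(token):
    # Collapse character classes: uppercase -> X, lowercase -> x, digit -> d.
    shape = re.sub(r"[A-Z]", "X", token)
    shape = re.sub(r"[a-z]", "x", shape)
    shape = re.sub(r"[0-9]", "d", shape)
    # Compress runs so "ABCD12" and "AB1" share the shape "Xd".
    return re.sub(r"(.)\1+", r"\1", shape)

for tok in ["IL-2", "p53", "NF-kappaB", "kinase"]:
    print(tok, "->", word_shape(tok))
# IL-2 -> X-d, p53 -> xd, NF-kappaB -> X-xX, kinase -> x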


Meeting of the Association for Computational Linguistics | 2007

Recognising Nested Named Entities in Biomedical Text

Beatrice Alex; Barry Haddow; Claire Grover

Although recent named entity (NE) annotation efforts involve the markup of nested entities, there has been limited focus on recognising such nested structures. This paper introduces and compares three techniques for modelling and recognising nested entities by means of a conventional sequence tagger. The methods are tested and evaluated on two biomedical data sets that contain entity nesting. All methods yield an improvement over the baseline tagger that is only trained on flat annotation.
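
One way to make a conventional flat sequence tagger handle nesting is a joined-label encoding: collapse the labels of all entities covering a token into a single composite BIO tag, train a flat tagger on those tags, and unpack them afterwards. The sketch below shows a simplified version of the encoding step on a hypothetical nested example; it is one of several possible encodings, not necessarily the paper's exact scheme.

def joined_bio(tokens, entities):
    # entities: (start, end, type) spans with end exclusive.
    labels = []
    for i in range(len(tokens)):
        covering = [e for e in entities if e[0] <= i < e[1]]
        if not covering:
            labels.append("O")
            continue
        covering.sort(key=lambda e: (e[0], -e[1]))  # outermost first
        tag = "+".join(t for _, _, t in covering)
        prefix = "B" if any(e[0] == i for e in covering) else "I"
        labels.append(f"{prefix}-{tag}")
    return labels

tokens = ["human", "interleukin-2", "receptor"]
# Hypothetical nesting: a PROTEIN inside a larger COMPLEX mention.
entities = [(0, 3, "COMPLEX"), (1, 2, "PROTEIN")]
print(joined_bio(tokens, entities))
# ['B-COMPLEX', 'B-COMPLEX+PROTEIN', 'I-COMPLEX']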


Natural Language Engineering | 2005

A comparison of parsing technologies for the biomedical domain

Claire Grover; Alex Lascarides; Mirella Lapata

This paper reports on a number of experiments which are designed to investigate the extent to which current NLP resources are able to syntactically and semantically analyse biomedical text. We address two tasks: (a) parsing a real corpus with a hand-built wide-coverage grammar, producing both syntactic analyses and logical forms and (b) automatically computing the interpretation of compound nouns where the head is a nominalisation (e.g. hospital arrival means an arrival at hospital, while patient arrival means an arrival of a patient). For the former task we demonstrate that flexible and yet constrained pre-processing techniques are crucial to success: these enable us to use part-of-speech tags to overcome inadequate lexical coverage, and to package up complex technical expressions prior to parsing so that they are blocked from creating misleading amounts of syntactic complexity. We argue that the XML-processing paradigm is ideally suited for automatically preparing the corpus for parsing. For the latter task, we compute interpretations of the compounds by exploiting surface cues and meaning paraphrases, which in turn are extracted from the parsed corpus. This provides an empirical setting in which we can compare the utility of a comparatively deep parser vs. a shallow one, exploring the trade-off between resolving attachment ambiguities on the one hand and generating errors in the parses on the other. We demonstrate that a model of the meaning of compound nominalisations is achievable with the aid of current broad-coverage parsers.
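
The "package up complex technical expressions" step can be pictured as greedily merging known multiword terms into single tokens before parsing, so that they cannot multiply attachment ambiguities. The term list below is hypothetical, and the actual pipeline does this within an XML-processing framework rather than over plain strings.

TERMS = {("tumour", "necrosis", "factor"), ("cell", "cycle")}  # hypothetical
MAX_LEN = max(len(t) for t in TERMS)

def package_terms(tokens):
    out, i = [], 0
    while i < len(tokens):
        # Try the longest match first so longer terms are not split.
        for n in range(min(MAX_LEN, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + n]) in TERMS:
                out.append("_".join(tokens[i:i + n]))
                i += n
                break
        else:
            out.append(tokens[i])
            i += 1
    return out

print(package_terms("the tumour necrosis factor gene regulates the cell cycle".split()))
# ['the', 'tumour_necrosis_factor', 'gene', 'regulates', 'the', 'cell_cycle']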


Philosophical Transactions of the Royal Society A | 2010

Use of the Edinburgh geoparser for georeferencing digitized historical collections

Claire Grover; Richard Tobin; Kate Byrne; Matthew Woollard; James Reid; Stuart Dunn; Julian Ball

We report on two JISC-funded projects that aimed to enrich the metadata of digitized historical collections with georeferences and other information automatically computed using geoparsing and related information extraction technologies. Understanding location is a critical part of any historical research, and the nature of the collections makes them an interesting case study for testing automated methodologies for extracting content. The two projects (GeoDigRef and Embedding GeoCrossWalk) have looked at how automatic georeferencing of resources might be useful in developing improved geographical search capacities across collections. In this paper, we describe the work that was undertaken to configure the geoparser for the collections as well as the evaluations that were performed.


Geographic Information Retrieval | 2010

Evaluation of georeferencing

Richard Tobin; Claire Grover; Kate Byrne; James Reid; Jo Walsh

In this paper we describe a georeferencing system which first uses Information Extraction techniques to identify place names in textual documents and which then resolves the place names against a choice of gazetteers. We have used the system to georeference three digitised historical collections and have evaluated its performance against human annotated gold standard samples from the three collections. We have also evaluated its performance on the SpatialML corpus which is a geo-annotated corpus of newspaper text. The main focus of this paper is the evaluation of georesolution and we discuss evaluation methods and issues arising from the evaluation.
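
At its core, georesolution looks up each extracted place name in a gazetteer and ranks the candidate referents. The sketch below uses a toy gazetteer and a single population heuristic as a stand-in for the richer ranking (document context, feature type, nearby resolved places) a real resolver would apply.

GAZETTEER = {  # hypothetical entries: name -> list of (lat, lon, population)
    "Perth": [(-31.95, 115.86, 2_100_000),   # Perth, Australia
              (56.40, -3.43, 47_000)],       # Perth, Scotland
    "Edinburgh": [(55.95, -3.19, 530_000)],
}

def resolve(place_name, gazetteer=GAZETTEER):
    candidates = gazetteer.get(place_name, [])
    if not candidates:
        return None  # unresolvable against this gazetteer
    # Prefer the most populous referent; a real system would also weigh
    # contextual evidence such as other places mentioned nearby.
    return max(candidates, key=lambda c: c[2])

for name in ["Perth", "Edinburgh", "Atlantis"]:
    print(name, "->", resolve(name))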

Collaboration


Claire Grover's top co-authors.

Ewan Klein

University of Edinburgh

Ted Briscoe

University of Cambridge

Marc Moens

University of Edinburgh
