Ann A. Copestake
University of Cambridge
Publications
Featured research published by Ann A. Copestake.
Knowledge Engineering Review | 1990
Ann A. Copestake; Karen Sparck Jones
This paper reviews the current state of the art in natural language access to databases. This has been a long-standing area of work in natural language processing. But though some commercial systems are now available, providing front ends has proved much harder than was expected, and the necessary limitations on front ends have to be recognized. The paper discusses the issues, both general to language and task-specific, involved in front end design, and the way these have been addressed, concentrating on the work of the last decade. The focus is on the central process of translating a natural language question into a database query, but other supporting functions are also covered. The points are illustrated by the use of a single example application. The paper concludes with an evaluation of the current state, indicating that future progress will depend on the one hand on general advances in natural language processing, and on the other on expanding the capabilities of traditional databases.
meeting of the association for computational linguistics | 2001
Ann A. Copestake; Alex Lascarides; Dan Flickinger
We develop a framework for formalizing semantic construction within grammars expressed in typed feature structure logics, including HPSG. The approach provides an alternative to the lambda calculus; it maintains much of the desirable flexibility of unification-based approaches to composition, while constraining the allowable operations in order to capture basic generalizations and improve maintainability.
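The unification operation at the heart of such frameworks can be illustrated with a toy unifier over untyped feature structures represented as nested dicts with atomic string leaves. This is a much-simplified sketch, not the typed feature structure logic of the paper; all structures and feature names are invented for illustration.

```python
# Toy unifier for untyped feature structures: nested dicts with atomic
# string leaves. A clash between atomic values makes unification fail.

def unify(a, b):
    """Return the unification of two feature structures, or None on clash."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for feat, bv in b.items():
            if feat in result:
                merged = unify(result[feat], bv)
                if merged is None:
                    return None  # values for this feature clash
                result[feat] = merged
            else:
                result[feat] = bv  # feature only in b: carry it over
        return result
    return a if a == b else None  # atomic values must match exactly

cat_np = {"HEAD": "noun", "AGR": {"NUM": "sg"}}
cat_3sg = {"AGR": {"NUM": "sg", "PER": "3"}}
print(unify(cat_np, cat_3sg))
# {'HEAD': 'noun', 'AGR': {'NUM': 'sg', 'PER': '3'}}
print(unify(cat_np, {"HEAD": "verb"}))  # None: noun vs. verb clash
```

Real HPSG-style systems add a type hierarchy, structure sharing (reentrancy), and well-formedness constraints on top of this basic operation.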
meeting of the association for computational linguistics | 1991
Ann A. Copestake; Ted Briscoe
We consider lexical operations and their representation in a unification-based lexicon, and the role of lexical semantic information. We describe a unified treatment of the linguistic aspects of sense extension and derivational morphological processes which delimit the range of possible coercions between lexemes, and give a preliminary account of how default interpretations may arise.
international conference on computational linguistics | 1990
Ted Briscoe; Ann A. Copestake; Bran Boguraev
Current research being undertaken at both Cambridge and IBM is aimed at the construction of substantial lexicons containing lexical semantic information capable of use in automated natural language processing (NLP) applications. This work extends previous research on the semi-automatic extraction of lexical information from machine-readable versions of conventional dictionaries (MRDs) (see e.g. the papers and references in Boguraev & Briscoe, 1989; Walker et al., 1988). The motivation for this and previous research using MRDs is that entirely manual development of lexicons for practical NLP applications is infeasible, given the labour-intensive nature of lexicography (e.g. Atkins, 1988) and the resources likely to be allocated to NLP in the foreseeable future. In this paper, we motivate a particular approach to lexical semantics, briefly demonstrate its computational tractability, and explore the possibility of extracting the lexical information this approach requires from MRDs and, to some extent, textual corpora.
Journal of Linguistics | 1998
Alex Lascarides; Ann A. Copestake
In this paper, we explore the interaction between lexical semantics and pragmatics. We argue that linguistic processing is informationally encapsulated and utilizes relatively simple ‘taxonomic’ lexical semantic knowledge. On this basis, defeasible lexical generalisations deliver defeasible parts of logical form. In contrast, pragmatic inference is open-ended and involves arbitrary real-world knowledge. Two axioms specify when pragmatic defaults override lexical ones. We demonstrate that modelling this interaction allows us to achieve a more refined interpretation of words in a discourse context than either the lexicon or pragmatics could do on their own.
conference on computational natural language learning | 2000
Guido Minnen; Francis Bond; Ann A. Copestake
Article choice can pose difficult problems in applications such as machine translation and automated summarization. In this paper, we investigate the use of corpus data to collect statistical generalizations about article use in English in order to be able to generate articles automatically to supplement a symbolic generator. We use data from the Penn Treebank as input to a memory-based learner (TiMBL 3.0; Daelemans et al., 2000) which predicts whether to generate an article with respect to an English base noun phrase. We discuss competitive results obtained using a variety of lexical, syntactic and semantic features that play an important role in automated article generation.
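The memory-based approach can be sketched as a k-nearest-neighbour classifier over symbolic features: new noun phrases are classified by majority vote over the most similar stored training instances, using an overlap distance. This is a hypothetical miniature in the spirit of TiMBL, not the paper's feature set or data; the features and instances below are invented.

```python
# Toy memory-based learner: predict the article for a base NP by majority
# vote over the k nearest stored instances (overlap distance on features).
from collections import Counter

def overlap_distance(a, b):
    """Count mismatching feature values between two instances."""
    return sum(1 for x, y in zip(a, b) if x != y)

def predict(memory, features, k=3):
    """memory: list of (feature_tuple, article_label) training instances."""
    nearest = sorted(memory, key=lambda inst: overlap_distance(inst[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Invented features: (head noun, countability, number, discourse status)
memory = [
    (("dog", "count", "sg", "new"), "a"),
    (("dog", "count", "sg", "given"), "the"),
    (("water", "mass", "sg", "new"), "none"),
    (("information", "mass", "sg", "given"), "the"),
]
print(predict(memory, ("dog", "count", "sg", "given")))  # "the"
```

TiMBL itself adds feature weighting (e.g. information gain) and more refined distance metrics, which matter considerably for accuracy.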
BMC Bioinformatics | 2008
Peter T. Corbett; Ann A. Copestake
Background: Chemical named entities represent an important facet of biomedical text.
Results: We have developed a system to use character-based n-grams, Maximum Entropy Markov Models and rescoring to recognise chemical names and other such entities, and to make confidence estimates for the extracted entities. An adjustable threshold allows the system to be tuned to high precision or high recall. At a threshold set for balanced precision and recall, we were able to extract named entities at an F score of 80.7% from chemistry papers and 83.2% from PubMed abstracts. Furthermore, we were able to achieve 57.6% and 60.3% recall at 95% precision, and 58.9% and 49.1% precision at 90% recall.
Conclusion: These results show that chemical named entities can be extracted with good performance, and that the properties of the extraction can be tuned to suit the demands of the task.
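The precision/recall trade-off behind the adjustable threshold can be illustrated directly: keeping only candidates whose confidence score clears the threshold raises precision at the cost of recall. The scored candidates below are invented for illustration, not the paper's data.

```python
# Sketch of confidence-threshold tuning for an extraction system: each
# candidate entity carries a confidence score and a gold-standard verdict.

def precision_recall(candidates, n_gold, threshold):
    """Keep candidates scoring at or above threshold; compare against gold."""
    kept = [c for c in candidates if c["score"] >= threshold]
    tp = sum(1 for c in kept if c["correct"])
    precision = tp / len(kept) if kept else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return precision, recall

candidates = [
    {"score": 0.95, "correct": True},
    {"score": 0.90, "correct": True},
    {"score": 0.60, "correct": False},
    {"score": 0.55, "correct": True},
    {"score": 0.30, "correct": False},
]
n_gold = 4  # total true entities in the gold standard

# High threshold favours precision; low threshold favours recall.
print(precision_recall(candidates, n_gold, 0.8))  # (1.0, 0.5)
print(precision_recall(candidates, n_gold, 0.5))  # (0.75, 0.75)
```

Sweeping the threshold over held-out data and picking the operating point that suits the task (curation vs. exhaustive search) is the tuning the abstract describes.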
Natural Language Engineering | 2000
Robert Malouf; John A. Carroll; Ann A. Copestake
One major obstacle to the efficient processing of large wide coverage grammars in unification-based grammatical frameworks such as HPSG is the time and space cost of the unification operation itself. In a grammar development system it is not appropriate to address this problem with techniques which involve lengthy compilation, since this slows down the edit-test-debug cycle. Nor is it possible to radically restructure the grammar. In this paper, we describe novel extensions to an existing efficient unification algorithm which improve its space and time behaviour (without affecting its correctness) by substantially increasing the amount of structure sharing that takes place. We also describe a fast and automatically tunable pre-unification filter (the ‘quick check’) which in practice detects a large proportion of unifications that if performed would fail. Finally, we present an efficient algorithm for checking for subsumption relationships between two feature structures; a special case of this gives a fast equality test. The subsumption check is used in a parser (described elsewhere in this issue) which ‘packs’ local ambiguities to avoid performing redundant sub-computations.
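The idea of the quick check is that most failing unifications fail at a small number of feature paths, so comparing the atomic values at those paths first filters out doomed unifications cheaply. The sketch below assumes feature structures as nested dicts with atomic type-name leaves; the paths and toy type hierarchy are invented for illustration.

```python
# Minimal "quick check" pre-unification filter: compare atomic values at a
# fixed set of frequently-failing feature paths before attempting full
# (expensive) unification.

def value_at(fs, path):
    """Follow a feature path through nested dicts; None if absent."""
    for feat in path:
        if not isinstance(fs, dict) or feat not in fs:
            return None
        fs = fs[feat]
    return fs

def compatible(t1, t2, glb):
    """Atomic types unify if equal or if they have a greatest lower bound."""
    return t1 == t2 or (t1, t2) in glb or (t2, t1) in glb

def quick_check(fs1, fs2, paths, glb):
    """False as soon as two atomic values at the same path clash.
    True means only that full unification may still succeed."""
    for path in paths:
        v1, v2 = value_at(fs1, path), value_at(fs2, path)
        if isinstance(v1, str) and isinstance(v2, str) and not compatible(v1, v2, glb):
            return False
    return True

# Hypothetical example: HEAD values clash, so the pair is filtered out.
paths = [("HEAD",), ("VAL", "SUBJ")]
glb = set()  # no non-trivial greatest lower bounds in this toy hierarchy
fs_np = {"HEAD": "noun", "VAL": {"SUBJ": "none"}}
fs_vp = {"HEAD": "verb", "VAL": {"SUBJ": "np"}}
print(quick_check(fs_np, fs_vp, paths, glb))  # False: filtered before unifying
```

In the actual system, the set of paths is tuned automatically from profiling data, which is what makes the filter both fast and effective in practice.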
meeting of the association for computational linguistics | 2004
Advaith Siddharthan; Ann A. Copestake
We present an algorithm for generating referring expressions in open domains. Existing algorithms work at the semantic level and assume the availability of a classification for attributes, which is only feasible for restricted domains. Our alternative works at the realisation level, relies on WordNet synonym and antonym sets, and gives equivalent results on the examples cited in the literature and improved results for examples that prior approaches cannot handle. We believe that ours is also the first algorithm that allows for the incremental incorporation of relations. We present a novel corpus evaluation using referring expressions from the Penn Wall Street Journal Treebank.
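The underlying incremental idea, common to this line of work, can be sketched in miniature: add words to the description until every distractor is ruled out. This is a simplified illustration in the spirit of classic attribute-selection algorithms, not the authors' realisation-level method; the entities and word lists are invented.

```python
# Incremental referring-expression sketch: pick words that rule out
# distractors until only the target entity matches the description.

def distinguish(target, distractors):
    """target/distractors: lists of descriptive words per entity.
    Returns a distinguishing word list, or None if none exists."""
    description, remaining = [], list(distractors)
    for word in target:
        if not remaining:
            break  # target already uniquely identified
        ruled_out = [d for d in remaining if word not in d]
        if ruled_out:  # word is discriminating: keep it
            description.append(word)
            remaining = [d for d in remaining if word in d]
    return description if not remaining else None

target = ["large", "red", "cup"]
distractors = [["small", "red", "cup"], ["large", "blue", "cup"]]
print(distinguish(target, distractors))  # ['large', 'red']
```

Working at the realisation level with WordNet, as the paper does, replaces exact word matching with synonym/antonym reasoning, which is what removes the need for a domain-specific attribute classification.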
conference on applied natural language processing | 1992
Ann A. Copestake
We describe the lexical knowledge base system (LKB) which has been designed and implemented as part of the ACQUILEX project to allow the representation of multilingual syntactic and semantic information extracted from machine readable dictionaries (MRDs), in such a way that it is usable by natural language processing (NLP) systems. The LKB's lexical representation language (LRL) augments typed graph-based unification with default inheritance, formalised in terms of default unification of feature structures. We evaluate how well the LRL meets the practical requirements arising from the semi-automatic construction of a large scale, multilingual lexicon. The system as described is fully implemented and is being used to represent substantial amounts of information automatically extracted from MRDs.
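The essence of default inheritance can be sketched with flat feature structures: strict values stated on an entry override defaults inherited from its class, while non-conflicting defaults survive. This is a drastic simplification of default unification over typed feature structures; the feature names and lexical classes below are invented for illustration.

```python
# Minimal sketch of default inheritance: strict (entry-level) information
# always wins; inherited defaults fill in wherever they do not conflict.

def default_unify(strict, defaults):
    """Combine a strict feature structure with defeasible defaults."""
    result = dict(defaults)   # start from the inherited defaults
    result.update(strict)     # strict values override clashing defaults
    return result

# A hypothetical noun class supplies defaults; an irregular entry
# strictly overrides the default plural while inheriting the rest.
noun_defaults = {"PLURAL": "+s", "COUNTABLE": "yes"}
sheep_entry = {"PLURAL": "zero"}  # irregular plural, stated strictly
print(default_unify(sheep_entry, noun_defaults))
# {'PLURAL': 'zero', 'COUNTABLE': 'yes'}
```

The full formalism must additionally handle nested structures, reentrancy, and conflicts between defaults from different ancestors, which is where the careful formalisation in the LRL comes in.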