Karen Sparck Jones
University of Cambridge
Publication
Featured research published by Karen Sparck Jones.
Journal of Documentation | 1972
Karen Sparck Jones
The exhaustivity of document descriptions and the specificity of index terms are usually regarded as independent. It is suggested that specificity should be interpreted statistically, as a function of term use rather than of term meaning. The effects on retrieval of variations in term specificity are examined, experiments with three test collections showing, in particular, that frequently-occurring terms are required for good overall performance. It is argued that terms should be weighted according to collection frequency, so that matches on less frequent, more specific, terms are of greater value than matches on frequent terms. Results for the test collections show that considerable improvements in performance are obtained with this very simple procedure.
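The weighting idea in this abstract can be illustrated with a minimal sketch. The log-ratio-plus-one weight, the function names, and the toy collection below are all assumptions made for illustration and are not taken from the paper; the sketch only shows that a match on a rarer term counts for more than a match on a frequent one, while frequent terms still contribute.

```python
import math
from collections import Counter


def collection_frequency_weights(documents):
    """Weight each term by log2(N / n_t) + 1 (an assumed form), where N is the
    number of documents and n_t is the number of documents containing the term."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each term once per document
    return {term: math.log2(n_docs / n) + 1 for term, n in doc_freq.items()}


def match_score(query_terms, doc, weights):
    """Sum the weights of the distinct query terms that occur in the document."""
    doc_terms = set(doc)
    return sum(weights.get(t, 0.0) for t in set(query_terms) if t in doc_terms)


# Hypothetical toy collection: each document is a list of index terms.
docs = [
    ["retrieval", "index", "term"],
    ["retrieval", "weighting"],
    ["retrieval", "index", "specificity"],
]
weights = collection_frequency_weights(docs)
```

Here "retrieval" occurs in every document and so receives the smallest weight, while "specificity" occurs in only one and receives the largest; a query matching both terms in the third document scores the sum of the two weights.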
Information Processing and Management | 2007
Karen Sparck Jones
This paper reviews research on automatic summarising in the last decade. This work has grown, stimulated by technology and by evaluation programmes. The paper uses several frameworks to organise the review, for summarising itself, for the factors affecting summarising, for systems, and for evaluation. The review examines the evaluation strategies applied to summarising, the issues they raise, and the major programmes. It considers the input, purpose and output factors investigated in recent summarising research, and discusses the classes of strategy, extractive and non-extractive, that have been explored, illustrating the range of systems built. The conclusions drawn are that automatic summarisation has made valuable progress, with useful applications, better evaluation, and more task understanding. But summarising systems are still poorly motivated in relation to the factors affecting them, and evaluation needs taking much further to engage with the purposes summaries are intended to serve and the contexts in which they are used.
Communications of the ACM | 1996
David D. Lewis; Karen Sparck Jones
The paper summarizes the essential properties of document retrieval and reviews both conventional practice and research findings, the latter suggesting that simple statistical techniques can be effective. It then considers the new opportunities and challenges presented by the user’s ability to search full text directly (rather than e.g. titles and abstracts), and suggests appropriate approaches to doing this, with a focus on the potential role of natural language processing. The paper also comments on possible connections with data and knowledge retrieval, and concludes by emphasizing the importance of rigorous performance testing.
Information Storage and Retrieval | 1973
Karen Sparck Jones
Various approaches to index term weighting have been investigated. In particular, claims have been made for the value of statistically-based indexing in automatic retrieval systems. The paper discusses the logic of different types of weighting, and describes experiments testing weighting schemes of these types. The results show that one type of weighting leads to material performance improvements in quite different collection environments.
Knowledge Engineering Review | 1990
Ann A. Copestake; Karen Sparck Jones
This paper reviews the current state of the art in natural language access to databases. This has been a long-standing area of work in natural language processing. But though some commercial systems are now available, providing front ends has proved much harder than was expected, and the necessary limitations on front ends have to be recognized. The paper discusses the issues, both general to language and task-specific, involved in front end design, and the way these have been addressed, concentrating on the work of the last decade. The focus is on the central process of translating a natural language question into a database query, but other supporting functions are also covered. The points are illustrated by the use of a single example application. The paper concludes with an evaluation of the current state, indicating that future progress will depend on the one hand on general advances in natural language processing, and on the other on expanding the capabilities of traditional databases.
International Conference on Acoustics, Speech, and Signal Processing | 1999
Sue E. Johnson; P. Jourlin; Gareth L. Moore; Karen Sparck Jones; Philip C. Woodland
This paper describes the spoken document retrieval system that we have been developing and assesses its performance using automatic transcriptions of about 50 hours of broadcast news data. The recognition engine is based on the HTK broadcast news transcription system and the retrieval engine is based on the techniques developed at City University. The retrieval performance over a wide range of speech transcription error rates is presented and a number of recognition error metrics that more accurately reflect the impact of transcription errors on retrieval accuracy are defined and computed. The results demonstrate the importance of high accuracy automatic transcription. The final system is currently being evaluated on the 1998 TREC-7 spoken document retrieval task.
Archive | 1999
Karen Sparck Jones
This paper addresses the value of linguistically-motivated indexing (LMI) for document and text retrieval. After reviewing the basic concepts involved and the assumptions on which LMI is based, namely that complex index descriptions and terms are necessary, I consider past and recent research on LMI, and specifically on automated LMI via NLP. Experiments in the first phase of research, to the late eighties, did not demonstrate value in LMI, but were very limited; but the much larger tests of the Nineties, with full text, have not done so either. My conclusion is that LMI is not needed for effective retrieval, but has other important roles within information-selection systems.
Text REtrieval Conference | 1995
Karen Sparck Jones
This paper discusses the Text REtrieval Conferences (TREC) programme as a major enterprise in information retrieval research. It reviews its structure as an evaluation exercise; characterises the methods of indexing and retrieval being tested within it in terms of the approaches to system performance factors these represent; analyses the test results for solid, overall conclusions that can be drawn from them; and, in the light of the particular features of the test data, assesses TREC both for generally applicable findings that emerge from it and for directions it offers for future research.
Journal of Documentation | 2004
Karen Sparck Jones
Robertson comments on the theoretical status of IDF term weighting. Its history illustrates how ideas develop in a specific research context, in theory/experiment interaction, and in operational practice.
Text REtrieval Conference | 2000
Karen Sparck Jones
The paper reviews the TREC Programme up to TREC-6 (1997), considering the test results, the substantive findings for IR that follow and the lessons TREC offers for IR evaluation. The paper focuses on the ad hoc retrieval task, with discussion of other test tracks as appropriate. The paper summarises the structure of the TREC work and analyses the experimental data in some detail. The analysis of the tests is presented through a series of key questions about indexing models, document and query descriptions, search strategies, etc. The assessment confirms that statistically-based methods perform as well as any, and that the nature and treatment of the user's request is by far the dominant factor in performance. One implication is that TREC should move into a new phase targeted on key comparisons and task specifications designed to deliver substantive new information, in particular shifting towards situated IR that addresses the user's context and contribution to searching.