K. Sparck Jones
University of Cambridge
Publications
Featured research published by K. Sparck Jones.
Journal of the Association for Information Science and Technology | 1976
Stephen E. Robertson; K. Sparck Jones
This paper examines statistical techniques for exploiting relevance information to weight search terms. These techniques are presented as a natural extension of weighting methods using information about the distribution of index terms in documents in general. A series of relevance weighting functions is derived and is justified by theoretical considerations. In particular, it is shown that specific weighted search methods are implied by a general probabilistic theory of retrieval. Different applications of relevance weighting are illustrated by experimental results for test collections.
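As an illustration of what a relevance weighting function looks like, the following is a minimal sketch of the 0.5-smoothed form usually quoted in later literature as the Robertson/Sparck Jones weight. It is not taken from the abstract above; the function name `rsj_weight` and the argument names are assumptions made for this example.

```python
import math

def rsj_weight(N, n, R, r):
    """Relevance weight for a single search term (illustrative sketch).

    N: number of documents in the collection
    n: number of documents containing the term
    R: number of documents known to be relevant
    r: number of known relevant documents containing the term
    The 0.5 added to each cell keeps the ratio finite when a count is zero.
    """
    if not (0 <= r <= R <= N and r <= n <= N and n - r <= N - R):
        raise ValueError("inconsistent counts")
    numerator = (r + 0.5) * (N - n - R + r + 0.5)
    denominator = (n - r + 0.5) * (R - r + 0.5)
    return math.log(numerator / denominator)
```

With no relevance information (`R = 0`, `r = 0`) the expression reduces to `log((N - n + 0.5) / (n + 0.5))`, a collection-frequency weight, which matches the abstract's description of relevance weighting as an extension of weighting by term distribution.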
Information Processing and Management | 2000
K. Sparck Jones; Steve Walker; Stephen E. Robertson
The paper combines a comprehensive account of the probabilistic model of retrieval with new systematic experiments on TREC Programme material. It presents the model from its foundations through its logical development to cover more aspects of retrieval data and a wider range of system functions. Each step in the argument is matched by comparative retrieval tests, to provide a single coherent account of a major line of research. The experiments demonstrate, for a large test collection, that the probabilistic model is effective and robust, and that it responds appropriately, with major improvements in performance, to key features of retrieval situations. Part 1 covers the foundations and the model development for document collection and relevance data, along with the test apparatus. Part 2 covers the further development and elaboration of the model, with extensive testing, and briefly considers other environment conditions and tasks, model training, concluding with comparisons with other approaches and an overall assessment. Data and results tables for both parts are given in Part 1. Key results are summarised in Part 2. © 2000 Elsevier Science Ltd. All rights reserved.
International ACM SIGIR Conference on Research and Development in Information Retrieval | 1996
Gareth J. F. Jones; J.T. Foote; K. Sparck Jones; Steve J. Young
This paper presents domain-independent methods of spoken document retrieval. Both a continuous-speech large vocabulary recognition system, and a phone-lattice word spotter, are used to locate index units within an experimental corpus of voice messages. Possible index terms are nearly unconstrained; terms not in a 20,000 word recognition system vocabulary can be identified by the word spotter at search time. Though either system alone can yield respectable retrieval performance, the two methods are complementary and work best in combination. Different ways of combining them are investigated, and it is shown that the best of these can increase retrieval average precision for a speaker-independent retrieval system to 85% of that achieved for full-text transcriptions of the test documents.
Information Processing and Management | 1979
K. Sparck Jones
Following successful initial tests of theoretically-based schemes for relevance weighting of search terms, further experiments were undertaken to validate these results. The experiments were designed to investigate weighting for a large document set, poor matching conditions, heterogeneous data, and limited relevance information, i.e. the use of weighting in more realistic conditions than the initial ones. The results confirm the earlier ones: very striking improvements in retrieval performance were obtained, especially for the theoretically best-founded weighting formula. The experiments illustrate a much more promising application of statistical methods to indexing and searching than any studied hitherto.
Information Processing and Management | 1996
K. Sparck Jones; Gareth J. F. Jones; J.T. Foote; Steve J. Young
This paper describes experiments in the retrieval of spoken documents in multimedia systems. Speech documents pose a particular problem for retrieval since their words as well as contents are unknown. The work reported addresses this problem, for a video mail application, by combining state-of-the-art speech recognition with established document retrieval technologies so as to provide an effective and efficient retrieval tool. Tests with a small spoken message collection show that retrieval precision for the spoken file can reach 90% of that obtained when the same file is used, as a benchmark, in text transcription form.
International Conference on Acoustics, Speech, and Signal Processing | 1997
Steve J. Young; M. G. Brown; J.T. Foote; Gareth J. F. Jones; K. Sparck Jones
This paper reviews the Video Mail Retrieval (VMR) project at Cambridge University and ORL. The VMR project began in September 1993 with the aim of developing methods for retrieving video documents by scanning the audio soundtrack for keywords. The project has shown, both experimentally and through the construction of a working prototype, that speech recognition can be combined with information retrieval methods to locate multimedia documents by content. The final version of the VMR system uses pre-computed phone lattices to allow extremely rapid word spotting and audio indexing, and statistical information retrieval (IR) methods to mitigate the effects of spotting errors. The net result is a retrieval system that is open-vocabulary and speaker-independent, and which can search audio orders of magnitude faster than real time.
International Conference on Acoustics, Speech, and Signal Processing | 1995
Gareth J. F. Jones; J.T. Foote; K. Sparck Jones; Steve J. Young
The goal of the video mail retrieval project is to integrate state-of-the-art document retrieval methods with high accuracy word spotting to yield a robust and efficient retrieval system. This paper describes a preliminary study to determine the extent to which retrieval precision is affected by word spotting performance. It includes a description of the database design, the word spotting algorithm, and the information retrieval method used. Results are presented which show audio retrieval performance very close to that of text.
Computer Speech & Language | 1997
J.T. Foote; Steve J. Young; Gareth J. F. Jones; K. Sparck Jones
Traditional hidden Markov model (HMM) word spotting requires both explicit HMM models of each desired keyword and a computationally expensive decoding pass. For certain applications, such as audio indexing or information retrieval, conventional word spotting may be too constrained or impractically slow. This paper presents an alternative technique, where a phone lattice, representing multiple phone hypotheses, is pre-computed prior to need. Given a phone decomposition of any desired keyword, the lattice may be rapidly searched to find putative occurrences of the keyword. Though somewhat less accurate, this can be substantially faster (orders of magnitude) and more flexible (any keyword may be detected) than previous approaches. This paper presents algorithms for lattice generation and scanning, and experimental results, including comparison with conventional keyword-HMM approaches. Finally, word spotting based on phone lattice scanning is demonstrated to be effective for spoken document retrieval.
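The idea of searching a pre-computed lattice for a keyword's phone sequence can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's algorithm: the lattice is assumed to be a list of arcs `(start_node, end_node, phone, score)`, the scores are hypothetical log-likelihood-like values, and the function name `scan_lattice` is invented for this example. A real system would also handle time alignment, pruning, and score normalisation.

```python
from collections import defaultdict

def scan_lattice(arcs, phones):
    """Find every lattice path whose arc labels spell out `phones`.

    arcs:   iterable of (start_node, end_node, phone, score) tuples
    phones: the keyword's phone decomposition, e.g. ["k", "ae", "t"]
    Returns a sorted list of (start_node, end_node, total_score) hits.
    """
    if not phones:
        return []

    # Index arcs by the node they leave, so each step is a cheap lookup.
    by_start = defaultdict(list)
    for start, end, label, score in arcs:
        by_start[start].append((end, label, score))

    hits = []

    def extend(node, idx, begin, total):
        # All phones matched: record a putative keyword occurrence.
        if idx == len(phones):
            hits.append((begin, node, total))
            return
        # Follow only arcs labelled with the next phone of the keyword.
        for end, label, score in by_start.get(node, ()):
            if label == phones[idx]:
                extend(end, idx + 1, begin, total + score)

    # A keyword may begin at any node that has outgoing arcs.
    for start in list(by_start):
        extend(start, 0, start, 0.0)
    return sorted(hits)
```

Because the lattice is built once and the scan only follows arcs that match the next phone, any keyword with a known phone decomposition can be looked up without a new decoding pass, which is the flexibility the abstract describes.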
International ACM SIGIR Conference on Research and Development in Information Retrieval | 2000
Philip C. Woodland; Sue E. Johnson; P. Jourlin; K. Sparck Jones
The effects of out-of-vocabulary (OOV) items in spoken document retrieval (SDR) are investigated. Several sets of transcriptions were created for the TREC-8 SDR task using a speech recognition system varying the vocabulary sizes and OOV rates, and the relative retrieval performance measured. The effects of OOV terms on a simple baseline IR system and on more sophisticated retrieval systems are described. The use of a parallel corpus for query and document expansion is found to be especially beneficial, and with this data set, good retrieval performance can be achieved even for fairly high OOV rates.
Conference on Applied Natural Language Processing | 1983
Branimir Boguraev; K. Sparck Jones
This paper describes a front end for natural language access to databases making extensive use of general, i.e. domain-independent, semantic information for question interpretation. In the interests of portability, initial syntactic and semantic processing of a question is carried out without any reference to the database domain, and domain-dependent operations are confined to subsequent, comparatively straightforward, processing of the initial interpretation. The different modules of the front end are described, and the system's performance is illustrated by examples.