K. Sparck Jones | Researchain

Publication

Featured research published by K. Sparck Jones.

Journal of the Association for Information Science and Technology | 1976

Relevance weighting of search terms

Stephen E. Robertson; K. Sparck Jones

This paper examines statistical techniques for exploiting relevance information to weight search terms. These techniques are presented as a natural extension of weighting methods using information about the distribution of index terms in documents in general. A series of relevance weighting functions is derived and is justified by theoretical considerations. In particular, it is shown that specific weighted search methods are implied by a general probabilistic theory of retrieval. Different applications of relevance weighting are illustrated by experimental results for test collections.
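As a minimal sketch of what such a relevance weight can look like, the Python below implements the commonly cited form of the Robertson/Sparck Jones weight with a 0.5 correction. The abstract does not state the formula, so the function name `rsj_weight`, its parameter names, and the choice of this particular variant are assumptions made for illustration.

```python
import math


def rsj_weight(r: int, n: int, R: int, N: int) -> float:
    """Relevance weight for a single search term (illustrative sketch).

    r: number of known relevant documents containing the term
    n: number of documents in the collection containing the term
    R: number of known relevant documents
    N: number of documents in the collection
    """
    # Reject counts that cannot describe a real contingency table.
    if not (0 <= r <= R <= N and r <= n <= N and n - r <= N - R):
        raise ValueError("inconsistent document counts")
    # The 0.5 added to each cell keeps the ratio finite when a cell is zero.
    numerator = (r + 0.5) * (N - n - R + r + 0.5)
    denominator = (n - r + 0.5) * (R - r + 0.5)
    return math.log(numerator / denominator)


if __name__ == "__main__":
    # A term concentrated in the relevant documents scores higher.
    print(rsj_weight(r=8, n=10, R=10, N=1000))
    print(rsj_weight(r=1, n=10, R=10, N=1000))
```

With no relevance information (`r = 0`, `R = 0`) the expression reduces to `log((N - n + 0.5) / (n + 0.5))`, a weight based only on how the term is distributed across the collection, which matches the abstract's framing of relevance weighting as an extension of collection-based weighting.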

Information Processing and Management | 2000

A probabilistic model of information retrieval: development and comparative experiments

K. Sparck Jones; Steve Walker; Stephen E. Robertson

The paper combines a comprehensive account of the probabilistic model of retrieval with new systematic experiments on TREC Programme material. It presents the model from its foundations through its logical development to cover more aspects of retrieval data and a wider range of system functions. Each step in the argument is matched by comparative retrieval tests, to provide a single coherent account of a major line of research. The experiments demonstrate, for a large test collection, that the probabilistic model is effective and robust, and that it responds appropriately, with major improvements in performance, to key features of retrieval situations. Part 1 covers the foundations and the model development for document collection and relevance data, along with the test apparatus. Part 2 covers the further development and elaboration of the model, with extensive testing, and briefly considers other environment conditions and tasks, model training, concluding with comparisons with other approaches and an overall assessment. Data and results tables for both parts are given in Part 1. Key results are summarised in Part 2. © 2000 Elsevier Science Ltd. All rights reserved.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 1996

Retrieving spoken documents by combining multiple index sources

Gareth J. F. Jones; J.T. Foote; K. Sparck Jones; Steve J. Young

This paper presents domain-independent methods of spoken document retrieval. Both a continuous-speech large vocabulary recognition system, and a phone-lattice word spotter, are used to locate index units within an experimental corpus of voice messages. Possible index terms are nearly unconstrained; terms not in a 20,000 word recognition system vocabulary can be identified by the word spotter at search time. Though either system alone can yield respectable retrieval performance, the two methods are complementary and work best in combination. Different ways of combining them are investigated, and it is shown that the best of these can increase retrieval average precision for a speaker-independent retrieval system to 85% of that achieved for full-text transcriptions of the test documents.

Information Processing and Management | 1979

Experiments in Relevance Weighting of Search Terms.

K. Sparck Jones

Following successful initial tests of theoretically-based schemes for relevance weighting of search terms, further experiments were undertaken to validate these results. The experiments were designed to investigate weighting for a large document set, poor matching conditions, heterogeneous data, and limited relevance information, i.e. the use of weighting in more realistic conditions than the initial ones. The results confirm the earlier ones: very striking improvements in retrieval performance were obtained, especially for the theoretically best-founded weighting formula. The experiments illustrate a much more promising application of statistical methods to indexing and searching than any studied hitherto.

Information Processing and Management | 1996

Experiments in spoken document retrieval

K. Sparck Jones; Gareth J. F. Jones; J.T. Foote; Steve J. Young

This paper describes experiments in the retrieval of spoken documents in multimedia systems. Speech documents pose a particular problem for retrieval since their words as well as contents are unknown. The work reported addresses this problem, for a video mail application, by combining state of the art speech recognition with established document retrieval technologies so as to provide an effective and efficient retrieval tool. Tests with a small spoken message collection show that retrieval precision for the spoken file can reach 90% of that obtained when the same file is used, as a benchmark, in text transcription form.

International Conference on Acoustics, Speech, and Signal Processing | 1997

Acoustic indexing for multimedia retrieval and browsing

Steve J. Young; M. G. Brown; J.T. Foote; Gareth J. F. Jones; K. Sparck Jones

This paper reviews the Video Mail Retrieval (VMR) project at Cambridge University and ORL. The VMR project began in September 1993 with the aim of developing methods for retrieving video documents by scanning the audio soundtrack for keywords. The project has shown, both experimentally and through the construction of a working prototype, that speech recognition can be combined with information retrieval methods to locate multimedia documents by content. The final version of the VMR system uses pre-computed phone lattices to allow extremely rapid word spotting and audio indexing, and statistical information retrieval (IR) methods to mitigate the effects of spotting errors. The net result is a retrieval system that is open-vocabulary and speaker-independent, and which can search audio orders of magnitude faster than real time.

International Conference on Acoustics, Speech, and Signal Processing | 1995

Video mail retrieval: the effect of word spotting accuracy on precision

Gareth J. F. Jones; J.T. Foote; K. Sparck Jones; Steve J. Young

The goal of the video mail retrieval project is to integrate state-of-the-art document retrieval methods with high accuracy word spotting to yield a robust and efficient retrieval system. This paper describes a preliminary study to determine the extent to which retrieval precision is affected by word spotting performance. It includes a description of the database design, the word spotting algorithm, and the information retrieval method used. Results are presented which show audio retrieval performance very close to that of text.

Computer Speech & Language | 1997

Unconstrained keyword spotting using phone lattices with application to spoken document retrieval

J.T. Foote; Steve J. Young; Gareth J. F. Jones; K. Sparck Jones

Traditional hidden Markov model (HMM) word spotting requires both explicit HMM models of each desired keyword and a computationally expensive decoding pass. For certain applications, such as audio indexing or information retrieval, conventional word spotting may be too constrained or impractically slow. This paper presents an alternative technique, where a phone lattice, representing multiple phone hypotheses, is pre-computed prior to need. Given a phone decomposition of any desired keyword, the lattice may be rapidly searched to find putative occurrences of the keyword. Though somewhat less accurate, this can be substantially faster (orders of magnitude) and more flexible (any keyword may be detected) than previous approaches. This paper presents algorithms for lattice generation and scanning, and experimental results, including comparison with conventional keyword-HMM approaches. Finally, word spotting based on phone lattice scanning is demonstrated to be effective for spoken document retrieval.
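The idea of scanning a pre-computed lattice for a keyword's phone sequence can be illustrated with a small Python sketch. This is not the authors' algorithm: the edge representation `(start_node, end_node, phone, log_score)`, the function name `find_keyword`, and the exact-match depth-first search are all assumptions chosen to keep the example short.

```python
from collections import defaultdict


def find_keyword(edges, phones):
    """Return putative keyword hits as (start_node, end_node, total_score).

    edges:  iterable of (start_node, end_node, phone, log_score) tuples
            describing a hypothetical phone lattice
    phones: the keyword's phone decomposition, e.g. ["k", "ae", "t"]
    """
    if not phones:
        return []

    # Index outgoing edges by start node so each lookup is cheap.
    outgoing = defaultdict(list)
    for start, end, phone, score in edges:
        outgoing[start].append((end, phone, score))

    hits = []

    def extend(node, index, origin, score):
        # All phones matched: record the path as a putative occurrence.
        if index == len(phones):
            hits.append((origin, node, score))
            return
        for end, phone, edge_score in outgoing.get(node, []):
            if phone == phones[index]:
                extend(end, index + 1, origin, score + edge_score)

    # A keyword may begin at any node that has outgoing edges.
    for origin in list(outgoing):
        extend(origin, 0, origin, 0.0)

    # Best-scoring (least negative) hypotheses first.
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


if __name__ == "__main__":
    lattice = [
        (0, 1, "k", -1.0), (0, 1, "g", -2.0),
        (1, 2, "ae", -0.5),
        (2, 3, "t", -0.25), (2, 3, "d", -1.5),
    ]
    print(find_keyword(lattice, ["k", "ae", "t"]))
```

Because the lattice is built once and only the search depends on the keyword, any keyword with a known phone decomposition can be looked up without re-running recognition, which is the flexibility the abstract describes.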

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2000

Effects of out of vocabulary words in spoken document retrieval (poster session)

Philip C. Woodland; Sue E. Johnson; P. Jourlin; K. Sparck Jones

The effects of out-of-vocabulary (OOV) items in spoken document retrieval (SDR) are investigated. Several sets of transcriptions were created for the TREC-8 SDR task using a speech recognition system varying the vocabulary sizes and OOV rates, and the relative retrieval performance measured. The effects of OOV terms on a simple baseline IR system and on more sophisticated retrieval systems are described. The use of a parallel corpus for query and document expansion is found to be especially beneficial, and with this data set, good retrieval performance can be achieved even for fairly high OOV rates.

Conference on Applied Natural Language Processing | 1983

How to drive a database front end using general semantic information

Branimir Boguraev; K. Sparck Jones

This paper describes a front end for natural language access to databases making extensive use of general, i.e. domain-independent, semantic information for question interpretation. In the interests of portability, initial syntactic and semantic processing of a question is carried out without any reference to the database domain, and domain-dependent operations are confined to subsequent, comparatively straightforward, processing of the initial interpretation. The different modules of the front end are described, and the system's performance is illustrated by examples.
