J.T. Foote
University of Cambridge
Publications
Featured research published by J.T. Foote.
acm multimedia | 1995
M. G. Brown; J.T. Foote; Gareth J. F. Jones; K. Sparck Jones; Steve J. Young
Recent years have seen a rapid increase in the availability and use of multimedia applications. These systems can generate large amounts of audio and video data which can be expensive to store and unwieldy to access. The Video Mail Retrieval (VMR) project at Cambridge University and Olivetti Research Limited (ORL), Cambridge, UK, is addressing these problems by developing systems to retrieve stored video material using the spoken audio soundtrack [1, 16]. Specifically, the project focuses on the content-based location, retrieval, and playback of potentially relevant data. The primary goal of the VMR project is to develop a video mail retrieval application for the Medusa multimedia environment developed at ORL. Previous work on the VMR project demonstrated practical retrieval of audio messages using speech recognition for content identification [8, 4]. Because of the limited number of available audio messages, a much larger archive of television news broadcasts (along with accompanying subtitle transcriptions) is currently being collected. This will serve as a testbed for new methods of storing and accessing large amounts of audio/video data. The enormous potential size of the news broadcast archive dramatically illustrates the need for ways of automatically finding and retrieving information from the archive. Quantitative experiments demonstrate that Information Retrieval (IR) methods developed for searching text archives can accurately retrieve multimedia data, given suitable subtitle transcriptions. In addition, the same techniques can be used to rapidly locate interesting areas within an individual news broadcast. Although large multimedia archives will be more common in the future, today they require a specialised and high-performance hardware infrastructure. The work presented here relies on the Medusa system developed at ORL, which includes distributed, high-capacity multimedia repositories. This paper begins with an overview of the ORL Medusa technology. Subsequent sections describe the collection and storage of a BBC television broadcast news archive, a retrieval methodology for locating potentially relevant sections in response to users' requests, and a graphical user interface for content-based retrieval and browsing of news broadcasts.
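As a rough illustration of how standard text-IR machinery applies once subtitle transcriptions are available, the following Python sketch ranks a few invented subtitle fragments against a query with a plain tf-idf score. This is not the VMR project's actual weighting scheme; the document identifiers and text are hypothetical.

```python
# Hypothetical sketch: ranking subtitle transcriptions against a text query
# with a simple tf-idf score. Not the VMR project's exact weighting scheme;
# it only illustrates applying text-IR methods to subtitle transcriptions.
import math
from collections import Counter

def tokenize(text):
    return [w for w in text.lower().split() if w.isalpha()]

def rank(query, transcripts):
    """Return (score, doc_id) pairs, best first."""
    docs = {doc_id: Counter(tokenize(text)) for doc_id, text in transcripts.items()}
    n_docs = len(docs)
    df = Counter()                      # document frequency for idf
    for counts in docs.values():
        df.update(counts.keys())
    scores = []
    for doc_id, counts in docs.items():
        score = 0.0
        for term in tokenize(query):
            tf = counts.get(term, 0)
            if tf:
                score += (1 + math.log(tf)) * math.log(n_docs / df[term])
        scores.append((score, doc_id))
    return sorted(scores, reverse=True)

# Toy usage with invented subtitle fragments
transcripts = {
    "news_0101": "the chancellor announced new spending on the health service",
    "news_0102": "flooding disrupted rail services across the north of england",
    "news_0103": "the health service faces a winter funding shortfall",
}
print(rank("health service funding", transcripts))
```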
international acm sigir conference on research and development in information retrieval | 1996
Gareth J. F. Jones; J.T. Foote; K. Sparck Jones; Steve J. Young
This paper presents domain-independent methods of spoken document retrieval. Both a continuous-speech large-vocabulary recognition system and a phone-lattice word spotter are used to locate index units within an experimental corpus of voice messages. Possible index terms are nearly unconstrained; terms not in a 20,000-word recognition system vocabulary can be identified by the word spotter at search time. Though either system alone can yield respectable retrieval performance, the two methods are complementary and work best in combination. Different ways of combining them are investigated, and it is shown that the best of these can increase retrieval average precision for a speaker-independent retrieval system to 85% of that achieved for full-text transcriptions of the test documents.
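One simple way to realise the combination described above is to fuse per-document scores from the two indexing sources. The weighted sum below is a hypothetical sketch (the scores, message identifiers, and the value of alpha are invented), not necessarily the combination rule the paper found best.

```python
# Hypothetical sketch of score-level combination of evidence from a
# large-vocabulary recogniser and a phone-lattice word spotter. A weighted
# sum is only one of several possible combination rules.
def combine_scores(lv_scores, ws_scores, alpha=0.5):
    """Weighted sum of per-document retrieval scores from two sources."""
    docs = set(lv_scores) | set(ws_scores)
    return {
        d: alpha * lv_scores.get(d, 0.0) + (1 - alpha) * ws_scores.get(d, 0.0)
        for d in docs
    }

lv = {"msg_17": 2.1, "msg_42": 0.4}   # scores from the LV transcription index
ws = {"msg_17": 1.3, "msg_99": 1.8}   # scores from word-spotting hits
ranked = sorted(combine_scores(lv, ws).items(), key=lambda kv: kv[1], reverse=True)
print(ranked)
```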
Information Processing and Management | 1996
K. Sparck Jones; Gareth J. F. Jones; J.T. Foote; Steve J. Young
This paper describes experiments in the retrieval of spoken documents in multimedia systems. Speech documents pose a particular problem for retrieval since their words, as well as their contents, are unknown. The work reported addresses this problem, for a video mail application, by combining state-of-the-art speech recognition with established document retrieval technologies so as to provide an effective and efficient retrieval tool. Tests with a small spoken message collection show that retrieval precision for the spoken file can reach 90% of that obtained when the same file is used, as a benchmark, in text transcription form.
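The 90% figure is a ratio of retrieval effectiveness on the spoken file to that on its text transcription benchmark. The sketch below illustrates that style of evaluation with average precision on two invented rankings; it is not the authors' actual data or metric details.

```python
# Hypothetical illustration of reporting spoken-document retrieval performance
# relative to a text-transcription benchmark. Rankings and relevance judgements
# below are invented.
def average_precision(ranking, relevant):
    hits, total = 0, 0.0
    for i, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0

relevant = {"m3", "m7"}
speech_ranking = ["m3", "m1", "m7", "m5"]   # ranking from spoken-document retrieval
text_ranking   = ["m3", "m7", "m1", "m5"]   # ranking from the text benchmark

ap_speech = average_precision(speech_ranking, relevant)
ap_text = average_precision(text_ranking, relevant)
print(f"relative precision: {ap_speech / ap_text:.0%}")
```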
international conference on acoustics, speech, and signal processing | 1997
Steve J. Young; M. G. Brown; J.T. Foote; Gareth J. F. Jones; K. Sparck Jones
This paper reviews the Video Mail Retrieval (VMR) project at Cambridge University and ORL. The VMR project began in September 1993 with the aim of developing methods for retrieving video documents by scanning the audio soundtrack for keywords. The project has shown, both experimentally and through the construction of a working prototype, that speech recognition can be combined with information retrieval methods to locate multimedia documents by content. The final version of the VMR system uses pre-computed phone lattices to allow extremely rapid word spotting and audio indexing, and statistical information retrieval (IR) methods to mitigate the effects of spotting errors. The net result is a retrieval system that is open-vocabulary and speaker-independent, and which can search audio orders of magnitude faster than real time.
international conference on acoustics, speech, and signal processing | 1995
Gareth J. F. Jones; J.T. Foote; K. Sparck Jones; Steve J. Young
The goal of the video mail retrieval project is to integrate state-of-the-art document retrieval methods with high accuracy word spotting to yield a robust and efficient retrieval system. This paper describes a preliminary study to determine the extent to which retrieval precision is affected by word spotting performance. It includes a description of the database design, the word spotting algorithm, and the information retrieval method used. Results are presented which show audio retrieval performance very close to that of text.
Computer Speech & Language | 1997
J.T. Foote; Steve J. Young; Gareth J. F. Jones; K. Sparck Jones
Traditional hidden Markov model (HMM) word spotting requires both explicit HMM models of each desired keyword and a computationally expensive decoding pass. For certain applications, such as audio indexing or information retrieval, conventional word spotting may be too constrained or impractically slow. This paper presents an alternative technique, where a phone lattice—representing multiple phone hypotheses—is pre-computed prior to need. Given a phone decomposition of any desired keyword, the lattice may be rapidly searched to find putative occurrences of the keyword. Though somewhat less accurate, this can be substantially faster (orders of magnitude) and more flexible (any keyword may be detected) than previous approaches. This paper presents algorithms for lattice generation and scanning, and experimental results, including comparison with conventional keyword-HMM approaches. Finally, word spotting based on phone lattice scanning is demonstrated to be effective for spoken document retrieval.
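A minimal sketch of the lattice-scanning idea, assuming a toy lattice of (start node, end node, phone, log score) edges: given the phone decomposition of a keyword, a depth-first search looks for a matching path and returns its best score. The lattice contents, phone symbols, and scoring are invented for illustration; the paper's lattices carry acoustic likelihoods and richer timing detail.

```python
# Hypothetical sketch of scanning a pre-computed phone lattice for a keyword's
# phone sequence. Lattice format and scores are invented for illustration.
from collections import defaultdict

# Each edge: (start_node, end_node, phone, log_score)
lattice = [
    (0, 1, "k", -1.0), (0, 1, "g", -2.5),
    (1, 2, "ae", -0.8), (1, 2, "eh", -1.9),
    (2, 3, "m", -0.7), (2, 3, "n", -2.2),
]

def find_keyword(lattice, phones):
    """Return the best-scoring path through the lattice matching `phones`."""
    by_start = defaultdict(list)
    for start, end, phone, score in lattice:
        by_start[start].append((end, phone, score))

    best = None
    def dfs(node, idx, total):
        nonlocal best
        if idx == len(phones):
            if best is None or total > best:
                best = total
            return
        for end, phone, score in by_start[node]:
            if phone == phones[idx]:
                dfs(end, idx + 1, total + score)

    for start_node in {e[0] for e in lattice}:
        dfs(start_node, 0, 0.0)
    return best

# Putative occurrence of the (invented) keyword /k ae m/
print(find_keyword(lattice, ["k", "ae", "m"]))
```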
international conference on acoustics speech and signal processing | 1996
Gareth J. F. Jones; J.T. Foote; K. Sparck Jones; Steve J. Young
The goal of the video mail retrieval (VMR) project is to integrate state-of-the-art document retrieval methods with speech recognition to yield a robust and efficient retrieval system. The work presented extends VMR towards an open-vocabulary, talker-independent system for retrieving spontaneously spoken audio and video messages. We present results showing successful retrieval using a standard large-vocabulary (LV) recogniser, despite the lack of a matched language model and vocabulary. We further show that integrating an LV recogniser with conventional word spotting (WS) gives more robust retrieval performance than either method alone. This paper gives details of the message archive used, the speech recognition methodologies, the information retrieval methods, and experimental results.
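A hypothetical sketch of one possible term-level integration of the two sources: query terms inside the recogniser vocabulary are taken from its transcription, while out-of-vocabulary terms are served by word-spotting hits. The vocabulary, term lists, and counts are invented, and the paper evaluates several integration strategies rather than this exact rule.

```python
# Hypothetical sketch: choose the evidence source per query term depending on
# whether the term is in the LV recogniser vocabulary. Vocabulary and data
# below are invented.
LV_VOCAB = {"retrieval", "project", "budget"}

def index_terms(query_terms, lv_transcript_terms, word_spotter_hits):
    """Collect per-document evidence for each query term from the right source."""
    evidence = {}
    for term in query_terms:
        if term in LV_VOCAB:
            evidence[term] = lv_transcript_terms.count(term)
        else:
            evidence[term] = word_spotter_hits.get(term, 0)
    return evidence

lv_terms = ["project", "budget", "budget", "meeting"]   # from the LV transcription
ws_hits = {"medusa": 2}                                 # OOV term spotted at search time
print(index_terms(["budget", "medusa"], lv_terms, ws_hits))
```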
international conference on acoustics, speech, and signal processing | 1995
J.T. Foote
This paper presents a method of non-parametrically modeling HMM output probabilities. Discrete output probabilities are estimated from a tree-based maximum mutual information (MMI) partition of the feature space, rather than the usual vector quantization. One advantage of a decision-tree method is that very high-dimensional spaces can be partitioned. Time variation can then be explicitly modeled by concatenating time-adjacent vectors, which is shown to improve recognition performance. Though the model is discrete, it provides recognition performance better than i-component Gaussian mixture HMMs on the ARPA Resource Management (RM) task. This method is not without drawbacks: because of its non-parametric nature, a large number of parameters are needed for a good model and the available RM training data is probably not sufficient. Besides the computational advantages of a discrete model, this method has promising applications in talker identification, adaptation, and clustering.
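The time-concatenation idea is easy to illustrate: each frame's feature vector is stacked with its neighbours before the tree-based quantiser partitions the (now higher-dimensional) space. The window size and toy two-dimensional features below are invented, and the MMI tree-growing step itself is not shown.

```python
# Hypothetical sketch of concatenating time-adjacent feature vectors so that
# local time variation is visible to a high-dimensional partitioner. The
# window size and toy features are invented; tree growing is not shown.
def stack_frames(frames, context=1):
    """Concatenate each frame with `context` frames on either side."""
    stacked = []
    for t in range(len(frames)):
        window = []
        for dt in range(-context, context + 1):
            idx = min(max(t + dt, 0), len(frames) - 1)   # clamp at utterance edges
            window.extend(frames[idx])
        stacked.append(window)
    return stacked

frames = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]   # toy 2-D acoustic features
print(stack_frames(frames, context=1))          # each output vector is 6-D
```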
conference of the international speech communication association | 1995
J.T. Foote; Gareth J. F. Jones; Karen Sparck Jones; Steve J. Young
MM | 1994
Martin Brown; J.T. Foote; Gareth J. F. Jones; Karen Sparck Jones; Steve J. Young