Tony Robinson
University of Cambridge
Publication
Featured research published by Tony Robinson.
international conference on acoustics, speech, and signal processing | 1995
Tony Robinson; Jeroen Fransen; David Pye; Jonathan T. Foote; Steve Renals
A significant new speech corpus of British English has been recorded at Cambridge University. Derived from the Wall Street Journal text corpus, WSJCAM0 constitutes one of the largest corpora of spoken British English currently in existence. It has been specifically designed for the construction and evaluation of speaker-independent speech recognition systems. The database consists of 140 speakers, each speaking about 110 utterances. This paper describes the motivation for the corpus, the processes undertaken in its construction and the utilities needed as support tools. All utterance transcriptions have been verified and a phonetic dictionary has been developed to cover the training data and evaluation tasks. Two evaluation tasks have been defined using standard 5000-word bigram and 20000-word trigram language models. The paper concludes with comparative results on these tasks for British and American English.
Computer Speech & Language | 1991
Tony Robinson; Frank Fallside
This paper describes a speaker-independent phoneme and word recognition system based on a recurrent error propagation network (REPN) trained on the TIMIT database. The REPN is a fully recurrent error propagation network trained by the propagation of the gradient signal backwards in time. A variation of the stochastic gradient descent procedure is used which updates the weights by an adaptive step size in the direction given by the sign of the gradient. Phonetic context is stored internal to the network and the outputs are estimates of the probability that a given frame is part of a segment labelled with a context-independent phonetic symbol. During recognition, a dynamic programming match is made to find the most probable string of symbols. The one-pass algorithm is used for phoneme and word recognition. The phoneme recognition rate for all 61 TIMIT symbols is 70·0% correct (63·5% accuracy including insertion errors) and on a reduced 39-symbol set the recognition rate is 76·5% correct (69·8%). This compares favourably with the results of other methods, such as HMMs, on the same database [K. F. Lee & H. W. Hon 1989. IEEE Transactions on Acoustics, Speech and Signal Processing, 37, 1641–1648; S. E. Levinson, M. Y. Liberman, A. Ljolje & L. G. Miller 1989. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. Glasgow, pp. 441–444]. Analysis of the phoneme recognition results shows that information available from bigram and durational constraints is adequately handled within the network, allowing for efficient parsing of the network output. For comparison, there is less computation involved in the resulting scheme than in a one-state-per-phoneme HMM system. This is demonstrated by applying the recognizer to the DARPA 1000-word resource management task. Parsing the network output to the word level with no grammar and no pruning can be carried out faster than real time on a SUN 4 330 workstation.
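The weight update mentioned in this abstract (an adaptive step size applied in the direction of the gradient sign) can be illustrated with a minimal sketch. The abstract does not say how the step size is adapted, so the rule below is an assumption: each weight's step grows while its gradient keeps the same sign and shrinks when the sign flips. The function name `sign_step_update` and the constants `grow`, `shrink`, `min_step` and `max_step` are hypothetical and are not taken from the paper.

```python
def sign_step_update(weights, grads, steps, prev_grads,
                     grow=1.2, shrink=0.5, min_step=1e-6, max_step=1.0):
    """Move each weight by its own step size against the sign of its gradient.

    Assumed adaptation rule (not from the paper): grow the step when the
    gradient sign repeats, shrink it when the sign flips.
    """
    new_weights, new_steps = [], []
    for w, g, s, pg in zip(weights, grads, steps, prev_grads):
        if g * pg > 0:        # same sign as last time: take bolder steps
            s = min(s * grow, max_step)
        elif g * pg < 0:      # sign flipped: we overshot, so be more careful
            s = max(s * shrink, min_step)
        sign = (g > 0) - (g < 0)   # -1, 0 or +1; gradient magnitude is ignored
        new_weights.append(w - s * sign)
        new_steps.append(s)
    return new_weights, new_steps


# Toy usage: minimise f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, step, prev_g = [0.0], [0.1], [0.0]
for _ in range(200):
    g = [2.0 * (w[0] - 3.0)]
    w, step = sign_step_update(w, g, step, prev_g)
    prev_g = g
```

Because only the sign of the gradient is used, the update is insensitive to the gradient's scale, which is one plausible motivation for such a scheme when gradients are propagated back through many time steps.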
Archive | 1996
Tony Robinson; Mike Hochberg; Steve Renals
This chapter describes a use of recurrent neural networks (i.e., feedback is incorporated in the computation) as an acoustic model for continuous speech recognition. The form of the recurrent neural network is described along with an appropriate parameter estimation procedure. For each frame of acoustic data, the recurrent network generates an estimate of the posterior probability of each of the possible phones given the observed acoustic signal. The posteriors are then converted into scaled likelihoods and used as the observation probabilities within a conventional decoding paradigm (e.g., Viterbi decoding). The advantages of using recurrent networks are that they require a small number of parameters and provide a fast decoding capability (relative to conventional, large-vocabulary, HMM systems).
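The conversion from posteriors to scaled likelihoods can be sketched as follows. By Bayes' rule, dividing the posterior P(q|x) by the phone prior P(q) gives p(x|q)/p(x), which is the likelihood up to a factor that is the same for every phone in a frame and so does not change the best path. The toy decoder below is a plain Viterbi search over one state per phone; the function names, the two-phone example and all probabilities are illustrative assumptions, not values from the chapter.

```python
import math


def scaled_log_likelihoods(posteriors, priors, floor=1e-10):
    """Return log(P(q|x) / P(q)) for every frame and phone q."""
    return [[math.log(max(p, floor)) - math.log(max(pr, floor))
             for p, pr in zip(frame, priors)]
            for frame in posteriors]


def viterbi(log_obs, log_trans, log_init):
    """Most probable state sequence given log observation scores."""
    n_states = len(log_init)
    score = [log_init[s] + log_obs[0][s] for s in range(n_states)]
    back = []  # back[t][s] = best predecessor of state s at frame t + 1
    for frame in log_obs[1:]:
        prev = score
        pointers, score = [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: prev[r] + log_trans[r][s])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][s] + frame[s])
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical network outputs for four frames and two phones.
posteriors = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]
priors = [0.5, 0.5]                      # assumed phone priors
log_trans = [[math.log(0.7), math.log(0.3)],
             [math.log(0.3), math.log(0.7)]]
log_init = [math.log(0.5), math.log(0.5)]

path = viterbi(scaled_log_likelihoods(posteriors, priors), log_trans, log_init)
print(path)  # [0, 0, 1, 1]
```

In practice the priors would typically be estimated from the relative frequencies of the phone labels in the training data; here they are simply set by hand.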
Speech Communication | 2000
Steve Renals; Dave Abberley; David Kirby; Tony Robinson
This paper describes a spoken document retrieval (SDR) system for British and North American Broadcast News. The system is based on a connectionist large vocabulary speech recognizer and a probabilistic information retrieval (IR) system. We discuss the development of a real-time Broadcast News speech recognizer, and its integration into an SDR system. Two advances were made for this task: automatic segmentation and statistical query expansion using a secondary corpus. Precision and recall results using the Text Retrieval Conference (TREC) SDR evaluation infrastructure are reported throughout the paper, and we discuss the application of these developments to a large scale SDR task based on an archive of British English broadcast news.
international conference on spoken language processing | 1996
Parham Zolfaghari; Tony Robinson
The paper describes a new formant analysis technique whereby the formant parameters are represented in the form of Gaussian mixture distributions. These are estimated from the discrete Fourier transform (DFT) magnitude spectrum of the speech signal. The parameters obtained are the means, variances and the masses of the density functions, which are used to calculate centre frequencies, bandwidths and amplitudes of formants within the spectrum. In order to better fit the mixture distributions various modifications to the DFT magnitude spectrum, based on simple models of perception, were investigated. These include reduction of dynamic range, cepstral smoothing, use of the Mel scale and pre-emphasis of speech. Results are presented for these as well as formant tracks from analysing speech using the final formant analysis system.
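One way to picture the idea of fitting a Gaussian mixture to a magnitude spectrum is to treat the spectrum as a histogram over frequency and run weighted expectation-maximisation (EM) on it. The sketch below does this on a synthetic two-peak spectrum. It is an illustration under assumptions, not the authors' estimation procedure: the function name `fit_spectrum_gmm`, the variance floor, the initial values and the synthetic spectrum are all hypothetical, and none of the perceptual modifications mentioned in the abstract are applied.

```python
import math


def fit_spectrum_gmm(freqs, mags, means, variances, masses,
                     n_iter=50, var_floor=1.0):
    """Weighted EM: each frequency bin counts in proportion to its magnitude."""
    means, variances, masses = list(means), list(variances), list(masses)
    k = len(means)
    total = sum(mags)
    for _ in range(n_iter):
        resp_sum = [0.0] * k   # magnitude-weighted responsibility per component
        f_acc = [0.0] * k      # accumulates r * f
        f2_acc = [0.0] * k     # accumulates r * f^2
        for f, m in zip(freqs, mags):
            dens = [masses[j]
                    * math.exp(-0.5 * (f - means[j]) ** 2 / variances[j])
                    / math.sqrt(2.0 * math.pi * variances[j])
                    for j in range(k)]
            norm = sum(dens)
            if norm <= 0.0:
                continue       # bin too far from every component to matter
            for j in range(k):
                r = m * dens[j] / norm
                resp_sum[j] += r
                f_acc[j] += r * f
                f2_acc[j] += r * f * f
        for j in range(k):
            if resp_sum[j] > 0.0:
                means[j] = f_acc[j] / resp_sum[j]
                variances[j] = max(f2_acc[j] / resp_sum[j] - means[j] ** 2,
                                   var_floor)
                masses[j] = resp_sum[j] / total
    return means, variances, masses


# Synthetic "spectrum": two bumps at 500 Hz and 1500 Hz, the second half as tall.
freqs = [10.0 * i for i in range(401)]            # 0 to 4000 Hz
mags = [math.exp(-0.5 * ((f - 500.0) / 100.0) ** 2)
        + 0.5 * math.exp(-0.5 * ((f - 1500.0) / 100.0) ** 2)
        for f in freqs]

means, variances, masses = fit_spectrum_gmm(
    freqs, mags,
    means=[300.0, 2000.0], variances=[200.0 ** 2, 200.0 ** 2], masses=[0.5, 0.5])
```

In this sketch the fitted means play the role of formant centre frequencies, the standard deviations serve as a proxy for bandwidth, and the masses reflect the relative energy under each peak. How the paper maps variances and masses to bandwidths and amplitudes is not stated in the abstract.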
international conference on acoustics, speech, and signal processing | 1994
Tony Robinson; Mike Hochberg; Steve Renals
This paper describes phone modelling improvements to the hybrid connectionist-hidden Markov model speech recognition system developed at Cambridge University. These improvements are applied to phone recognition from the TIMIT task and word recognition from the Wall Street Journal (WSJ) task. A recurrent net is used to map acoustic vectors to posterior probabilities of phone classes. The maximum likelihood phone or word string is then extracted using Markov models. The paper describes three improvements: connectionist model merging; explicit presentation of acoustic context; and improved duration modelling. The first is shown to provide a significant improvement in the TIMIT phone recognition rate and all three provide an improvement in the WSJ word recognition rate.
international conference on acoustics, speech, and signal processing | 1998
Tony Robinson; James Christie
This paper describes a new search technique for large vocabulary speech recognition based on a stack decoder. Considerable memory savings are achieved with the combination of a tree-based lexicon and a new search technique. The search proceeds time-first; that is, partial path hypotheses are extended into the future in the inner loop and a tree walk over the lexicon is performed as an outer loop. Partial word hypotheses are grouped based on language model state. The stack maintains information about groups of hypotheses and whole groups are extended by one word to form new stack entries. An implementation is described of a one-pass decoder employing a 65000-word lexicon and a disk-based trigram language model. Real-time operation is achieved with a small search error, a search space of about 5 Mbyte and a total memory usage of about 35 Mbyte.
Proceedings of the DARPA Broadcast News Workshop, February 28-March 3, 1999, Hilton at Washington Dulles Airport, Herndon, Virginia | 1999
Gary Cook; James Christie; Daniel P. W. Ellis; Eric Fosler-Lussier; Yoshi Gotoh; Brian Kingsbury; Nelson Morgan; Steve Renals; Tony Robinson; Gethin Williams
This paper describes the SPRACH system developed for the 1998 Hub-4E broadcast news evaluation. The system is based on the connectionist-HMM framework and uses both recurrent neural network and multi-layer perceptron acoustic models. We describe both a system designed for the primary transcription hub, and a system for the less-than 10 times real-time spoke. We then describe recent developments to CHRONOS, a time-first stack decoder. We show how these developments have simplified the evaluation system, and led to significant reductions in the error rate of the 10x real-time system. We also present a system designed to operate in real-time with negligible search error.
text retrieval conference | 2000
Dave Abberley; Steve Renals; Daniel P. W. Ellis; Tony Robinson
This paper describes our participation in the TREC-9 Spoken Document Retrieval (SDR) track. The THISL SDR system consists of a realtime version of a hybrid connectionist/HMM large vocabulary speech recognition system and a probabilistic text retrieval system. This paper describes the configuration of the speech recognition and text retrieval systems, including segmentation and query expansion. We report our results for development tests using the TREC-8 queries, and for the TREC-9 evaluation.
international conference on acoustics, speech, and signal processing | 1992
Tony Robinson
A hybrid system using a connectionist model and a Markov model for the DARPA Resource Management task of large-vocabulary multiple-speaker continuous speech recognition is presented. The connectionist model uses internal feedback for context modeling and provides phone state occupancy probabilities for a simple context-independent Markov model. The system has been implemented in real time on a workstation supported by a DSP board. The use of context-independent phone models leads to the possibility of time-domain pruning and computationally efficient durational modeling, both of which are reported.