Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Julian J. Odell is active.

Publication


Featured research published by Julian J. Odell.


Human Language Technology | 1994

Tree-based state tying for high accuracy acoustic modelling

Steve J. Young; Julian J. Odell; Philip C. Woodland

The key problem to be faced when building an HMM-based continuous speech recogniser is maintaining the balance between model complexity and available training data. For large vocabulary systems requiring cross-word context dependent modelling, this is particularly acute since many such contexts will never occur in the training data. This paper describes a method of creating a tied-state continuous speech recognition system using a phonetic decision tree. This tree-based clustering is shown to lead to similar recognition performance to that obtained using an earlier data-driven approach but to have the additional advantage of providing a mapping for unseen triphones. State-tying is also compared with traditional model-based tying and shown to be clearly superior. Experimental results are presented for both the Resource Management and Wall Street Journal tasks.
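The decision-tree clustering described in the abstract can be sketched as a greedy, likelihood-based split search. The sketch below is a minimal illustration with hypothetical names, assuming a single diagonal-covariance Gaussian per pooled state; it is not HTK's actual implementation.

```python
import math

def node_log_likelihood(states):
    """Approximate log-likelihood of pooling a set of triphone states
    into one tied state, under a single diagonal-covariance Gaussian.
    Each state carries an occupation count and per-dimension sums."""
    total = sum(s["count"] for s in states)
    if total == 0:
        return 0.0
    dim = len(states[0]["sum_x"])
    ll = 0.0
    for d in range(dim):
        sx = sum(s["sum_x"][d] for s in states)
        sxx = sum(s["sum_xx"][d] for s in states)
        var = max(sxx / total - (sx / total) ** 2, 1e-6)
        # Per dimension: L = -0.5 * N * (log(2*pi*var) + 1)
        ll += -0.5 * total * (math.log(2 * math.pi * var) + 1.0)
    return ll

def best_split(states, questions):
    """Pick the phonetic question whose yes/no partition of the states
    gives the largest log-likelihood gain over the unsplit node."""
    base = node_log_likelihood(states)
    best = None
    for name, pred in questions:
        yes = [s for s in states if pred(s["context"])]
        no = [s for s in states if not pred(s["context"])]
        if not yes or not no:
            continue  # question does not partition this node
        gain = node_log_likelihood(yes) + node_log_likelihood(no) - base
        if best is None or gain > best[1]:
            best = (name, gain, yes, no)
    return best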


Speech Communication | 1997

MMIE training of large vocabulary recognition systems

Valtcho Valtchev; Julian J. Odell; Philip C. Woodland; Steve J. Young

This paper describes a framework for optimising the structure and parameters of a continuous density HMM-based large vocabulary recognition system using the Maximum Mutual Information Estimation (MMIE) criterion. To reduce the computational complexity of the MMIE training algorithm, confusable segments of speech are identified and stored as word lattices of alternative utterance hypotheses. An iterative mixture splitting procedure is also employed to adjust the number of mixture components in each state during training such that the optimal balance between the number of parameters and the available training data is achieved. Experiments are presented on various test sets from the Wall Street Journal database using up to 66 hours of acoustic training data. These demonstrate that the use of lattices makes MMIE training practicable for very complex recognition systems and large training sets. Furthermore, the experimental results show that MMIE optimisation of system structure and parameters can yield useful increases in recognition accuracy.
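For reference, the MMIE criterion being maximised contrasts the likelihood of the correct transcription against all competing hypotheses. This is the standard formulation with generic symbols, not notation taken from the paper:

```latex
% MMIE objective over R training utterances O_r with transcriptions w_r.
% Numerator: likelihood of the correct word sequence; denominator: total
% likelihood over all competing word sequences \hat{w}, approximated in
% practice by the word lattice.
\mathcal{F}_{\mathrm{MMIE}}(\lambda) =
  \sum_{r=1}^{R} \log
  \frac{p_{\lambda}(O_r \mid M_{w_r})\, P(w_r)}
       {\sum_{\hat{w}} p_{\lambda}(O_r \mid M_{\hat{w}})\, P(\hat{w})}
```

Restricting the denominator sum to lattice paths is what makes the computation tractable for large training sets, as the abstract notes.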


Human Language Technology | 1994

A one pass decoder design for large vocabulary recognition

Julian J. Odell; Valentin Valtchev; Philip C. Woodland; Steve J. Young

To achieve reasonable accuracy in large vocabulary speech recognition systems, it is important to use detailed acoustic models together with good long span language models. For example, in the Wall Street Journal (WSJ) task both cross-word triphones and a trigram language model are necessary to achieve state-of-the-art performance. However, when using these models, the size of a pre-compiled recognition network can make a standard Viterbi search infeasible and hence, either multiple-pass or asynchronous stack decoding schemes are typically used. In this paper, we show that time-synchronous one-pass decoding using cross-word triphones and a trigram language model can be implemented using a dynamically built tree-structured network. This approach avoids the compromises inherent in using fast-matches or preliminary passes and is relatively efficient in implementation. It was included in the HTK large vocabulary speech recognition system used for the 1993 ARPA WSJ evaluation and experimental results are presented for that task.
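The tree-structured network mentioned above exploits the fact that words sharing a phone prefix can share network nodes, so acoustic likelihoods for common prefixes are evaluated once. A minimal sketch of such a lexicon tree follows; the class and function names are hypothetical, not HTK's API.

```python
class TreeNode:
    """One node of a pronunciation prefix tree (lexicon tree)."""
    def __init__(self, phone=None):
        self.phone = phone
        self.children = {}   # phone -> TreeNode
        self.word = None     # set at a word-end node

def build_lexicon_tree(lexicon):
    """Build a prefix tree from a dict mapping word -> list of phones.
    Words with a common phone prefix share the corresponding nodes."""
    root = TreeNode()
    for word, phones in lexicon.items():
        node = root
        for p in phones:
            node = node.children.setdefault(p, TreeNode(p))
        node.word = word  # mark where the word ends
    return root

def count_nodes(node):
    """Total node count, including the root."""
    return 1 + sum(count_nodes(c) for c in node.children.values())
```

For example, "speech" (s p iy ch) and "speed" (s p iy d) share the nodes for "s p iy", so the tree holds 6 nodes where a flat network would need 9. In the one-pass decoder this sharing, combined with building the network dynamically, is what keeps a cross-word triphone plus trigram search feasible.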


International Conference on Acoustics, Speech, and Signal Processing | 1995

The 1994 HTK large vocabulary speech recognition system

Philip C. Woodland; Chris Leggetter; Julian J. Odell; Valtcho Valtchev; Steve J. Young

This paper describes recent work on the HTK large vocabulary speech recognition system. The system uses tied-state cross-word context-dependent mixture Gaussian HMMs and a dynamic network decoder that can operate in a single pass. In the last year the decoder has been extended to produce word lattices to allow flexible and efficient system development, as well as multi-pass operation for use with computationally expensive acoustic and/or language models. The system vocabulary can now be up to 65k words, the final acoustic models have been extended to be sensitive to more acoustic context (quinphones), a 4-gram language model has been used and unsupervised incremental speaker adaptation incorporated. The resulting system gave the lowest error rates on both the H1-P0 and H1-C1 hub tasks in the November 1994 ARPA CSR evaluation.


International Conference on Acoustics, Speech, and Signal Processing | 1996

Lattice-based discriminative training for large vocabulary speech recognition

Valtcho Valtchev; Julian J. Odell; Philip C. Woodland; Steve J. Young

This paper describes a framework for optimising the parameters of a continuous density HMM-based large vocabulary recognition system using a maximum mutual information estimation (MMIE) criterion. To limit the computational complexity arising from the need to find confusable speech segments in the large search space of alternative utterance hypotheses, word lattices generated from the training data are used. Experiments are presented on the Wall Street Journal database using up to 66 hours of training data. These show that lattices combined with an improved estimation algorithm make MMIE training practicable even for very complex recognition systems and large training sets. Furthermore, experimental results show that MMIE training can yield useful increases in recognition accuracy.


International Conference on Speech, Image Processing and Neural Networks | 1994

Tree-based state clustering for large vocabulary speech recognition

Julian J. Odell; Philip C. Woodland; Steve J. Young

The key problem to be faced when building an HMM-based continuous speech recogniser is maintaining the balance between model complexity and available training data. For large vocabulary systems requiring cross-word context dependent modelling, this is particularly acute since many such contexts will never occur in the training data. This paper describes a method of creating a tied-state continuous speech recognition system using a phonetic decision tree. Results are presented for the Resource Management and Wall Street Journal tasks where very good performance is achieved. The method is compared to a traditional model-based procedure and shown to be clearly superior.


Archive | 1995

The HTK book

Steve J. Young; Jack Jansen; Julian J. Odell; David G. Ollason; Philip C. Woodland


Archive | 2006

The HTK book version 3.4

Steve J. Young; Gunnar Evermann; Mark J. F. Gales; Danny Kershaw; Gareth L. Moore; Julian J. Odell; David G. Ollason; Daniel Povey; Valtcho Valtchev; Philip C. Woodland


Archive | 1995

The Use of Context in Large Vocabulary Speech Recognition

Julian J. Odell


Archive | 1993

HTK: Hidden Markov Model Toolkit V1

Steve J. Young; Jack Jansen; Julian J. Odell; David G. Ollason; Philip C. Woodland

Collaboration


Dive into Julian J. Odell's collaborations.

Top Co-Authors

Thomas Hain

University of Sheffield
