Steve J. Young | Researchain

Publication

Featured research published by Steve J. Young.

Human Language Technology | 1994

Tree-based state tying for high accuracy acoustic modelling

Steve J. Young; Julian J. Odell; Philip C. Woodland

The key problem to be faced when building an HMM-based continuous speech recogniser is maintaining the balance between model complexity and available training data. For large vocabulary systems requiring cross-word context dependent modelling, this is particularly acute since many such contexts will never occur in the training data. This paper describes a method of creating a tied-state continuous speech recognition system using a phonetic decision tree. This tree-based clustering is shown to give recognition performance similar to that of an earlier data-driven approach, with the additional advantage of providing a mapping for unseen triphones. State-tying is also compared with traditional model-based tying and shown to be clearly superior. Experimental results are presented for both the Resource Management and Wall Street Journal tasks.
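The abstract does not spell out how the tree chooses its questions. A minimal sketch, assuming the commonly described likelihood-gain criterion: each cluster of tied states is modelled by a single Gaussian pooled from the member states' occupancy counts, means and variances, and the phonetic question giving the largest increase in log-likelihood is chosen. Features are 1-D here for brevity; the `StateStats` fields and the question names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class StateStats:
    """Sufficient statistics of one triphone HMM state (1-D feature, for illustration)."""
    name: str    # hypothetical triphone-state label, e.g. "m-a+t"
    left: str    # left context phone
    right: str   # right context phone
    occ: float   # state occupancy count (sum of frame posteriors)
    mean: float
    var: float


def cluster_loglik(states):
    """Approximate log-likelihood of the cluster's frames under one shared Gaussian."""
    total = sum(s.occ for s in states)
    mean = sum(s.occ * s.mean for s in states) / total
    second_moment = sum(s.occ * (s.var + s.mean ** 2) for s in states) / total
    var = second_moment - mean ** 2
    return -0.5 * (math.log(2 * math.pi * var) + 1.0) * total


def best_question(states, questions):
    """Return (question_name, gain) for the split with the largest log-likelihood gain."""
    base = cluster_loglik(states)
    best = (None, 0.0)
    for qname, predicate in questions.items():
        yes = [s for s in states if predicate(s)]
        no = [s for s in states if not predicate(s)]
        if not yes or not no:
            continue  # this question does not split the cluster
        gain = cluster_loglik(yes) + cluster_loglik(no) - base
        if gain > best[1]:
            best = (qname, gain)
    return best
```

Because every triphone, seen or unseen, can answer the same context questions, an unseen triphone can be assigned a tied state by walking it down the finished tree, which is the advantage the abstract highlights.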

Computer Speech & Language | 1991

Applications of stochastic context-free grammars using the Inside-Outside algorithm

K. Lari; Steve J. Young

This paper describes two applications in speech recognition of stochastic context-free grammars (SCFGs) trained automatically via the Inside-Outside algorithm. First, SCFGs are used to model VQ encoded speech for isolated word recognition and are compared directly to HMMs used for the same task. It is shown that SCFGs can model this low-level VQ data accurately and that a regular grammar based pre-training algorithm is effective both for reducing training time and obtaining robust solutions. Second, an SCFG is inferred from a transcription of the speech used to train a phoneme-based recognizer in an attempt to model phonotactic constraints. When used as a language model, this SCFG gives improved performance over a comparable regular grammar or bigram.
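The Inside-Outside algorithm re-estimates rule probabilities from inside and outside probabilities computed over every span of a training string. The sketch below shows only the inside pass, for a grammar assumed to be in Chomsky normal form; the dictionary encoding of the rules is an illustrative choice, not the paper's.

```python
from collections import defaultdict


def inside_probability(words, binary_rules, lexical_rules, start="S"):
    """Inside pass for an SCFG in Chomsky normal form.

    binary_rules:  dict mapping (A, B, C) -> P(A -> B C)
    lexical_rules: dict mapping (A, word) -> P(A -> word)
    Returns P(words | grammar), i.e. the inside probability of `start` over the whole string.
    """
    n = len(words)
    # inside[i][j][A]: probability that nonterminal A derives words[i..j] (inclusive)
    inside = [[defaultdict(float) for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):
        for (a, word), p in lexical_rules.items():
            if word == w:
                inside[i][i][a] += p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):  # left child covers i..k, right child covers k+1..j
                left, right = inside[i][k], inside[k + 1][j]
                for (a, b, c), p in binary_rules.items():
                    if b in left and c in right:
                        inside[i][j][a] += p * left[b] * right[c]
    return inside[0][n - 1][start]
```

For the toy grammar S → S S (0.4), S → x (0.6), the string "x x" has a single parse with probability 0.4 × 0.6 × 0.6 = 0.144. The outside pass and the expected-count re-estimation reuse the same chart and are omitted; the cubic loop over spans and split points is why reducing training time, as the regular-grammar pre-training does, matters.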

Computer Speech & Language | 2007

Partially observable Markov decision processes for spoken dialog systems

Jason D. Williams; Steve J. Young

In a spoken dialog system, determining which action a machine should take in a given situation is a difficult problem because automatic speech recognition is unreliable and hence the state of the conversation can never be known with certainty. Much of the research in spoken dialog systems centres on mitigating this uncertainty and recent work has focussed on three largely disparate techniques: parallel dialog state hypotheses, local use of confidence scores, and automated planning. While in isolation each of these approaches can improve action selection, taken together they currently lack a unified statistical framework that admits global optimization. In this paper we cast a spoken dialog system as a partially observable Markov decision process (POMDP). We show how this formulation unifies and extends existing techniques to form a single principled framework. A number of illustrations are used to show qualitatively the potential benefits of POMDPs compared to existing techniques, and empirical results from dialog simulations are presented which demonstrate significant quantitative gains. Finally, some of the key challenges to advancing this method, in particular scalability, are briefly outlined.
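At the core of the POMDP formulation is the belief state, a distribution over hidden dialog states that is updated after each machine action a and observation o as b'(s') = η · O(o | s', a) · Σ_s T(s' | s, a) · b(s). A minimal sketch of this standard update over a discrete state space; the two-goal example and its probabilities are invented for illustration.

```python
def belief_update(belief, action, observation, transition, observation_model):
    """Discrete POMDP belief update: b'(s') = eta * O(o | s', a) * sum_s T(s' | s, a) * b(s).

    belief:            dict state -> probability
    transition:        dict (state, action) -> dict next_state -> probability
    observation_model: dict (next_state, action) -> dict observation -> probability
    """
    states = list(belief)
    unnormalized = {}
    for s_next in states:
        # Prediction step: probability of reaching s_next from the current belief
        predicted = sum(transition[(s, action)][s_next] * belief[s] for s in states)
        # Correction step: weight by how likely the observation is in s_next
        unnormalized[s_next] = observation_model[(s_next, action)][observation] * predicted
    total = sum(unnormalized.values())  # 1 / eta
    if total == 0:
        raise ValueError("observation has zero probability under the current belief")
    return {s: p / total for s, p in unnormalized.items()}
```

With an identity transition (the user's goal does not change) and a recogniser that is right 80% of the time, two consistent recognitions of "coffee" raise its belief from 0.5 to 0.8 and then to about 0.94, showing how repeated noisy evidence is folded into one statistical update.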

IEEE Transactions on Speech and Audio Processing | 1996

Robust continuous speech recognition using parallel model combination

Mark J. F. Gales; Steve J. Young

This paper addresses the problem of automatic speech recognition in the presence of interfering noise. It focuses on the parallel model combination (PMC) scheme, which has been shown to be a powerful technique for achieving noise robustness. Most experiments reported on PMC to date have been on small, 10-50 word vocabulary systems. Experiments on the Resource Management (RM) database, a 1000-word continuous speech recognition task, reveal compensation requirements not highlighted by the smaller vocabulary tasks. In particular, the dynamic parameters must be compensated as well as the static parameters to achieve good recognition performance. The database used for these experiments was the RM speaker-independent task with either Lynx Helicopter noise or Operation Room noise from the NOISEX-92 database added. The experiments reported here used the HTK RM recognizer developed at CUED, modified to include PMC-based compensation for the static, delta and delta-delta parameters. After training on clean speech data, the performance of the recognizer was found to be severely degraded when noise was added to the speech signal at signal-to-noise ratios between 10 and 18 dB. However, using PMC, performance was restored to a level comparable with that obtained when training directly in the noise-corrupted environment.
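PMC combines a clean-speech model and a noise model on the assumption that speech and noise add in the linear spectral domain. One of the simplest variants of the idea, often called the log-add approximation, combines only the means: μ̂ = log(exp(μ_s) + g · exp(μ_n)) per log-spectral channel. A minimal sketch of that step, assuming log-spectral means are already available; it omits the cepstral/log-spectral transforms, the variances, and the delta and delta-delta parameters that the abstract says must also be compensated. The `gain` parameter g is an illustrative assumption.

```python
import math


def log_add_compensate(clean_log_means, noise_log_means, gain=1.0):
    """Log-add approximation: per-channel log(exp(speech) + gain * exp(noise)).

    Assumes speech and noise are additive in the linear spectral domain and
    ignores variances and dynamic parameters.
    """
    if gain <= 0:
        raise ValueError("gain must be positive")
    log_gain = math.log(gain)
    compensated = []
    for s, n in zip(clean_log_means, noise_log_means):
        scaled_noise = log_gain + n
        hi = max(s, scaled_noise)
        # Numerically stable log(exp(s) + exp(scaled_noise))
        compensated.append(hi + math.log1p(math.exp(-abs(s - scaled_noise))))
    return compensated
```

When the noise mean is far below the speech mean the compensated value stays close to the clean one; when they are equal it rises by log 2.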

Computer Speech & Language | 2010

The Hidden Information State model: A practical framework for POMDP-based spoken dialogue management

Steve J. Young; Milica Gasic; Simon Keizer; François Mairesse; Jost Schatzmann; Blaise Thomson; Kai Yu

This paper explains how Partially Observable Markov Decision Processes (POMDPs) can provide a principled mathematical framework for modelling the inherent uncertainty in spoken dialogue systems. It briefly summarises the basic mathematics and explains why exact optimisation is intractable. It then describes in some detail a form of approximation called the Hidden Information State model which does scale and which can be used to build practical systems. A prototype HIS system for the tourist information domain is evaluated and compared with a baseline MDP system using both user simulations and a live user trial. The results give strong support to the central contention that the POMDP-based framework is both a tractable and powerful approach to building more robust spoken dialogue systems.
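One idea behind the Hidden Information State model is to group user goals into partitions and to split a partition only when the dialogue needs to distinguish its members, assumed here to share the parent's belief in proportion to the prior probability of each part. A rough sketch of that splitting step only; the partition labels and prior values are hypothetical, and the full model also tracks user acts and dialogue history.

```python
def split_partition(beliefs, priors, parent, child, remainder, child_prior):
    """Split partition `parent` into `child` and `remainder`.

    The parent's belief is divided in proportion to prior mass:
        b(child) = b(parent) * P(child) / P(parent)
    `beliefs` and `priors` are dicts keyed by (hypothetical) partition labels
    and are updated in place.
    """
    parent_prior = priors[parent]
    if not 0.0 < child_prior < parent_prior:
        raise ValueError("child prior must be positive and smaller than the parent prior")
    parent_belief = beliefs.pop(parent)
    priors.pop(parent)
    fraction = child_prior / parent_prior
    beliefs[child] = parent_belief * fraction
    beliefs[remainder] = parent_belief * (1.0 - fraction)
    priors[child] = child_prior
    priors[remainder] = parent_prior - child_prior
```

Splitting this way leaves the total belief unchanged, so partitions can be refined on demand without disturbing the rest of the distribution.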

Speech Communication | 2000

Phone-level pronunciation scoring and assessment for interactive language learning

Silke M. Witt; Steve J. Young

This paper investigates a method of automatic pronunciation scoring for use in computer-assisted language learning (CALL) systems. The method utilises a likelihood-based ‘Goodness of Pronunciation’ (GOP) measure which is extended to include individual thresholds for each phone based on both averaged native confidence scores and on rejection statistics provided by human judges. Further improvements are obtained by incorporating models of the subject’s native language and by augmenting the recognition networks to include expected pronunciation errors. The various GOP measures are assessed using a specially recorded database of non-native speakers which has been annotated to mark phone-level pronunciation errors. Since pronunciation assessment is highly subjective, a set of four performance measures has been designed, each of them measuring different aspects of how well computer-derived phone-level scores agree with human scores. These performance measures are used to cross-validate the reference annotations and to assess the basic GOP algorithm and its refinements. The experimental results suggest that a likelihood-based pronunciation scoring metric can achieve usable performance, especially after applying the various enhancements. © 2000 Elsevier Science B.V. All rights reserved.
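The GOP measure is commonly written as the absolute difference between the log-likelihood of a phone segment under forced alignment and under an unconstrained phone loop (which approximates the best competing phone), divided by the segment's frame count; a phone is rejected when its score exceeds that phone's threshold. A minimal sketch, assuming those two log-likelihoods are already available from a recogniser; the segment dictionary layout and threshold values are hypothetical.

```python
def gop_score(forced_loglik, phone_loop_loglik, num_frames):
    """|log p(O | p) - max_q log p(O | q)| / NF(p), with the max taken from a phone loop."""
    if num_frames <= 0:
        raise ValueError("segment must contain at least one frame")
    return abs(forced_loglik - phone_loop_loglik) / num_frames


def flag_mispronunciations(segments, thresholds, default_threshold):
    """Return phones whose GOP score exceeds their phone-specific threshold.

    segments:   list of dicts with hypothetical keys "phone", "forced", "loop", "frames"
    thresholds: dict phone -> threshold (falls back to default_threshold)
    """
    flagged = []
    for seg in segments:
        score = gop_score(seg["forced"], seg["loop"], seg["frames"])
        if score > thresholds.get(seg["phone"], default_threshold):
            flagged.append(seg["phone"])
    return flagged
```

Per-phone thresholds let phones that native speakers themselves realise variably be judged more leniently than a single global cut-off would allow.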

Image and Vision Computing | 1994

HMM-based architecture for face identification

Ferdinando Samaria; Steve J. Young

This paper describes an approach to the problem of face identification which uses Hidden Markov Models (HMMs) to represent the statistics of facial images. HMMs have previously been used with considerable success in speech recognition. Here we describe how two-dimensional face images can be converted into one-dimensional sequences to allow similar techniques to be applied. We investigate the factors that affect the choice of model type and model parameters. We show how an HMM can be used to automatically segment face images and extract features that can be used for identification. Successful results are obtained when facial expression, face details and lighting vary. Small head orientation changes are also tolerated. Experiments are described which assess the performance of the HMM-based approach and the results are compared with the well-known Eigenface method. For the given test set of 50 images, the HMM approach performs favourably. We conclude by summarizing the benefits of using HMMs in this area, and indicate future directions of work.
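A common way to turn a face image into a one-dimensional observation sequence is to slide a window of L rows down the image with P rows of overlap and flatten each horizontal strip into one observation vector, so that a top-to-bottom HMM sees, for example, forehead, eyes, nose and mouth in order. A minimal pure-Python sketch; the strip height and overlap are free parameters here, not the values used in the paper.

```python
def image_to_observations(image, strip_height, overlap):
    """Convert a 2-D image (list of pixel rows) into a 1-D sequence of flattened strips.

    Consecutive strips start (strip_height - overlap) rows apart, giving
    T = (H - strip_height) // (strip_height - overlap) + 1 observations.
    """
    if not 0 <= overlap < strip_height:
        raise ValueError("overlap must satisfy 0 <= overlap < strip_height")
    step = strip_height - overlap
    observations = []
    for top in range(0, len(image) - strip_height + 1, step):
        strip = image[top:top + strip_height]
        observations.append([pixel for row in strip for pixel in row])
    return observations
```

The overlap keeps features that straddle a strip boundary from being split between observations; each vector could then be modelled by the HMM's output distributions exactly as an acoustic frame would be.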

Proceedings of the IEEE | 2013

POMDP-Based Statistical Spoken Dialog Systems: A Review

Steve J. Young; Milica Gasic; Blaise Thomson; Jason D. Williams

Statistical dialog systems (SDSs) are motivated by the need for a data-driven framework that reduces the cost of laboriously handcrafting complex dialog managers and that provides robustness against the errors created by speech recognizers operating in noisy environments. By including an explicit Bayesian model of uncertainty and by optimizing the policy via a reward-driven process, partially observable Markov decision processes (POMDPs) provide such a framework. However, exact model representation and optimization are computationally intractable. Hence, the practical application of POMDP-based systems requires efficient algorithms and carefully constructed approximations. This review article provides an overview of the current state of the art in the development of POMDP-based spoken dialog systems.

Knowledge Engineering Review | 2006

A survey of statistical user simulation techniques for reinforcement-learning of dialogue management strategies

Jost Schatzmann; Karl Weilhammer; Matthew N. Stuttle; Steve J. Young

Within the broad field of spoken dialogue systems, the application of machine-learning approaches to dialogue management strategy design is a rapidly growing research area. The main motivation is the hope of building systems that learn through trial-and-error interaction what constitutes a good dialogue strategy. Training of such systems could in theory be done using human users or using corpora of human–computer dialogue, but in practice the typically vast space of possible dialogue states and strategies cannot be explored without the use of automatic user simulation tools.

This requirement for training statistical dialogue models has created an interesting new application area for predictive statistical user modelling, and a variety of techniques for simulating user behaviour have been presented in the literature, ranging from simple Markov models to Bayesian networks. The development of reliable user simulation tools is critical to further progress on automatic dialogue management design, but it holds many challenges, some of which have been encountered in other areas of current research on statistical user modelling, such as the problem of ‘concept drift’, the problem of combining content-based and collaboration-based modelling techniques, and user model evaluation. The latter topic is of particular interest, because simulation-based learning is currently one of the few applications of statistical user modelling that employs both direct ‘accuracy-based’ and indirect ‘utility-based’ evaluation techniques.

In this paper, we briefly summarize the role of the dialogue manager in a spoken dialogue system, give a short introduction to reinforcement-learning of dialogue management strategies and review the literature on user modelling for simulation-based strategy learning. We further describe recent work on user model evaluation and discuss some of the current research issues in simulation-based learning from a user modelling perspective.
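At the simple end of the Markov-model family, a user simulator can sample the next user act conditioned only on the preceding system act, a bigram model. A minimal sketch, assuming a corpus of (system act, user act) pairs; the act strings are hypothetical and real simulators add goal and agenda tracking.

```python
import random
from collections import Counter, defaultdict


def estimate_bigram(pairs):
    """Maximum-likelihood estimate of P(user_act | system_act) from (system_act, user_act) pairs."""
    counts = defaultdict(Counter)
    for system_act, user_act in pairs:
        counts[system_act][user_act] += 1
    bigram = {}
    for system_act, user_counts in counts.items():
        total = sum(user_counts.values())
        bigram[system_act] = {u: c / total for u, c in user_counts.items()}
    return bigram


def simulate_user_act(system_act, bigram, rng):
    """Sample a simulated user response to `system_act` from the bigram distribution."""
    acts = list(bigram[system_act])
    weights = [bigram[system_act][a] for a in acts]
    return rng.choices(acts, weights=weights, k=1)[0]
```

Such a model is cheap to train and sample from, but because it ignores the user's goal it can produce incoherent dialogues, one reason the survey also covers richer models and evaluation.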

Computer Speech & Language | 2010

Bayesian update of dialogue state: A POMDP framework for spoken dialogue systems

Blaise Thomson; Steve J. Young

This paper describes a statistically motivated framework for performing real-time dialogue state updates and policy learning in a spoken dialogue system. The framework is based on the partially observable Markov decision process (POMDP), which provides a well-founded, statistical model of spoken dialogue management. However, exact belief state updates in a POMDP model are computationally intractable so approximate methods must be used. This paper presents a tractable method based on the loopy belief propagation algorithm. Various simplifications are made, which improve the efficiency significantly compared to the original algorithm as well as compared to other POMDP-based dialogue state updating approaches. A second contribution of this paper is a method for learning in spoken dialogue systems which uses a component-based policy with the episodic Natural Actor Critic algorithm. The framework proposed in this paper was tested both in simulation and in a user trial. Both indicated that using Bayesian updates of the dialogue state significantly outperforms traditional definitions of the dialogue state. Policy learning worked effectively and the learned policy outperformed all others on simulations. In user trials the learned policy was also competitive, although its optimality was less conclusive. Overall, the Bayesian update of dialogue state framework was shown to be a feasible and effective approach to building real-world POMDP-based dialogue systems.
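The method builds on loopy belief propagation over a network of dialogue-state variables. The sketch below is the generic, unsimplified sum-product update on a small pairwise graph, not the paper's simplified version; it gives exact marginals on trees and approximate ones on graphs with loops. Variable names and potentials are hypothetical, and each edge is assumed to be listed once.

```python
def loopy_bp(domains, unary, pairwise, iterations=20):
    """Sum-product message passing on a pairwise graph (exact on trees).

    domains:  dict var -> list of values
    unary:    dict var -> dict value -> potential
    pairwise: dict (u, v) -> dict (value_u, value_v) -> potential
    Returns approximate marginals: dict var -> dict value -> probability.
    """
    neighbours = {v: [] for v in domains}
    for (u, v) in pairwise:
        neighbours[u].append(v)
        neighbours[v].append(u)

    def potential(u, v, xu, xv):
        if (u, v) in pairwise:
            return pairwise[(u, v)][(xu, xv)]
        return pairwise[(v, u)][(xv, xu)]

    # messages[(u, v)][xv]: message from u to v, initialised to uniform
    messages = {(u, v): {x: 1.0 for x in domains[v]}
                for u in domains for v in neighbours[u]}
    for _ in range(iterations):
        new_messages = {}
        for (u, v) in messages:
            msg = {}
            for xv in domains[v]:
                total = 0.0
                for xu in domains[u]:
                    incoming = 1.0
                    for w in neighbours[u]:
                        if w != v:  # exclude the message coming back from v
                            incoming *= messages[(w, u)][xu]
                    total += unary[u][xu] * potential(u, v, xu, xv) * incoming
                msg[xv] = total
            z = sum(msg.values())
            new_messages[(u, v)] = {x: m / z for x, m in msg.items()}
        messages = new_messages  # parallel (synchronous) schedule

    marginals = {}
    for v in domains:
        belief = {}
        for xv in domains[v]:
            b = unary[v][xv]
            for w in neighbours[v]:
                b *= messages[(w, v)][xv]
            belief[xv] = b
        z = sum(belief.values())
        marginals[v] = {x: b / z for x, b in belief.items()}
    return marginals
```

On a two-node chain the result matches brute-force enumeration; on a loopy graph the same updates are simply iterated, which is what makes the approach tractable for larger dialogue-state networks.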
