
Publication


Featured research published by Catherine Breslin.


international conference on acoustics, speech, and signal processing | 2013

On-line policy optimisation of Bayesian spoken dialogue systems via human interaction

Milica Gasic; Catherine Breslin; Matthew Henderson; Dongho Kim; Martin Szummer; Blaise Thomson; Pirros Tsiakoulis; Steve J. Young

A partially observable Markov decision process has been proposed as a dialogue model that enables robustness to speech recognition errors and automatic policy optimisation using reinforcement learning (RL). However, conventional RL algorithms require a very large number of dialogues, necessitating a user simulator. Recently, Gaussian processes have been shown to substantially speed up the optimisation, making it possible to learn directly from interaction with human users. However, early studies have been limited to very low dimensional spaces and the learning has exhibited convergence problems. Here we investigate learning from human interaction using the Bayesian Update of Dialogue State system. This dynamic Bayesian network based system has an optimisation space covering more than one hundred features, allowing a wide range of behaviours to be learned. Using an improved policy model and a more robust reward function, we show that stable learning can be achieved that significantly outperforms a simulator trained policy.
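To illustrate why Gaussian processes speed up policy learning, here is a minimal sketch (not the paper's GP-SARSA algorithm) of GP regression over belief-state features: a handful of observed dialogue returns is enough to produce an estimate, plus an uncertainty, at unseen belief points, which is what lets learning proceed from comparatively few human interactions. The feature dimensions and data are invented for illustration.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def gp_q_estimate(X_train, q_train, X_query, noise=0.1):
    """GP posterior mean and variance of the return at query belief points."""
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    Ks = rbf_kernel(X_query, X_train)
    K_inv = np.linalg.inv(K)
    mean = Ks @ K_inv @ q_train
    # prior variance (kernel diagonal = 1) minus the part explained by data
    var = 1.0 - np.einsum('ij,jk,ik->i', Ks, K_inv, Ks)
    return mean, var

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(20, 3))                    # toy 3-d belief features
q = np.sin(3.0 * X[:, 0]) + 0.1 * rng.standard_normal(20)  # noisy observed returns

X_query = rng.uniform(0.0, 1.0, size=(5, 3))
mean, var = gp_q_estimate(X, q, X_query)
```

The posterior variance is what a GP-based learner can use to balance exploration against exploitation during live interaction.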


international conference on acoustics, speech, and signal processing | 2007

Complementary System Generation using Directed Decision Trees

Catherine Breslin; Mark J. F. Gales

Large vocabulary continuous speech recognition (LVCSR) systems often use a multi-pass decoding strategy with a combination of multiple systems in the final stage. To reduce the error rate, these models must be complementary, i.e. make different errors. Previously, complementary systems have been generated by independently training a number of models, explicitly performing all combinations and picking the best performance. This method becomes infeasible as the potential number of systems increases, and does not guarantee that any of the models will be complementary. This paper presents an algorithm for generating complementary systems by altering the decision tree generation. Confusions made by a baseline system are resolved by separating confusable states, which might previously have been clustered together using the standard decision tree algorithm. Experimental results presented on a broadcast news Mandarin task show gains when combining the baseline with a complementary directed decision tree system.
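A toy sketch of the idea, with invented statistics and question names (the real algorithm operates on HMM state sufficient statistics and phonetic questions): at each node, question selection is biased so that state pairs the baseline confused are pushed into different clusters, even when the standard likelihood criterion would keep them together.

```python
def choose_question(states, questions, confusions, penalty=0.0):
    """Greedy question selection for one tree node, biased against
    clustering together state pairs the baseline system confused.

    states:     dict name -> (count, mean)  toy 1-d sufficient statistics
    questions:  dict qname -> set of state names answering "yes"
    confusions: set of frozenset pairs previously confused by the baseline
    """
    def scatter(group):
        # within-group scatter if these states were clustered together
        if not group:
            return 0.0
        n = sum(states[s][0] for s in group)
        mu = sum(states[s][0] * states[s][1] for s in group) / n
        return sum(states[s][0] * (states[s][1] - mu) ** 2 for s in group)

    all_states = set(states)
    best_q, best_score = None, float('-inf')
    for qname, yes in questions.items():
        no = all_states - yes
        gain = scatter(all_states) - scatter(yes) - scatter(no)
        # count confusable pairs this split would leave clustered together
        kept = sum(1 for pair in confusions if pair <= yes or pair <= no)
        score = gain - penalty * kept
        if score > best_score:
            best_q, best_score = qname, score
    return best_q

# Toy data: the baseline confused A and B; the standard split keeps them
# together, the directed split separates them.
states = {'A': (10, 0.0), 'B': (10, 0.1), 'C': (10, 1.0), 'D': (10, 1.1)}
questions = {'split-AB': {'A', 'B'}, 'split-AC': {'A', 'C'}}
confusions = {frozenset({'A', 'B'})}
```

With `penalty=0` the standard likelihood-style criterion wins and A/B stay together; with a large penalty the confusable pair is separated, which is the directed behaviour the paper describes.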


international conference on acoustics, speech, and signal processing | 2011

Rapid joint speaker and noise compensation for robust speech recognition

K. K. Chin; Haitian Xu; Mark J. F. Gales; Catherine Breslin; Katherine Mary Knill

For speech recognition, mismatches between training and testing conditions for speaker and noise are normally handled separately. The work presented in this paper jointly applies speaker adaptation and model-based noise compensation by embedding speaker adaptation as part of the noise mismatch function. The proposed method yields faster and better-optimised adaptation than compensating for the two factors separately, and is more consistent with the basic assumptions of speaker and noise adaptation. Experimental results show significant and consistent gains from the proposed method.
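A toy log-spectral sketch of "embedding speaker adaptation inside the noise mismatch function" (the paper works with a cepstral-domain, VTS-style formulation; the diagonal transform and values below are invented): the clean-speech mean is first passed through the speaker transform, and noise compensation is then applied to the adapted mean rather than the original one.

```python
import numpy as np

def joint_mismatch(clean, A, b, noise):
    """Noisy-speech log-spectral mean with the speaker transform embedded
    inside the noise mismatch function: adapt first, then add noise."""
    adapted = A * clean + b                 # linear speaker transform
    # log-domain power addition of adapted speech and additive noise:
    # y = log(exp(adapted) + exp(noise))
    return adapted + np.log1p(np.exp(noise - adapted))

clean = np.array([2.0, 1.0, 0.5])          # toy clean log-spectral means
A = np.array([0.9, 1.1, 1.0])              # toy diagonal speaker transform
b = np.array([0.1, -0.2, 0.0])
noise = np.array([0.5, 0.5, 0.5])          # toy noise log-spectral mean
noisy = joint_mismatch(clean, A, b, noise)
```

Because the transform sits inside the mismatch function, re-estimating the speaker parameters automatically accounts for the noise model, rather than the two adaptations being composed in an inconsistent order.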


Speech Communication | 2009

Directed decision trees for generating complementary systems

Catherine Breslin; Mark J. F. Gales

Many large vocabulary continuous speech recognition systems use a combination of multiple systems to obtain the final hypothesis. These complementary systems are typically found in an ad-hoc manner, by testing combinations of diverse systems and selecting the best. This paper presents a new algorithm for generating complementary systems by altering the decision tree generation, and a divergence measure for comparing decision trees. In this paper, the decision tree is biased against clustering states which have previously led to confusions. This leads to a system which concentrates states in contexts that were previously confusable. Thus these systems tend to make different errors. Results are presented on two broadcast news tasks, Mandarin and Arabic. The results show that combining multiple systems built from directed decision trees gives gains in performance when confusion network combination is used as the method of combination. The results also show that the gains achieved using the directed tree algorithm are additive to the gains achieved using other techniques that have been empirically shown to be complementary.


Archive | 2016

Evaluation of Statistical POMDP-Based Dialogue Systems in Noisy Environments

Steve J. Young; Catherine Breslin; Milica Gasic; Matthew Henderson; Dongho Kim; Martin Szummer; Blaise Thomson; Pirros Tsiakoulis; Eli Tzirkel Hancock

Compared to conventional hand-crafted rule-based dialogue management systems, statistical POMDP-based dialogue managers offer the promise of increased robustness, reduced development and maintenance costs, and scalability to large open domains. As a consequence, there has been considerable research activity in approaches to statistical spoken dialogue systems over recent years. However, building and deploying a real-time spoken dialogue system is expensive, and even when operational, it is hard to recruit sufficient users to get statistically significant results. Instead, researchers have tended to evaluate using user simulators or by reprocessing existing corpora, both of which are unconvincing predictors of actual real world performance. This paper describes the deployment of a real-world restaurant information system and its evaluation in a motor car using subjects recruited locally and by remote users recruited using Amazon Mechanical Turk. The paper explores three key questions: are statistical dialogue systems more robust than conventional hand-crafted systems; how does the performance of a system evaluated on a user simulator compare to performance with real users; and can performance of a system tested over the telephone network be used to predict performance in more hostile environments such as a motor car? The results show that the statistical approach is indeed more robust, but results from a simulator significantly over-estimate performance, both in absolute and relative terms. Finally, by matching word error rates, performance results obtained over the telephone can provide useful predictors of performance in noisier environments such as the motor car, but again they tend to over-estimate performance.


international conference on acoustics, speech, and signal processing | 2014

Dialogue context sensitive HMM-based speech synthesis

Pirros Tsiakoulis; Catherine Breslin; Milica Gasic; Matthew Henderson; Dongho Kim; Martin Szummer; Blaise Thomson; Steve J. Young

The focus of this work is speech synthesis tailored to the needs of spoken dialogue systems. More specifically, the framework of HMM-based speech synthesis is utilized to train an emphatic voice that also considers dialogue context for decision tree state clustering. To achieve this, we designed and recorded a speech corpus comprising system prompts from human-computer interaction, as well as additional prompts for slot-level emphasis. This corpus, combined with a general purpose text-to-speech one, was used to train voices using a) baseline context features, b) additional emphasis features, and c) additional dialogue context features. Both emphasis and dialogue context features are extracted from the dialogue act semantic representation. The voices were evaluated in pairs for dialogue appropriateness using a preference listening test. The results show that the emphatic voice is preferred to the baseline when emphasis markup is present, while the dialogue context-sensitive voice is preferred to the plain emphatic one when no emphasis markup is present and preferable to the baseline in both cases. This demonstrates that including dialogue context features for decision tree state clustering significantly improves the quality of the synthetic voice for dialogue.


international conference on acoustics, speech, and signal processing | 2013

Continuous ASR for flexible incremental dialogue

Catherine Breslin; Milica Gasic; Matthew Henderson; Dongho Kim; Martin Szummer; Blaise Thomson; Pirros Tsiakoulis; Steve J. Young

Spoken dialogue systems provide a convenient way for users to interact with a machine using only speech. However, they often rely on a rigid turn-taking regime in which a voice activity detection (VAD) module is used to determine when the user is speaking and decide when it is appropriate for the system to respond. This paper investigates replacing the VAD and discrete utterance recogniser of a conventional turn-taking system with a continuously operating recogniser that is always listening, using the recogniser's 1-best path to guide turn taking. In this way, a flexible framework for incremental dialogue management is possible. Experimental results show that it is possible to remove the VAD component and successfully use the recogniser's best path to identify user speech, with more robustness to noise, potentially smaller latency times, and a reduction in overall recognition error rate compared to the conventional approach.
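A minimal sketch of the turn-taking idea, assuming the recogniser emits a per-frame 1-best token stream with an explicit silence token (the token name and threshold are invented for illustration): instead of a separate VAD, the end of the user's turn is read directly off the best path as a sustained run of silence following speech.

```python
def detect_turn_end(best_path, sil_frames_needed=30, sil='<sil>'):
    """Return the frame index at which the user's turn is judged finished:
    the first frame where, after some speech, the recogniser's 1-best path
    has stayed in silence for `sil_frames_needed` consecutive frames.
    Returns None if no such point exists (e.g. the user never spoke)."""
    sil_run, seen_speech = 0, False
    for i, token in enumerate(best_path):
        if token == sil:
            sil_run += 1
            if seen_speech and sil_run >= sil_frames_needed:
                return i
        else:
            seen_speech, sil_run = True, 0
    return None

# 10 frames of leading silence, 20 frames of speech, then trailing silence
path = ['<sil>'] * 10 + ['the'] * 20 + ['<sil>'] * 40
```

Because the decision is made on the recogniser's own hypothesis rather than on raw energy, background noise that the acoustic model maps to silence does not hold the turn open, which is one source of the robustness reported in the paper.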


annual meeting of the special interest group on discourse and dialogue | 2014

The PARLANCE mobile application for interactive search in English and Mandarin

Helen Hastie; Marie-Aude Aufaure; Panos Alexopoulos; Hugues Bouchard; Catherine Breslin; Heriberto Cuayáhuitl; Nina Dethlefs; Milica Gasic; James Henderson; Oliver Lemon; Xingkun Liu; Peter Mika; Nesrine Ben Mustapha; Tim Potter; Verena Rieser; Blaise Thomson; Pirros Tsiakoulis; Yves Vanrompay; Boris Villazon-Terrazas; Majid Yazdani; Steve J. Young; Yanchao Yu

We demonstrate a mobile application in English and Mandarin to test and evaluate components of the Parlance dialogue system for interactive search under real-world conditions.


annual meeting of the special interest group on discourse and dialogue | 2013

POMDP-based dialogue manager adaptation to extended domains

Milica Gasic; Catherine Breslin; Matthew Henderson; Dongho Kim; Martin Szummer; Blaise Thomson; Pirros Tsiakoulis; Steve J. Young


conference of the international speech communication association | 2010

Prior Information for Rapid Speaker Adaptation

Catherine Breslin; K. K. Chin; Mark J. F. Gales; Kate Knill; Haitian Xu

Collaboration


Dive into Catherine Breslin's collaborations.

Top Co-Authors

Milica Gasic, University of Cambridge

Dongho Kim, University of Cambridge

Kate Knill, University of Cambridge