Milica Gasic | Researchain

Publication

Featured research published by Milica Gasic.

Computer Speech & Language | 2010

The Hidden Information State model: A practical framework for POMDP-based spoken dialogue management

Steve J. Young; Milica Gasic; Simon Keizer; François Mairesse; Jost Schatzmann; Blaise Thomson; Kai Yu

This paper explains how Partially Observable Markov Decision Processes (POMDPs) can provide a principled mathematical framework for modelling the inherent uncertainty in spoken dialogue systems. It briefly summarises the basic mathematics and explains why exact optimisation is intractable. It then describes in some detail a form of approximation called the Hidden Information State model which does scale and which can be used to build practical systems. A prototype HIS system for the tourist information domain is evaluated and compared with a baseline MDP system using both user simulations and a live user trial. The results give strong support to the central contention that the POMDP-based framework is both a tractable and powerful approach to building more robust spoken dialogue systems.

Proceedings of the IEEE | 2013

POMDP-Based Statistical Spoken Dialog Systems: A Review

Steve J. Young; Milica Gasic; Blaise Thomson; Jason D. Williams

Statistical dialog systems (SDSs) are motivated by the need for a data-driven framework that reduces the cost of laboriously handcrafting complex dialog managers and that provides robustness against the errors created by speech recognizers operating in noisy environments. By including an explicit Bayesian model of uncertainty and by optimizing the policy via a reward-driven process, partially observable Markov decision processes (POMDPs) provide such a framework. However, exact model representation and optimization is computationally intractable. Hence, the practical application of POMDP-based systems requires efficient algorithms and carefully constructed approximations. This review article provides an overview of the current state of the art in the development of POMDP-based spoken dialog systems.

Empirical Methods in Natural Language Processing | 2015

Semantically Conditioned LSTM-based Natural Language Generation for Spoken Dialogue Systems

Tsung-Hsien Wen; Milica Gasic; Nikola Mrksic; Pei-Hao Su; David Vandyke; Steve J. Young

Natural language generation (NLG) is a critical component of spoken dialogue and it has a significant impact both on usability and perceived quality. Most NLG systems in common use employ rules and heuristics and tend to generate rigid and stylised responses without the natural variation of human language. They are also not easily scaled to systems covering multiple domains and languages. This paper presents a statistical language generator based on a semantically controlled Long Short-term Memory (LSTM) structure. The LSTM generator can learn from unaligned data by jointly optimising sentence planning and surface realisation using a simple cross entropy training criterion, and language variation can be easily achieved by sampling from output candidates. With fewer heuristics, an objective evaluation in two differing test domains showed the proposed method improved performance compared to previous methods. Human judges scored the LSTM system higher on informativeness and naturalness and overall preferred it to the other systems.

International Conference on Acoustics, Speech, and Signal Processing | 2009

Spoken language understanding from unaligned data using discriminative classification models

François Mairesse; Milica Gasic; Filip Jurčíček; Simon Keizer; Blaise Thomson; Kai Yu; Steve J. Young

While data-driven methods for spoken language understanding reduce maintenance and portability costs compared with handcrafted parsers, the collection of word-level semantic annotations for training remains a time-consuming task. A recent line of research has focused on building generative models from unaligned semantic representations, using expectation-maximisation techniques to align semantic concepts. This paper presents an efficient, simple technique that parses a semantic tree by recursively calling discriminative semantic classification models. Results show that it outperforms methods based on the Hidden Vector State model and Markov Logic Networks, while performance is close to more complex grammar induction techniques. We also show that our method is robust to speech recognition errors, by improving over a handcrafted parser previously used for dialogue data collection.

Spoken Language Technology Workshop | 2012

Discriminative spoken language understanding using word confusion networks

Matthew Henderson; Milica Gasic; Blaise Thomson; Pirros Tsiakoulis; Kai Yu; Steve J. Young

Current commercial dialogue systems typically use hand-crafted grammars for Spoken Language Understanding (SLU) operating on the top one or two hypotheses output by the speech recogniser. These systems are expensive to develop and they suffer from significant degradation in performance when faced with recognition errors. This paper presents a robust method for SLU based on features extracted from the full posterior distribution of recognition hypotheses encoded in the form of word confusion networks. Following [1], the system uses SVM classifiers operating on n-gram features, trained on unaligned input/output pairs. Performance is evaluated on both an off-line corpus and on-line in a live user trial. It is shown that a statistical discriminative approach to SLU operating on the full posterior ASR output distribution can substantially improve performance both in terms of accuracy and overall dialogue reward. Furthermore, additional gains can be obtained by incorporating features from the previous system output.

IEEE Automatic Speech Recognition and Understanding Workshop | 2011

On-line policy optimisation of spoken dialogue systems via live interaction with human subjects

Milica Gasic; Filip Jurčíček; Blaise Thomson; Kai Yu; Steve J. Young

Statistical dialogue models have required a large number of dialogues to optimise the dialogue policy, relying on the use of a simulated user. This results in a mismatch between training and live conditions, and significant development costs for the simulator, thereby undermining many of the claimed benefits of such models. Recent work on Gaussian process reinforcement learning has shown that learning can be substantially accelerated. This paper reports on an experiment to learn a policy for a real-world task directly from human interaction using rewards provided by users. It shows that a usable policy can be learnt in just a few hundred dialogues without needing a user simulator, using a learning strategy that reduces the risk of taking bad actions. The paper also investigates adaptation behaviour when the system continues learning for several thousand dialogues and highlights the need for robustness to noisy rewards.

IEEE Transactions on Audio, Speech, and Language Processing | 2014

Gaussian Processes for POMDP-Based Dialogue Manager Optimization

Milica Gasic; Stephen Young

A partially observable Markov decision process (POMDP) has been proposed as a dialog model that enables automatic optimization of the dialog policy and provides robustness to speech understanding errors. Various approximations allow such a model to be used for building real-world dialog systems. However, they require a large number of dialogs to train the dialog policy and hence they typically rely on the availability of a user simulator. They also require significant designer effort to hand-craft the policy representation. We investigate the use of Gaussian processes (GPs) in policy modeling to overcome these problems. We show that GP policy optimization can be implemented for a real world POMDP dialog manager, and in particular: 1) we examine different formulations of a GP policy to minimize variability in the learning process; 2) we find that the use of GP increases the learning rate by an order of magnitude thereby allowing learning by direct interaction with human users; and 3) we demonstrate that designer effort can be substantially reduced by basing the policy directly on the full belief space thereby avoiding ad hoc feature space modeling. Overall, the GP approach represents an important step forward towards fully automatic dialog policy optimization in real world systems.

Annual Meeting of the Special Interest Group on Discourse and Dialogue | 2015

Stochastic Language Generation in Dialogue using Recurrent Neural Networks with Convolutional Sentence Reranking

Tsung-Hsien Wen; Milica Gasic; Dongho Kim; Nikola Mrksic; Pei-Hao Su; David Vandyke; Steve J. Young

The natural language generation (NLG) component of a spoken dialogue system (SDS) usually needs a substantial amount of handcrafting or a well-labeled dataset to be trained on. These limitations add significantly to development costs and make cross-domain, multi-lingual dialogue systems intractable. Moreover, human languages are context-aware. The most natural response should be directly learned from data rather than depending on predefined syntaxes or rules. This paper presents a statistical language generator based on a joint recurrent and convolutional neural network structure which can be trained on dialogue act-utterance pairs without any semantic alignments or predefined grammar trees. Objective metrics suggest that this new model outperforms previous methods under the same experimental conditions. Results of an evaluation by human judges indicate that it produces not only high quality but linguistically varied utterances which are preferred compared to n-gram and rule-based systems.

International Conference on Acoustics, Speech, and Signal Processing | 2013

On-line policy optimisation of Bayesian spoken dialogue systems via human interaction

Milica Gasic; Catherine Breslin; Matthew Henderson; Dongho Kim; Martin Szummer; Blaise Thomson; Pirros Tsiakoulis; Steve J. Young

A partially observable Markov decision process has been proposed as a dialogue model that enables robustness to speech recognition errors and automatic policy optimisation using reinforcement learning (RL). However, conventional RL algorithms require a very large number of dialogues, necessitating a user simulator. Recently, Gaussian processes have been shown to substantially speed up the optimisation, making it possible to learn directly from interaction with human users. However, early studies have been limited to very low-dimensional spaces and the learning has exhibited convergence problems. Here we investigate learning from human interaction using the Bayesian Update of Dialogue State system. This dynamic Bayesian network based system has an optimisation space covering more than one hundred features, allowing a wide range of behaviours to be learned. Using an improved policy model and a more robust reward function, we show that stable learning can be achieved that significantly outperforms a simulator-trained policy.

ACM Transactions on Speech and Language Processing | 2011

Effective handling of dialogue state in the hidden information state POMDP-based dialogue manager

Milica Gasic; Steve J. Young

Effective dialogue management is critically dependent on the information that is encoded in the dialogue state. In order to deploy reinforcement learning for policy optimization, dialogue must be modeled as a Markov Decision Process. This requires that the dialogue state encode all relevant information obtained during the dialogue prior to that state. This can be achieved by combining the user goal, the dialogue history, and the last user action to form the dialogue state. In addition, to gain robustness to input errors, dialogue must be modeled as a Partially Observable Markov Decision Process (POMDP) and hence, a distribution over all possible states must be maintained at every dialogue turn. This poses a potential computational limitation since there can be a very large number of dialogue states. The Hidden Information State model provides a principled way of ensuring tractability in a POMDP-based dialogue model. The key feature of this model is the grouping of user goals into partitions that are dynamically built during the dialogue. In this article, we extend this model further to incorporate the notion of complements. This allows for a more complex user goal to be represented, and it enables an effective pruning technique to be implemented that preserves the overall system performance within a limited computational resource more effectively than existing approaches.
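
As a rough illustration of what "a distribution over all possible states must be maintained at every dialogue turn" involves, the sketch below performs one textbook POMDP belief update over a tiny, explicitly enumerated state set. It is not the Hidden Information State model's partition-based method; the function name `belief_update`, the two toy user goals, and the 0.8 recognition accuracy are all assumptions chosen for illustration. The loop over every pair of states is the cost that becomes prohibitive when the number of dialogue states is very large.

```python
from typing import Callable, Dict


def belief_update(
    belief: Dict[str, float],
    action: str,
    observation: str,
    transition: Callable[[str, str, str], float],
    obs_model: Callable[[str, str, str], float],
) -> Dict[str, float]:
    """Exact update: b'(s') is proportional to P(o | s', a) * sum_s P(s' | s, a) * b(s)."""
    states = list(belief)
    unnormalised = {}
    for s_next in states:
        # Prediction step: probability of reaching s_next from the current belief.
        predicted = sum(transition(s, action, s_next) * belief[s] for s in states)
        # Correction step: weight by how well s_next explains the observation.
        unnormalised[s_next] = obs_model(observation, s_next, action) * predicted
    total = sum(unnormalised.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / total for s, p in unnormalised.items()}


# Toy model (hypothetical): the user's goal never changes during the dialogue,
# and the recogniser reports the true goal with probability 0.8.
def toy_transition(s: str, action: str, s_next: str) -> float:
    return 1.0 if s == s_next else 0.0


def toy_obs_model(observation: str, s_next: str, action: str) -> float:
    return 0.8 if observation == s_next else 0.2


prior = {"hotel": 0.5, "restaurant": 0.5}
posterior = belief_update(prior, "ask_goal", "hotel", toy_transition, toy_obs_model)
print(posterior)
```

With a uniform prior, a single noisy "hotel" observation shifts most of the probability mass to the hotel goal while keeping the restaurant goal alive, which is the kind of robustness to input errors the abstract refers to.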