Yanchao Yu
Heriot-Watt University
Publications
Featured research published by Yanchao Yu.
Annual Meeting of the Special Interest Group on Discourse and Dialogue | 2016
Yanchao Yu; Arash Eshghi; Oliver Lemon
We present a multi-modal dialogue system for interactive learning of perceptually grounded word meanings from a human tutor. The system integrates an incremental, semantic parsing/generation framework - Dynamic Syntax and Type Theory with Records (DS-TTR) - with a set of visual classifiers that are learned throughout the interaction and which ground the meaning representations that it produces. We use this system in interaction with a simulated human tutor to study the effects of different dialogue policies and capabilities on the accuracy of learned meanings, learning rates, and efforts/costs to the tutor. We show that the overall performance of the learning agent is affected by (1) who takes initiative in the dialogues; (2) the ability to express/use its confidence level about visual attributes; and (3) the ability to process elliptical and incrementally constructed dialogue turns. Ultimately, we train an adaptive dialogue policy which optimises the trade-off between classifier accuracy and tutoring costs.
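The learning loop described here can be illustrated with a short sketch: one incrementally trained binary classifier per attribute word, plus a confidence threshold that decides whether the agent asserts an attribute or queries the tutor. This is a minimal illustration under assumed interfaces; the classifier choice, threshold value, and action names are not from the paper:

```python
# Minimal sketch of an interactive word-grounding loop (not the authors' code).
# Assumptions: one binary classifier per attribute word, a fixed confidence
# threshold, and a numeric feature vector per object.
import numpy as np
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import SGDClassifier

class GroundedWordLearner:
    """One incrementally trained binary classifier per attribute word."""

    def __init__(self, threshold=0.8):
        self.classifiers = {}
        self.threshold = threshold

    def _classifier(self, word):
        if word not in self.classifiers:
            # log_loss gives probability estimates via predict_proba
            self.classifiers[word] = SGDClassifier(loss="log_loss")
        return self.classifiers[word]

    def act(self, word, features):
        """Assert the attribute if confident enough, else query the tutor."""
        clf = self._classifier(word)
        try:
            p = clf.predict_proba(np.asarray(features).reshape(1, -1))[0, 1]
        except NotFittedError:
            return "ask_tutor"
        return "assert" if p >= self.threshold else "ask_tutor"

    def learn(self, word, features, label):
        """Update the word's classifier from the tutor's yes/no feedback."""
        clf = self._classifier(word)
        clf.partial_fit(np.asarray(features).reshape(1, -1),
                        [int(label)], classes=[0, 1])
```

In this toy setting, the trade-off studied in the paper corresponds to tuning the threshold: a lower value means more assertions (and errors), a higher value means more tutor queries (and cost).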
Computer Speech & Language | 2016
Nina Dethlefs; Helen Hastie; Heriberto Cuayáhuitl; Yanchao Yu; Verena Rieser; Oliver Lemon
Highlights:
- Information density, related to entropy, is related to overlaps in spoken language.
- Humans prefer overlaps based on information density and suprasegmental features.
- This is confirmed in a speech-based rating study (p < 0.0001).
- Our results are relevant for spoken dialogue systems, especially incremental ones.

Incremental dialogue systems are often perceived as more responsive and natural because they are able to address phenomena of turn-taking and overlapping speech, such as backchannels or barge-ins. Previous work in this area has often identified distinctive prosodic features, or features relating to syntactic or semantic completeness, as marking appropriate places of turn-taking. In a separate strand of work, psycholinguistic studies have established a connection between information density and prominence in language: the less expected a linguistic unit is in a particular context, the more likely it is to be linguistically marked. This has been observed across linguistic levels, including the prosodic, which plays an important role in predicting overlapping speech.

In this article, we explore the hypothesis that information density (ID) also plays a role in turn-taking. Specifically, we aim to show that humans are sensitive to the peaks and troughs of information density in speech, and that overlapping speech at ID troughs is perceived as more acceptable than overlaps at ID peaks. To test our hypothesis, we collect human ratings for three models of generating overlapping speech based on features of: (1) prosody and semantic or syntactic completeness, (2) information density, and (3) both types of information. Results show that over 50% of users preferred the version using both types of features, followed by a preference for information density features alone. This indicates a clear human sensitivity to the effects of information density in spoken language and provides a strong motivation to adopt this metric for the design, development and evaluation of turn-taking modules in spoken and incremental dialogue systems.
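Information density is standardly operationalised as per-word surprisal under a language model. The following is a minimal sketch of computing surprisal with an add-one-smoothed bigram model, purely illustrative; the article's actual ID models and features are not reproduced here:

```python
# Sketch: per-word surprisal (information density) under a bigram language
# model with add-one smoothing. Low-surprisal words mark ID troughs, the
# preferred overlap points under the paper's hypothesis.
import math
from collections import Counter

def train_bigram(sentences):
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def surprisal(sentence, unigrams, bigrams):
    """Return (word, -log2 P(word | prev)) for each word in the sentence."""
    tokens = ["<s>"] + sentence.lower().split()
    vocab = len(unigrams)
    scores = []
    for prev, word in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab)
        scores.append((word, -math.log2(p)))
    return scores

# Usage: the word with the minimum surprisal is an ID trough candidate, e.g.
# min(surprisal("yes that is fine", u, b), key=lambda x: x[1])
```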
Annual Meeting of the Special Interest Group on Discourse and Dialogue | 2014
Helen Hastie; Marie-Aude Aufaure; Panos Alexopoulos; Hugues Bouchard; Catherine Breslin; Heriberto Cuayáhuitl; Nina Dethlefs; Milica Gasic; James Henderson; Oliver Lemon; Xingkun Liu; Peter Mika; Nesrine Ben Mustapha; Tim Potter; Verena Rieser; Blaise Thomson; Pirros Tsiakoulis; Yves Vanrompay; Boris Villazon-Terrazas; Majid Yazdani; Steve J. Young; Yanchao Yu
We demonstrate a mobile application in English and Mandarin to test and evaluate components of the Parlance dialogue system for interactive search under real-world conditions.
Annual Meeting of the Association for Computational Linguistics | 2016
Yanchao Yu; Arash Eshghi; Oliver Lemon
We present a multi-modal dialogue system for interactive learning of perceptually grounded word meanings from a human tutor. The system integrates an incremental, semantic parsing/generation framework, Dynamic Syntax and Type Theory with Records (DS-TTR), with a set of visual classifiers that are learned throughout the interaction and which ground the meaning representations that it produces. We use this system in interaction with a simulated human tutor to study the effect of different dialogue policies and capabilities on the accuracy of learned meanings, learning rates, and efforts/costs to the tutor. We show that the overall performance of the learning agent is affected by (1) who takes initiative in the dialogues; (2) the ability to express/use its confidence level about visual attributes; and (3) the ability to process elliptical as well as incrementally constructed dialogue turns.
Annual Meeting of the Association for Computational Linguistics | 2017
Yanchao Yu; Arash Eshghi; Oliver Lemon
We present an optimised multi-modal dialogue agent for interactive learning of visually grounded word meanings from a human tutor, trained on real human-human tutoring data. Within a life-long interactive learning period, the agent, trained using Reinforcement Learning (RL), must be able to handle natural conversations with human users and achieve good learning performance (accuracy) while minimising human effort in the learning process. We train and evaluate this system in interaction with a simulated human tutor, which is built on the BURCHAK corpus, a human-human dialogue dataset for the visual learning task. The results show that: 1) the learned policy can coherently interact with the simulated user to achieve the goal of the task (i.e. learning visual attributes of objects, e.g. colour and shape); and 2) it finds a better trade-off between classifier accuracy and tutoring costs than hand-crafted rule-based policies, including ones with dynamic policies.
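The accuracy/cost trade-off the policy optimises can be made concrete as a reward that credits classifier improvement and charges for tutor effort, learned here with tabular Q-learning. Everything in this sketch (the action inventory, cost table, weights, and update rule) is an illustrative assumption, not the paper's formulation:

```python
# Sketch of an accuracy-vs-cost reward plus a tabular Q-learning update.
# All numbers and action names are illustrative assumptions.
import random
from collections import defaultdict

TUTOR_COST = {"ask_polar": 1.0, "ask_wh": 2.0, "tutor_correction": 3.0}

def reward(acc_gain, action, w_acc=10.0):
    """Positive for classifier improvement, negative for tutor effort."""
    return w_acc * acc_gain - TUTOR_COST.get(action, 0.0)

Q = defaultdict(float)               # (state, action) -> value
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1   # learning rate, discount, exploration

def choose(state, actions):
    """Epsilon-greedy action selection over the Q-table."""
    if random.random() < EPS:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def update(state, action, r, next_state, actions):
    """One-step Q-learning backup."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (r + GAMMA * best_next - Q[(state, action)])
```

Raising w_acc makes the learned policy query the tutor more aggressively; raising the costs pushes it toward relying on its own classifiers, which is the trade-off the paper's policy discovers automatically.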
International Conference on Natural Language Generation | 2016
Yanchao Yu; Arash Eshghi; Oliver Lemon
We present a multi-modal dialogue system for interactive learning of perceptually grounded word meanings from a human tutor (Yu et al., ). The system integrates an incremental, semantic, and bidirectional grammar framework – Dynamic Syntax and Type Theory with Records (DS-TTR; Eshghi et al., 2012; Kempson et al., 2001) – with a set of visual classifiers that are learned throughout the interaction and which ground the semantic/contextual representations that it produces (cf. Kennington & Schlangen (2015), where words, rather than semantic atoms, are grounded in visual classifiers). Our approach extends Dobnik et al. (2012) in integrating perception (vision in this case) and language within a single formal system: Type Theory with Records (TTR; Cooper, 2005). The combination of deep semantic representations in TTR with an incremental grammar (Dynamic Syntax) allows for complex multi-turn dialogues to be parsed and generated (Eshghi et al., 2015). These include clarification interaction, corrections, ellipsis and utterance continuations (see e.g. the dialogue in Fig. 1).
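To give a flavour of the TTR side: a record type pairs labelled fields with types, and grounding attaches a visual classifier as the witness condition for a perceptual field. The following loose Python sketch is only an analogy; the field names, stand-in classifiers, and judgement function are invented, and it is not the DS-TTR implementation:

```python
# Loose analogy of a TTR-style record type whose perceptual fields are
# witnessed by visual classifiers (illustrative; not the DS-TTR codebase).
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class RecordType:
    # field label -> classifier acting as the witness condition for that field
    fields: Dict[str, Callable[[dict], bool]]

    def judge(self, scene: dict) -> bool:
        """A scene inhabits the type iff every grounded field holds of it."""
        return all(check(scene) for check in self.fields.values())

# Ground 'red' and 'square' in stand-in classifiers over visual features;
# in the real system these would be learned classifiers, not rules.
red = lambda s: s.get("hue") == "red"
square = lambda s: s.get("shape") == "square"

RedSquare = RecordType({"p_red": red, "p_square": square})
print(RedSquare.judge({"hue": "red", "shape": "square"}))  # True
```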
Empirical Methods in Natural Language Processing | 2015
Yanchao Yu; Arash Eshghi; Oliver Lemon
We address the problem of interactively learning perceptually grounded word meanings in a multimodal dialogue system. We design a semantic and a visual processing system to support this, and illustrate how the two can be integrated. We then focus on comparing the performance (Precision, Recall, F1, AUC) of three state-of-the-art attribute classifiers for the purpose of interactive language grounding (MLKNN, DAP, and SVMs), on the aPascal-aYahoo datasets. In prior work, results were presented for object classification using these methods for attribute labelling, whereas we focus on their performance for attribute labelling itself. We find that while these methods can perform well for some of the attributes (e.g. head, ears, furry), none of these models has good performance over the whole attribute set, and none supports incremental learning. This leads us to suggest directions for future work.
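The per-attribute metrics reported here (Precision, Recall, F1, AUC) can be computed with standard tooling. A minimal sketch assuming binary ground-truth labels and classifier scores per attribute; this is not the paper's evaluation code:

```python
# Minimal sketch: per-attribute Precision/Recall/F1/AUC for an attribute
# classifier, using scikit-learn. y_true/y_score per attribute are assumed.
import numpy as np
from sklearn.metrics import precision_recall_fscore_support, roc_auc_score

def evaluate_attribute(y_true, y_score, threshold=0.5):
    """y_true: 0/1 labels per image; y_score: classifier confidence."""
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    p, r, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="binary", zero_division=0)
    auc = roc_auc_score(y_true, y_score)
    return {"precision": p, "recall": r, "f1": f1, "auc": auc}

# Usage, e.g. for the 'furry' attribute:
# evaluate_attribute([1, 0, 1, 1], [0.9, 0.2, 0.4, 0.8])
```

Running this per attribute and inspecting the spread of F1/AUC values reproduces the kind of whole-attribute-set comparison the paper reports.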
20th Workshop on the Semantics and Pragmatics of Dialogue | 2016
Yanchao Yu; Oliver Lemon; Arash Eshghi
Proceedings of the Sixth Workshop on Vision and Language | 2017
Yanchao Yu; Arash Eshghi; Gregory Mills; Oliver Lemon
19th Workshop on the Semantics and Pragmatics of Dialogue | 2015
Yanchao Yu; Oliver Lemon; Arash Eshghi