Leimin Tian
University of Edinburgh
Publications
Featured research published by Leimin Tian.
international conference on computational linguistics | 2014
Johanna D. Moore; Leimin Tian; Catherine Lai
In this paper, we investigate the use of high-level features for recognizing human emotions at the word level in natural conversations with virtual agents. Experiments were carried out on the 2012 Audio/Visual Emotion Challenge (AVEC2012) database, where emotions are defined as vectors in the Arousal-Expectancy-Power-Valence emotional space. Our model using 6 novel disfluency features yields significant improvements compared to those using a large number of low-level spectral and prosodic features, and the overall performance difference between it and the best model of the AVEC2012 Word-Level Sub-Challenge is not significant. Our visual model using the Active Shape Model visual features also yields significant improvements compared to models using the low-level Local Binary Patterns visual features. We built a bimodal model by combining our disfluency and visual feature sets and applying Correlation-based Feature-subset Selection. Considering overall performance on all emotion dimensions, our bimodal model outperforms the second best model of the challenge and comes close to the best model. It also gives the best result when predicting Expectancy values.
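A minimal sketch of the bimodal combination step described above, assuming toy per-word feature matrices and a greedy Correlation-based Feature-subset Selection in the style of Hall's CFS merit; the feature names, shapes, and selection routine are illustrative and not the code used in the paper.

```python
# Illustrative sketch (not the authors' code): feature-level combination of
# disfluency and visual feature vectors, followed by greedy Correlation-based
# Feature-subset Selection. All data here is synthetic.
import numpy as np

def cfs_merit(X, y, subset):
    """CFS merit: k*r_cf / sqrt(k + k*(k-1)*r_ff) for the chosen columns."""
    k = len(subset)
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return r_cf
    r_ff = np.mean([abs(np.corrcoef(X[:, i], X[:, j])[0, 1])
                    for a, i in enumerate(subset) for j in subset[a + 1:]])
    return (k * r_cf) / np.sqrt(k + k * (k - 1) * r_ff)

def greedy_cfs(X, y, max_features=10):
    """Greedy forward selection that keeps adding the feature maximising merit."""
    remaining, selected = list(range(X.shape[1])), []
    while remaining and len(selected) < max_features:
        best_score, best_j = max((cfs_merit(X, y, selected + [j]), j) for j in remaining)
        if selected and best_score <= cfs_merit(X, y, selected):
            break  # no remaining candidate improves the merit
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

# Bimodal feature-level combination: concatenate per-word disfluency and
# visual (e.g. Active Shape Model derived) features, then select a subset.
rng = np.random.default_rng(0)
disfluency = rng.normal(size=(200, 6))   # hypothetical 6 disfluency features per word
visual = rng.normal(size=(200, 20))      # hypothetical visual features per word
arousal = rng.normal(size=200)           # one emotion dimension as the target
X = np.hstack([disfluency, visual])
print(greedy_cfs(X, arousal, max_features=8))
```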
affective computing and intelligent interaction | 2015
Leimin Tian; Johanna D. Moore; Catherine Lai
In this work, we compare emotion recognition on two types of speech: spontaneous and acted dialogues. Experiments were conducted on the AVEC2012 database of spontaneous dialogues and the IEMOCAP database of acted dialogues. We studied the performance of two types of acoustic features for emotion recognition: knowledge-inspired disfluency and nonverbal vocalisation (DIS-NV) features, and statistical Low-Level Descriptor (LLD) based features. Both Support Vector Machine (SVM) and Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) models were built using each feature set on each emotion database. Our work aims to identify aspects of the data that constrain the effectiveness of models and features. Our results show that the performance of different types of features and models is influenced by the type of dialogue and the amount of training data. Because DIS-NVs are less frequent in acted dialogues than in spontaneous dialogues, the DIS-NV features perform better than the LLD features when recognizing emotions in spontaneous dialogues, but not in acted dialogues. The LSTM-RNN model gives better performance than the SVM model when there is enough training data, but the complex structure of an LSTM-RNN model may limit its performance when less training data is available, and may also risk over-fitting. Additionally, we find that long distance contexts may be more useful when performing emotion recognition at the word level than at the utterance level.
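As an illustration of the knowledge-inspired DIS-NV representation, the sketch below counts disfluency and nonverbal vocalisation tags in an annotated utterance. The tag inventory and the simple count-based values are assumptions made for illustration, not the exact feature definitions used in the study.

```python
# Minimal sketch, assuming a transcript annotated with DIS-NV tags.
from collections import Counter

DIS_NV_TAGS = {"<filled_pause>", "<filler>", "<stutter>", "<laughter>", "<breath>"}

def dis_nv_features(tokens):
    """Return one count per DIS-NV category (alphabetical tag order) for an utterance."""
    counts = Counter(t for t in tokens if t in DIS_NV_TAGS)
    return [counts[tag] for tag in sorted(DIS_NV_TAGS)]

utterance = ["i", "<filled_pause>", "think", "<stutter>", "th-", "that",
             "was", "<laughter>", "funny"]
print(dis_nv_features(utterance))  # -> [0, 1, 0, 1, 1]
```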
spoken language technology workshop | 2016
Leimin Tian; Johanna D. Moore; Catherine Lai
Automatic emotion recognition is vital for building natural and engaging human-computer interaction systems. Combining information from multiple modalities typically improves emotion recognition performance. In previous work, features from different modalities have generally been fused at the same level with two types of fusion strategies: Feature-Level fusion, which concatenates feature sets before recognition; and Decision-Level fusion, which makes the final decision based on outputs of the unimodal models. However, different features may describe data at different time scales or have different levels of abstraction. Cognitive Science research also indicates that when perceiving emotions, humans use information from different modalities at different cognitive levels and time steps. Therefore, we propose a Hierarchical fusion strategy for multimodal emotion recognition, which incorporates global or more abstract features at higher levels of its knowledge-inspired structure. We build multimodal emotion recognition models combining state-of-the-art acoustic and lexical features to study the performance of the proposed Hierarchical fusion. Experiments on two emotion databases of spoken dialogue show that this fusion strategy consistently outperforms both Feature-Level and Decision-Level fusion. The multimodal emotion recognition models using the Hierarchical fusion strategy achieved state-of-the-art performance on recognizing emotions in both spontaneous and acted dialogue.
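A minimal PyTorch sketch of the hierarchical fusion idea, under assumed layer sizes and feature roles: frame-level acoustic descriptors enter the lower recurrent layer, while more abstract word-level lexical features are merged in only at the higher layer. This is an illustrative reconstruction, not the authors' architecture.

```python
# Illustrative hierarchical fusion: low-level features feed the bottom layer,
# higher-level features are concatenated in at the upper layer.
import torch
import torch.nn as nn

class HierarchicalFusion(nn.Module):
    def __init__(self, n_acoustic=32, n_lexical=8, hidden=64, n_emotions=4):
        super().__init__()
        self.lower = nn.LSTM(n_acoustic, hidden, batch_first=True)
        # The higher layer sees the acoustic summary plus the lexical features.
        self.upper = nn.LSTM(hidden + n_lexical, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_emotions)

    def forward(self, acoustic, lexical):
        # acoustic: (batch, time, n_acoustic); lexical: (batch, time, n_lexical)
        low, _ = self.lower(acoustic)
        fused, _ = self.upper(torch.cat([low, lexical], dim=-1))
        return self.out(fused[:, -1])  # predict emotion dimensions per sequence

model = HierarchicalFusion()
acoustic = torch.randn(2, 50, 32)   # e.g. frame-level LLD-style features
lexical = torch.randn(2, 50, 8)     # e.g. word-level DIS-NV / lexical features
print(model(acoustic, lexical).shape)  # torch.Size([2, 4])
```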
affective computing and intelligent interaction | 2015
Leimin Tian; Johanna D. Moore; Catherine Lai
Automatic emotion recognition has long been a focus of Affective Computing. We aim to improve the performance of state-of-the-art emotion recognition in dialogues using novel knowledge-inspired features and modality fusion strategies. We propose features based on disfluencies and nonverbal vocalisations (DIS-NVs), and show that they are highly predictive for recognizing emotions in spontaneous dialogues. We also propose the hierarchical fusion strategy as an alternative to current feature-level and decision-level fusion. This fusion strategy combines features from different modalities at different layers of a hierarchical structure. It is expected to overcome the limitations of feature-level and decision-level fusion by incorporating knowledge of modality differences, while preserving information from each modality.
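For contrast with the hierarchical strategy proposed above, the sketch below shows the two standard baselines the abstract refers to: feature-level fusion (concatenate feature sets, then classify) and decision-level fusion (classify each modality separately, then combine the outputs). The classifier choice and data shapes are illustrative assumptions.

```python
# Schematic feature-level vs decision-level fusion on synthetic data.
import numpy as np
from sklearn.svm import SVC

def feature_level_fusion(acoustic, lexical, labels):
    """Concatenate the feature sets, then train a single classifier."""
    return SVC().fit(np.hstack([acoustic, lexical]), labels)

def decision_level_fusion(acoustic, lexical, labels):
    """Train one classifier per modality and combine their decision scores."""
    clf_a = SVC().fit(acoustic, labels)
    clf_l = SVC().fit(lexical, labels)
    def predict(a, l):
        scores = clf_a.decision_function(a) + clf_l.decision_function(l)
        return (scores > 0).astype(int)  # binary labels assumed for simplicity
    return predict

rng = np.random.default_rng(0)
acoustic = rng.normal(size=(40, 10))
lexical = rng.normal(size=(40, 5))
labels = rng.integers(0, 2, size=40)
early = feature_level_fusion(acoustic, lexical, labels)
late = decision_level_fusion(acoustic, lexical, labels)
print(early.predict(np.hstack([acoustic, lexical]))[:5], late(acoustic, lexical)[:5])
```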
Proceedings of the 1st ACM SIGCHI International Workshop on Investigating Social Interactions with Artificial Agents | 2017
Leimin Tian; Johanna D. Moore; Catherine Lai
Emotions play a vital role in human communication. Therefore, it is desirable for virtual agent dialogue systems to recognize and react to users' emotions. However, current automatic emotion recognizers have limited performance compared to humans. Our work attempts to improve the performance of recognizing emotions in spoken dialogue by identifying dialogue cues predictive of emotions, and by building multimodal recognition models with a knowledge-inspired hierarchy. We conduct experiments on both spontaneous and acted dialogue data to study the efficacy of the proposed approaches. Our results show that including prior knowledge on emotions in dialogue in either the feature representation or the model structure is beneficial for automatic emotion recognition.
Archive | 2015
Leimin Tian; Catherine Lai; Johanna D. Moore
affective computing and intelligent interaction | 2017
Leimin Tian; Michal Muszynski; Catherine Lai; Johanna D. Moore; Theodoros Kostoulas; Patrizia Lombardo; Thierry Pun; Guillaume Chanel
arXiv: Computation and Language | 2018
Leimin Tian; Catherine Lai; Johanna D. Moore
Archive | 2017
Leimin Tian; Johanna D. Moore; Catherine Lai
The 4th Interdisciplinary Workshop on Laughter and other Non-Verbal Vocalisations in Speech | 2015
Leimin Tian; Catherine Lai; Johanna D. Moore