Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Catherine Lai is active.

Publication


Featured research published by Catherine Lai.


International Conference on Computational Linguistics | 2014

Word-Level Emotion Recognition Using High-Level Features

Johanna D. Moore; Leimin Tian; Catherine Lai

In this paper, we investigate the use of high-level features for recognizing human emotions at the word level in natural conversations with virtual agents. Experiments were carried out on the 2012 Audio/Visual Emotion Challenge (AVEC2012) database, where emotions are defined as vectors in the Arousal-Expectancy-Power-Valence emotional space. Our model using 6 novel disfluency features yields significant improvements compared to those using a large number of low-level spectral and prosodic features, and the overall performance difference between it and the best model of the AVEC2012 Word-Level Sub-Challenge is not significant. Our visual model using the Active Shape Model visual features also yields significant improvements compared to models using the low-level Local Binary Patterns visual features. We built a bimodal model by combining our disfluency and visual feature sets and applying Correlation-based Feature-subset Selection. Considering overall performance on all emotion dimensions, our bimodal model outperforms the second best model of the challenge, and comes close to the best model. It also gives the best result when predicting Expectancy values.
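As a rough illustration of the bimodal setup described above, the sketch below concatenates hypothetical word-level disfluency and visual feature vectors, applies a simple correlation-with-target filter as a stand-in for Correlation-based Feature-subset Selection, and regresses one emotion dimension with an SVR. All data, feature names, and thresholds are invented for illustration and are not the paper's configuration.

```python
# Minimal sketch of bimodal (audio-visual) fusion with a correlation-based
# feature filter, loosely in the spirit of the paper's CFS step. Data,
# feature names, and the selection rule here are illustrative only.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_words = 500

# Hypothetical word-level features: 6 disfluency counts plus a few
# ASM-style visual descriptors, concatenated into one bimodal vector per word.
disfluency = rng.poisson(0.2, size=(n_words, 6)).astype(float)
visual = rng.normal(size=(n_words, 10))
X = np.hstack([disfluency, visual])
y = rng.normal(size=n_words)            # stand-in for an Arousal label

# Correlation-based filter: keep features whose absolute Pearson correlation
# with the target exceeds a threshold (a simplification of CFS).
corr = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
selected = corr > 0.02
X_sel = X[:, selected]

# Word-level regression of the emotion dimension with an SVR.
model = SVR(kernel="rbf", C=1.0)
scores = cross_val_score(model, X_sel, y, cv=5, scoring="neg_mean_absolute_error")
print("selected features:", selected.sum(), "MAE:", -scores.mean())
```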


Affective Computing and Intelligent Interaction | 2015

Emotion recognition in spontaneous and acted dialogues

Leimin Tian; Johanna D. Moore; Catherine Lai

In this work, we compare emotion recognition on two types of speech: spontaneous and acted dialogues. Experiments were conducted on the AVEC2012 database of spontaneous dialogues and the IEMOCAP database of acted dialogues. We studied the performance of two types of acoustic features for emotion recognition: knowledge-inspired disfluency and nonverbal vocalisation (DIS-NV) features, and statistical Low-Level Descriptor (LLD) based features. Both Support Vector Machines (SVM) and Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN) were built using each feature set on each emotional database. Our work aims to identify aspects of the data that constrain the effectiveness of models and features. Our results show that the performance of different types of features and models is influenced by the type of dialogue and the amount of training data. Because DIS-NVs are less frequent in acted dialogues than in spontaneous dialogues, the DIS-NV features perform better than the LLD features when recognizing emotions in spontaneous dialogues, but not in acted dialogues. The LSTM-RNN model gives better performance than the SVM model when there is enough training data, but the complex structure of an LSTM-RNN model may limit its performance when there is less training data available, and may also risk over-fitting. Additionally, we find that long-distance contexts may be more useful when performing emotion recognition at the word level than at the utterance level.
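The sketch below contrasts the two model families compared in this paper on synthetic word-level sequences: an SVR over per-utterance feature averages versus a small LSTM over the word sequence. Dimensions, hyperparameters, and data are illustrative assumptions, not the configurations used in the paper.

```python
# Sketch contrasting the two model families on synthetic word-level sequences:
# an SVR over per-utterance statistics versus a small LSTM over the sequence.
import numpy as np
import tensorflow as tf
from sklearn.svm import SVR

rng = np.random.default_rng(1)
n_utts, max_words, n_feats = 200, 20, 5        # e.g. 5 DIS-NV features per word
X_seq = rng.normal(size=(n_utts, max_words, n_feats)).astype("float32")
y = rng.normal(size=n_utts).astype("float32")  # stand-in emotion label

# SVM-style model: collapse each utterance to mean features (loses word order).
svr = SVR().fit(X_seq.mean(axis=1), y)

# LSTM-RNN: consumes the word sequence directly, so it can exploit context,
# but it has many more parameters and needs more training data.
lstm = tf.keras.Sequential([
    tf.keras.Input(shape=(max_words, n_feats)),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1),
])
lstm.compile(optimizer="adam", loss="mae")
lstm.fit(X_seq, y, epochs=3, batch_size=16, verbose=0)
```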


Spoken Language Technology Workshop | 2016

Recognizing emotions in spoken dialogue with hierarchically fused acoustic and lexical features

Leimin Tian; Johanna D. Moore; Catherine Lai

Automatic emotion recognition is vital for building natural and engaging human-computer interaction systems. Combining information from multiple modalities typically improves emotion recognition performance. In previous work, features from different modalities have generally been fused at the same level with two types of fusion strategies: Feature-Level fusion, which concatenates feature sets before recognition; and Decision-Level fusion, which makes the final decision based on outputs of the unimodal models. However, different features may describe data at different time scales or have different levels of abstraction. Cognitive Science research also indicates that when perceiving emotions, humans use information from different modalities at different cognitive levels and time steps. Therefore, we propose a Hierarchical fusion strategy for multimodal emotion recognition, which incorporates global or more abstract features at higher levels of its knowledge-inspired structure. We build multimodal emotion recognition models combining state-of-the-art acoustic and lexical features to study the performance of the proposed Hierarchical fusion. Experiments on two emotion databases of spoken dialogue show that this fusion strategy consistently outperforms both Feature-Level and Decision-Level fusion. The multimodal emotion recognition models using the Hierarchical fusion strategy achieved state-of-the-art performance on recognizing emotions in both spontaneous and acted dialogue.
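The sketch below illustrates the three fusion strategies the abstract contrasts, in Keras. Which modality enters at which level of the hierarchy, and all layer sizes, are assumptions made for illustration rather than the paper's exact architecture.

```python
# Illustrative sketch of feature-level, decision-level, and hierarchical
# fusion over acoustic and lexical inputs. Layer sizes and the choice of
# which modality is injected higher up are assumptions, not the paper's model.
import tensorflow as tf

n_acoustic, n_lexical = 40, 20
acoustic_in = tf.keras.Input(shape=(n_acoustic,), name="acoustic")
lexical_in = tf.keras.Input(shape=(n_lexical,), name="lexical")

# Feature-level fusion: concatenate everything, then one shared network.
feat_level = tf.keras.layers.Dense(1)(
    tf.keras.layers.Dense(32, activation="relu")(
        tf.keras.layers.concatenate([acoustic_in, lexical_in])))

# Decision-level fusion: separate unimodal predictions, then average them.
dec_a = tf.keras.layers.Dense(1)(tf.keras.layers.Dense(32, activation="relu")(acoustic_in))
dec_l = tf.keras.layers.Dense(1)(tf.keras.layers.Dense(32, activation="relu")(lexical_in))
decision_level = tf.keras.layers.average([dec_a, dec_l])

# Hierarchical fusion: one modality is encoded first, and the other, treated
# here as the more global/abstract one, is injected at a higher layer.
h1 = tf.keras.layers.Dense(32, activation="relu")(acoustic_in)
h2 = tf.keras.layers.Dense(16, activation="relu")(
    tf.keras.layers.concatenate([h1, lexical_in]))
hierarchical = tf.keras.layers.Dense(1)(h2)

models = {
    "feature_level": tf.keras.Model([acoustic_in, lexical_in], feat_level),
    "decision_level": tf.keras.Model([acoustic_in, lexical_in], decision_level),
    "hierarchical": tf.keras.Model([acoustic_in, lexical_in], hierarchical),
}
for name, m in models.items():
    m.compile(optimizer="adam", loss="mae")
    print(name, m.count_params())
```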


Affective Computing and Intelligent Interaction | 2015

Recognizing emotions in dialogues with acoustic and lexical features

Leimin Tian; Johanna D. Moore; Catherine Lai

Automatic emotion recognition has long been a focus of Affective Computing. We aim to improve the performance of state-of-the-art emotion recognition in dialogues using novel knowledge-inspired features and modality fusion strategies. We propose features based on disfluencies and nonverbal vocalisations (DIS-NVs), and show that they are highly predictive for recognizing emotions in spontaneous dialogues. We also propose the hierarchical fusion strategy as an alternative to current feature-level and decision-level fusion. This fusion strategy combines features from different modalities at different layers in a hierarchical structure. It is expected to overcome limitations of feature-level and decision-level fusion by incorporating knowledge of modality differences, while preserving information from each modality.
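As a toy illustration of DIS-NV style features, the sketch below counts filled pauses, nonverbal vocalisations, and word repetitions in an annotated transcript. The token markers and the exact feature inventory are hypothetical; real corpora such as AVEC2012 use their own annotation conventions.

```python
# Toy extraction of disfluency and nonverbal-vocalisation (DIS-NV) counts
# from an annotated transcript. The token markers used here are hypothetical.
from collections import Counter

FILLED_PAUSES = {"uh", "um", "er"}
NONVERBAL = {"<laughter>", "<breath>"}

def dis_nv_features(tokens):
    """Return a small dict of DIS-NV style counts for one utterance."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        low = tok.lower()
        if low in FILLED_PAUSES:
            counts["filled_pause"] += 1
        elif low in NONVERBAL:
            counts["nonverbal"] += 1
        elif i > 0 and low == tokens[i - 1].lower():
            counts["repetition"] += 1      # crude stutter/repetition proxy
    counts["n_words"] = len(tokens)
    return dict(counts)

print(dis_nv_features("well um I I think <laughter> it went fine".split()))
```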


ICMI '18 Proceedings of the 20th ACM International Conference on Multimodal Interaction | 2018

Group Interaction Frontiers in Technology

Gabriel Murray; Hayley Hung; Joann Keyton; Catherine Lai; Nale Lehmann-Willenbrock; Catharine Oertel

Analysis of group interaction and team dynamics is an important topic in a wide variety of fields, owing to the amount of time that individuals typically spend in small groups for both professional and personal purposes, and given how crucial group cohesion and productivity are to the success of businesses and other organizations. This fact is attested by the rapid growth of fields such as People Analytics and Human Resource Analytics, which in turn have grown out of many decades of research in social psychology, organizational behaviour, computing, and network science, amongst other fields. The goal of this workshop is to bring together researchers from diverse fields related to group interaction, team dynamics, people analytics, multi-modal speech and language processing, social psychology, and organizational behaviour.


Proceedings of the 1st ACM SIGCHI International Workshop on Investigating Social Interactions with Artificial Agents | 2017

Recognizing emotions in spoken dialogue with acoustic and lexical cues

Leimin Tian; Johanna D. Moore; Catherine Lai

Emotions play a vital role in human communications. Therefore, it is desirable for virtual agent dialogue systems to recognize and react to users' emotions. However, current automatic emotion recognizers have limited performance compared to humans. Our work attempts to improve the performance of recognizing emotions in spoken dialogue by identifying dialogue cues predictive of emotions, and by building multimodal recognition models with a knowledge-inspired hierarchy. We conduct experiments on both spontaneous and acted dialogue data to study the efficacy of the proposed approaches. Our results show that including prior knowledge on emotions in dialogue in either the feature representation or the model structure is beneficial for automatic emotion recognition.


Conference of the International Speech Communication Association | 2016

Automatic Paragraph Segmentation with Lexical and Prosodic Features

Catherine Lai; Mireia Farrús; Johanna D. Moore

Paper presented at Interspeech 2016, held by the International Speech Communication Association (ISCA) from 8 to 12 September 2016 in San Francisco (USA).
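Since only the conference note survives here, the sketch below is based on the title alone: it treats paragraph segmentation as classifying, at each sentence boundary, whether a new paragraph begins, using invented lexical cohesion and prosodic (pause, pitch reset) features. The feature set and classifier are assumptions, not the paper's method.

```python
# Sketch of paragraph-boundary classification at sentence boundaries.
# Every feature and label below is synthetic and purely illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n_boundaries = 300

lexical_cohesion = rng.uniform(0, 1, n_boundaries)   # overlap with previous sentence
pause_sec = rng.exponential(0.4, n_boundaries)        # silence at the boundary
pitch_reset = rng.normal(0, 1, n_boundaries)          # F0 jump into the next sentence
X = np.column_stack([lexical_cohesion, pause_sec, pitch_reset])

# Toy labels: long pauses and low cohesion tend to mark paragraph starts.
y = ((pause_sec > 0.5) & (lexical_cohesion < 0.5)).astype(int)

clf = LogisticRegression().fit(X, y)
print("boundary -> new paragraph?", clf.predict([[0.1, 0.9, 1.2]])[0])
```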


Proceedings of the Australasian Language Technology Workshop 2004 | 2004

Querying and Updating Treebanks: A Critical Survey and Requirements Analysis

Catherine Lai; Steven Bird
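To illustrate the kind of structural query the survey analyses, the toy example below uses NLTK's Tree as a stand-in for a treebank query language and searches for NPs dominated by a VP. The example tree and query are invented and do not come from the paper.

```python
# The survey analyses declarative query languages for treebanks; this toy
# example shows the kind of structural query they target, using NLTK's Tree
# rather than any specific query language discussed in the paper.
from nltk import Tree

sent = Tree.fromstring(
    "(S (NP (DT the) (NN cat)) (VP (VBD sat) (PP (IN on) (NP (DT the) (NN mat)))))")

# Query: find every NP that is dominated by a VP (i.e. an NP inside a VP).
matches = [np_sub for vp in sent.subtrees(lambda t: t.label() == "VP")
           for np_sub in vp.subtrees(lambda t: t.label() == "NP")]

for m in matches:
    print(" ".join(m.leaves()))
```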


Australasian Language Technology Workshop 2004 | 2004

In Proceedings of the Australasian Language Technology Workshop

Catherine Lai; Steven Bird


Journal of the Acoustical Society of America | 2010

The importance of optimal parameter setting for pitch extraction

Keelan Evanini; Catherine Lai; Klaus Zechner
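As an illustration of the kind of parameter the paper examines, the sketch below runs an open-source F0 tracker (librosa's pYIN) with two different pitch search ranges; a mismatched range typically produces octave errors or missed voiced frames. The tool and signal here are stand-ins, not the extractor or data evaluated in the paper.

```python
# Effect of the pitch search range (floor/ceiling) handed to an F0 tracker.
# librosa's pYIN and its bundled example audio are used purely as stand-ins;
# in practice a speech recording would be loaded instead.
import librosa

y, sr = librosa.load(librosa.example("trumpet"), sr=None)

# A range suited to a low voice vs. one suited to a higher voice.
for fmin, fmax in [(50, 300), (120, 500)]:
    f0, voiced_flag, voiced_prob = librosa.pyin(y, fmin=fmin, fmax=fmax, sr=sr)
    print(f"range {fmin}-{fmax} Hz: mean F0 over voiced frames =",
          float(f0[voiced_flag].mean()))
```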

Collaboration


Dive into Catherine Lai's collaborations.

Top Co-Authors

Leimin Tian (University of Edinburgh)
Steven Bird (University of Melbourne)
Steve Renals (University of Edinburgh)
Peter Bell (University of Edinburgh)