
Publications

Featured research published by Angeliki Metallinou.


IEEE Transactions on Affective Computing | 2012

Context-Sensitive Learning for Enhanced Audiovisual Emotion Classification

Angeliki Metallinou; Martin Wöllmer; Athanasios Katsamanis; Florian Eyben; Björn W. Schuller; Shrikanth Narayanan

Human emotional expression tends to evolve in a structured manner in the sense that certain emotional evolution patterns, e.g., anger to anger, are more probable than others, e.g., anger to happiness. Furthermore, the perception of an emotional display can be affected by recent emotional displays. Therefore, the emotional content of past and future observations could offer relevant temporal context when classifying the emotional content of an observation. In this work, we focus on audio-visual recognition of the emotional content of improvised emotional interactions at the utterance level. We examine context-sensitive schemes for emotion recognition within a multimodal, hierarchical approach: bidirectional Long Short-Term Memory (BLSTM) neural networks, hierarchical Hidden Markov Model classifiers (HMMs), and hybrid HMM/BLSTM classifiers are considered for modeling emotion evolution within an utterance and between utterances over the course of a dialog. Overall, our experimental results indicate that incorporating long-term temporal context is beneficial for emotion recognition systems that encounter a variety of emotional manifestations. Context-sensitive approaches outperform those without context for classification tasks such as discrimination between valence levels or between clusters in the valence-activation space. The analysis of emotional transitions in our database sheds light on the flow of affective expressions, revealing potentially useful patterns.
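
As a rough illustration of the utterance-level BLSTM component mentioned above, the sketch below builds a bidirectional LSTM that maps a sequence of frame-level audiovisual features to an utterance-level emotion label. The feature dimension, hidden size, and four-class set are illustrative assumptions, not the configuration used in the paper.

```python
# Hypothetical sketch: a bidirectional LSTM that maps a sequence of
# frame-level audiovisual features to an utterance-level emotion label.
import torch
import torch.nn as nn

class BLSTMEmotionClassifier(nn.Module):
    def __init__(self, feat_dim=60, hidden=64, n_classes=4):
        super().__init__()
        # The bidirectional LSTM reads the utterance forwards and backwards,
        # so each frame's representation carries past and future context.
        self.blstm = nn.LSTM(feat_dim, hidden, batch_first=True,
                             bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                 # x: (batch, frames, feat_dim)
        h, _ = self.blstm(x)              # (batch, frames, 2*hidden)
        utt = h.mean(dim=1)               # pool frames into one utterance vector
        return self.out(utt)              # emotion class logits

# Toy usage: 8 utterances, 100 frames each, 60-dim audiovisual features.
model = BLSTMEmotionClassifier()
logits = model(torch.randn(8, 100, 60))
print(logits.shape)                       # torch.Size([8, 4])
```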


Affective Computing and Intelligent Interaction | 2009

Interpreting ambiguous emotional expressions

Emily Mower; Angeliki Metallinou; Chi-Chun Lee; Abe Kazemzadeh; Carlos Busso; Sungbok Lee; Shrikanth Narayanan

Emotion expression is a complex process involving dependencies based on time, speaker, context, mood, personality, and culture. Emotion classification algorithms designed for real-world application must be able to interpret the emotional content of an utterance or dialog given the modulations resulting from these and other dependencies. Algorithmic development often rests on the assumption that the input emotions are uniformly recognized by a pool of evaluators. However, this style of consistent prototypical emotion expression often does not exist outside of a laboratory environment. This paper presents methods for interpreting the emotional content of non-prototypical utterances. These methods include modeling across multiple time-scales and modeling interaction dynamics between interlocutors. This paper recommends classifying emotions based on emotional profiles, or soft-labels, of emotion expression rather than relying on just raw acoustic features or categorical hard labels. Emotion expression is both interactive and dynamic. Consequently, to accurately recognize emotional content, these aspects must be incorporated during algorithmic design to improve classification performance.
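
One way to picture the "emotional profiles, or soft-labels" idea is as a distribution over emotion categories derived from several evaluators' votes instead of a single hard label. A minimal sketch, with made-up categories and votes:

```python
from collections import Counter

EMOTIONS = ["angry", "happy", "neutral", "sad"]

def emotion_profile(evaluator_labels):
    """Turn a list of categorical votes into a soft-label distribution."""
    counts = Counter(evaluator_labels)
    total = len(evaluator_labels)
    return {e: counts.get(e, 0) / total for e in EMOTIONS}

# A non-prototypical utterance on which evaluators disagree:
print(emotion_profile(["angry", "neutral", "angry", "sad"]))
# {'angry': 0.5, 'happy': 0.0, 'neutral': 0.25, 'sad': 0.25}
```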


IEEE International Conference on Automatic Face and Gesture Recognition | 2013

Annotation and processing of continuous emotional attributes: Challenges and opportunities

Angeliki Metallinou; Shrikanth Narayanan

Human emotional and cognitive states evolve with variable intensity and clarity through the course of social interactions and experiences, and they are continuously influenced by a variety of input multimodal information from the environment and the interaction participants. This has motivated the development of a new area within affective computing that treats emotions as continuous variables and examines their representation, annotation and modeling. In this work, we use as a starting point the continuous emotional annotation that we performed on a large, multimodal database, and discuss annotation challenges, design decisions, annotation results and lessons learned from this effort, in the context of existing literature. Additionally, we discuss a variety of open questions for future research in terms of labeling, combining and processing continuous assessments of emotional and cognitive states.


International Conference on Acoustics, Speech, and Signal Processing | 2010

Decision level combination of multiple modalities for recognition and analysis of emotional expression

Angeliki Metallinou; Sungbok Lee; Shrikanth Narayanan

Emotion is expressed and perceived through multiple modalities. In this work, we model face, voice and head movement cues for emotion recognition and we fuse classifiers using a Bayesian framework. The facial classifier is the best performing, followed by the voice and head classifiers, and the multiple modalities seem to carry complementary information, especially for happiness. Decision fusion significantly increases the average total unweighted accuracy, from 55% to about 62%. Overall, we achieve average accuracy on the order of 65–75% for emotional states and 30–40% for the neutral state using a large multi-speaker, multimodal database. Performance analysis for the case of anger and neutrality suggests a positive correlation between the number of classifiers that performed well and the perceptual salience of the expressed emotion.
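
A hedged sketch of decision-level fusion in this spirit: each modality's classifier emits class posteriors, which are combined by weighted log-linear fusion under a naive independence assumption. The weights and posterior values are illustrative, not taken from the paper.

```python
import numpy as np

EMOTIONS = ["angry", "happy", "neutral", "sad"]

def fuse_decisions(posteriors, weights=None):
    """Combine per-modality class posteriors by weighted log-linear fusion."""
    posteriors = np.asarray(posteriors, dtype=float)     # (n_modalities, n_classes)
    weights = np.ones(len(posteriors)) if weights is None else np.asarray(weights, dtype=float)
    log_score = weights @ np.log(posteriors + 1e-12)     # weighted sum of log-posteriors
    fused = np.exp(log_score - log_score.max())
    return fused / fused.sum()                           # renormalized fused posterior

# Face, voice and head classifiers partially disagree on a happy utterance:
face  = [0.10, 0.60, 0.20, 0.10]
voice = [0.20, 0.40, 0.30, 0.10]
head  = [0.25, 0.35, 0.25, 0.15]
fused = fuse_decisions([face, voice, head], weights=[1.5, 1.0, 0.5])
print(EMOTIONS[int(np.argmax(fused))])                   # 'happy'
```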


International Symposium on Multimedia | 2008

Audio-Visual Emotion Recognition Using Gaussian Mixture Models for Face and Voice

Angeliki Metallinou; Sungbok Lee; Shrikanth Narayanan

Emotion expression associated with human communication is known to be a multimodal process. In this work, we investigate the way that emotional information is conveyed by facial and vocal modalities, and how these modalities can be effectively combined to achieve improved emotion recognition accuracy. In particular, the behaviors of different facial regions are studied in detail. We analyze an emotion database recorded from ten speakers (five female, five male), which contains speech and facial marker data. Each individual modality is modeled by Gaussian mixture models (GMMs). Multiple modalities are combined using two different methods: a Bayesian classifier weighting scheme and support vector machines that use post classification accuracies as features. Individual modality recognition performances indicate that anger and sadness have comparable accuracies for facial and vocal modalities, while happiness seems to be more accurately transmitted by facial expressions than voice. The neutral state has the lowest performance, possibly due to the vague definition of neutrality. Cheek regions achieve better emotion recognition accuracy compared to other facial regions. Moreover, classifier combination leads to significantly higher performance, which confirms that training detailed single modality classifiers and combining them at a later stage is an effective approach.
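
A minimal sketch of the per-modality GMM idea described above: one GMM is fit per emotion class for a given modality, and a test utterance is assigned to the class whose GMM yields the highest likelihood. The features are synthetic and the mixture sizes are assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
EMOTIONS = ["angry", "happy", "neutral", "sad"]

# Synthetic stand-in for per-frame features of one modality (e.g. facial markers).
train = {e: rng.normal(loc=i, scale=1.0, size=(200, 10))
         for i, e in enumerate(EMOTIONS)}

# One GMM per emotion class for this modality.
gmms = {e: GaussianMixture(n_components=4, random_state=0).fit(x)
        for e, x in train.items()}

def classify(frames):
    """Pick the emotion whose GMM gives the highest average log-likelihood."""
    scores = {e: g.score(frames) for e, g in gmms.items()}
    return max(scores, key=scores.get)

test = rng.normal(loc=1.0, scale=1.0, size=(50, 10))
print(classify(test))   # expected: 'happy' (test frames drawn near that cluster)
```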


International Conference on Acoustics, Speech, and Signal Processing | 2010

Visual emotion recognition using compact facial representations and viseme information

Angeliki Metallinou; Carlos Busso; Sungbok Lee; Shrikanth Narayanan

Emotion expression is an essential part of human interaction. Rich emotional information is conveyed through the human face. In this study, we analyze detailed motion-captured facial information of ten speakers of both genders during emotional speech. We derive compact facial representations using methods motivated by Principal Component Analysis and speaker face normalization. Moreover, we model emotional facial movements by conditioning on knowledge of speech-related movements (articulation). We achieve average classification accuracies on the order of 75% for happiness, 50–60% for anger and sadness and 35% for neutrality in speaker independent experiments. We also find that dynamic modeling and the use of viseme information improves recognition accuracy for anger, happiness and sadness, as well as for the overall unweighted performance.
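
The compact facial representation could be sketched, under simplifying assumptions, as PCA applied to per-speaker mean-normalized motion-capture marker coordinates; the marker count and number of components below are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)

# Synthetic stand-in: 1000 frames of 46 facial markers with (x, y, z) coordinates.
frames = rng.normal(size=(1000, 46 * 3))

# Simple speaker face normalization: subtract the speaker's mean face shape.
normalized = frames - frames.mean(axis=0, keepdims=True)

# Compact representation: project onto the leading principal components.
pca = PCA(n_components=10)
compact = pca.fit_transform(normalized)

print(compact.shape)                         # (1000, 10)
print(pca.explained_variance_ratio_.sum())   # fraction of variance retained
```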


International Conference on Acoustics, Speech, and Signal Processing | 2011

Iterative feature normalization for emotional speech detection

Carlos Busso; Angeliki Metallinou; Shrikanth Narayanan

Contending with signal variability due to source and channel effects is a critical problem in automatic emotion recognition. Any approach to mitigating these effects, however, must not compromise emotion-relevant information in the signal. A promising approach to this problem has been feature normalization using features drawn from non-emotional ("neutral") speech samples. This paper considers a scheme for minimizing inter-speaker differences while still preserving the emotional discrimination of the acoustic features. This can be achieved by estimating the normalization parameters using only neutral speech, and then applying the coefficients to the entire corpus (including the emotional set). Specifically, this paper introduces a feature normalization scheme that implements these ideas by iteratively detecting neutral speech and normalizing the features. As the approximation error of the normalization parameters is reduced, the accuracy of the emotion detection system increases. The accuracy of the proposed iterative approach, evaluated across three databases, is only 2.5% lower than that of a system trained with optimal normalization parameters, and 9.7% higher than that of a system trained without any normalization scheme.
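
A hedged sketch of the iterative idea: detect the utterances that currently look neutral, estimate z-normalization statistics from that subset only, apply them to the entire corpus, and repeat. The neutral detector below is a toy stand-in, not the paper's classifier.

```python
import numpy as np

def iterative_normalization(features, detect_neutral, n_iters=5):
    """Iteratively (1) detect neutral utterances, (2) estimate normalization
    statistics on them only, (3) apply those statistics to all utterances."""
    normalized = features.copy()
    for _ in range(n_iters):
        neutral = detect_neutral(normalized)          # detect on current estimate
        mu = features[neutral].mean(axis=0)           # stats from the neutral subset only
        sigma = features[neutral].std(axis=0) + 1e-8
        normalized = (features - mu) / sigma          # apply to the entire corpus
    return normalized

# Toy stand-in detector: call an utterance neutral if its feature magnitude is small.
def toy_detector(x):
    norms = np.linalg.norm(x, axis=1)
    return norms < np.median(norms)

rng = np.random.default_rng(2)
feats = rng.normal(loc=3.0, scale=2.0, size=(100, 12))   # synthetic utterance features
print(iterative_normalization(feats, toy_detector).mean())
```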


International Conference on Acoustics, Speech, and Signal Processing | 2011

Tracking changes in continuous emotion states using body language and prosodic cues

Angeliki Metallinou; Athanassios Katsamanis; Yun Wang; Shrikanth Narayanan

Human expressive interactions are characterized by an ongoing unfolding of verbal and nonverbal cues. Such cues convey the interlocutors' emotional state, which is continuous and of variable intensity and clarity over time. In this paper, we examine the emotional content of body language cues describing a participant's posture, relative position and approach/withdraw behaviors during improvised affective interactions, and show that they reflect changes in the participant's activation and dominance levels. Furthermore, we describe a framework for tracking changes in emotional states during an interaction using a statistical mapping between the observed audiovisual cues and the underlying user state. Our approach shows promising results for tracking changes in activation and dominance.
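
A much simplified stand-in for the statistical mapping described above: a ridge regression from frame-level cue vectors to continuous activation and dominance values, followed by temporal smoothing to track gradual changes. The data, model choice, and window size are assumptions for illustration, not the paper's method.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(3)

# Synthetic training data: body-language/prosodic cue vectors and
# continuous (activation, dominance) annotations per frame.
cues = rng.normal(size=(500, 8))
true_map = rng.normal(size=(8, 2))
states = cues @ true_map + 0.3 * rng.normal(size=(500, 2))

mapper = Ridge(alpha=1.0).fit(cues, states)        # statistical cue-to-state mapping

def track(cue_stream, window=10):
    """Predict frame-wise states, then smooth them to track gradual changes."""
    raw = mapper.predict(cue_stream)
    kernel = np.ones(window) / window
    return np.column_stack([np.convolve(raw[:, d], kernel, mode="same")
                            for d in range(raw.shape[1])])

print(track(rng.normal(size=(100, 8))).shape)      # (100, 2): activation, dominance
```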


International Conference on Acoustics, Speech, and Signal Processing | 2012

A hierarchical framework for modeling multimodality and emotional evolution in affective dialogs

Angeliki Metallinou; Athanasios Katsamanis; Shrikanth Narayanan

Incorporating multimodal information and temporal context from speakers during an emotional dialog can contribute to improving performance of automatic emotion recognition systems. Motivated by these issues, we propose a hierarchical framework which models emotional evolution within and between emotional utterances, i.e., at the utterance and dialog level respectively. Our approach can incorporate a variety of generative or discriminative classifiers at each level and provides flexibility and extensibility in terms of multimodal fusion; facial, vocal, head and hand movement cues can be included and fused according to the modality and the emotion classification task. Our results using the multimodal, multi-speaker IEMOCAP database indicate that this framework is well-suited for cases where emotions are expressed multimodally and in context, as in many real-life situations.
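
A simplified sketch of the two-level idea: an utterance-level classifier provides emotion posteriors, and a dialog-level decoder (here a small hand-written Viterbi pass over an assumed transition matrix) re-scores the utterance sequence so that emotional context between utterances is taken into account. All numbers are illustrative.

```python
import numpy as np

EMOTIONS = ["angry", "happy", "neutral", "sad"]

# Dialog-level transition matrix: emotions tend to persist between utterances.
TRANS = np.full((4, 4), 0.1)
np.fill_diagonal(TRANS, 0.7)

def dialog_decode(utterance_posteriors):
    """Viterbi over utterances: combine per-utterance posteriors with transitions."""
    logp = np.log(np.asarray(utterance_posteriors) + 1e-12)
    logt = np.log(TRANS)
    score = logp[0].copy()
    back = []
    for t in range(1, len(logp)):
        cand = score[:, None] + logt               # previous state -> current state
        back.append(cand.argmax(axis=0))
        score = cand.max(axis=0) + logp[t]
    path = [int(score.argmax())]
    for bp in reversed(back):
        path.append(int(bp[path[-1]]))
    return [EMOTIONS[i] for i in reversed(path)]

# Utterance-level posteriors for a 3-utterance dialog (e.g. from the first level):
posteriors = [[0.6, 0.2, 0.1, 0.1],
              [0.4, 0.1, 0.4, 0.1],
              [0.5, 0.1, 0.3, 0.1]]
print(dialog_decode(posteriors))                   # ['angry', 'angry', 'angry']
```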


International Conference on Acoustics, Speech, and Signal Processing | 2012

Speaker states recognition using latent factor analysis based Eigenchannel factor vector modeling

Ming Li; Angeliki Metallinou; Daniel Bone; Shrikanth Narayanan

This paper presents an automatic speaker state recognition approach that models factor vectors in a latent factor analysis framework, improving upon the Gaussian Mixture Model (GMM) baseline performance. We investigate both intoxicated and affective speaker states. We consider the affective speech signal to be an underlying neutral, average speech signal corrupted by affective channel effects. Rather than reducing channel variability to enhance robustness, as in the speaker verification task, we directly model the speaker state on the channel factors under the factor analysis framework. Specifically, the speaker state factor vectors are extracted with the latent factor analysis approach within the GMM modeling framework and classified with support vector machines. Experimental results show that the proposed speaker state factor vector modeling system achieves unweighted accuracy improvements of 5.34% and 1.49% over the GMM baseline on the intoxicated speech detection task (Alcohol Language Corpus) and the emotion recognition task (IEMOCAP database), respectively.
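
A deliberately simplified stand-in for this pipeline: utterance-level supervector-like representations are reduced to low-dimensional factor vectors with a latent factor analysis model and classified with an SVM. This is not the paper's eigenchannel factor extraction; the data and dimensions are synthetic.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.svm import SVC

rng = np.random.default_rng(4)

# Synthetic stand-in for high-dimensional utterance supervectors of two speaker states.
n, dim = 200, 256
labels = rng.integers(0, 2, size=n)                    # 0 = neutral, 1 = affective
supervectors = rng.normal(size=(n, dim)) + labels[:, None] * 0.5

# Latent factor analysis: project supervectors onto low-dimensional factor vectors.
fa = FactorAnalysis(n_components=20, random_state=0)
factor_vectors = fa.fit_transform(supervectors)

# SVM classifier on the factor vectors.
clf = SVC(kernel="linear").fit(factor_vectors[:150], labels[:150])
print(clf.score(factor_vectors[150:], labels[150:]))   # held-out accuracy
```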

Collaboration


Dive into Angeliki Metallinou's collaborations.

Top Co-Authors

Shrikanth Narayanan, University of Southern California
Carlos Busso, University of Texas at Dallas
Sungbok Lee, University of Southern California
Anirudh Raju, University of California
Zhaojun Yang, University of Southern California
Athanasios Katsamanis, National Technical University of Athens
Chi-Chun Lee, National Tsing Hua University
Ming Li, University of Southern California
Ashwin Ram, Georgia Institute of Technology