Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Andreas Wendemuth is active.

Publication


Featured research published by Andreas Wendemuth.


IEEE Automatic Speech Recognition and Understanding Workshop | 2009

Acoustic emotion recognition: A benchmark comparison of performances

Björn W. Schuller; Bogdan Vlasenko; Florian Eyben; Gerhard Rigoll; Andreas Wendemuth

In the light of the first challenge on emotion recognition from speech, we provide the largest-to-date benchmark comparison under equal conditions on nine standard corpora in the field, using the two predominant paradigms: modeling on a frame level by means of hidden Markov models and supra-segmental modeling by systematic feature brute-forcing. Investigated corpora are the ABC, AVIC, DES, EMO-DB, eNTERFACE, SAL, SmartKom, SUSAS, and VAM databases. To provide better comparability among sets, we additionally cluster each database's emotions into binary valence and arousal discrimination tasks. As a result, large differences are found among corpora, stemming mostly from naturalistic emotions and spontaneous speech vs. more prototypical events. Further, supra-segmental modeling proves significantly beneficial on average when several classes are addressed at a time.
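
The binary clustering mentioned above can be illustrated with a small mapping sketch. The emotion names and their valence/arousal assignments below are hypothetical placeholders, not the mapping actually used in the paper.

```python
# Illustrative sketch: cluster categorical emotion labels of a corpus into
# binary valence and arousal discrimination tasks. The label set and the
# mapping are hypothetical examples, not the paper's actual assignment.

BINARY_MAP = {
    "anger":   {"arousal": "high", "valence": "negative"},
    "joy":     {"arousal": "high", "valence": "positive"},
    "fear":    {"arousal": "high", "valence": "negative"},
    "sadness": {"arousal": "low",  "valence": "negative"},
    "boredom": {"arousal": "low",  "valence": "negative"},
    "neutral": {"arousal": "low",  "valence": "positive"},
}

def to_binary_task(labels, dimension):
    """Map categorical labels to one binary task ('arousal' or 'valence')."""
    return [BINARY_MAP[label][dimension] for label in labels]

print(to_binary_task(["anger", "boredom", "joy"], "arousal"))
# -> ['high', 'low', 'high']
```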


IEEE Transactions on Affective Computing | 2010

Cross-Corpus Acoustic Emotion Recognition: Variances and Strategies

Björn W. Schuller; Bogdan Vlasenko; Florian Eyben; Martin Wöllmer; André Stuhlsatz; Andreas Wendemuth; Gerhard Rigoll

As the recognition of emotion from speech has matured to a degree where it becomes applicable in real-life settings, it is time for a realistic view on obtainable performances. Most studies tend to...


Affective Computing and Intelligent Interaction | 2007

Frame vs. Turn-Level: Emotion Recognition from Speech Considering Static and Dynamic Processing

Bogdan Vlasenko; Björn W. Schuller; Andreas Wendemuth; Gerhard Rigoll

Opposing the predominant turn-wise statistics of acoustic Low-Level Descriptors followed by static classification, we re-investigate dynamic modeling directly at the frame level in speech-based emotion recognition. This seems beneficial, as it is well known that important information exists on temporal sub-turn layers. Most promisingly, we integrate this frame-level information within a state-of-the-art large-feature-space emotion recognition engine. In order to investigate frame-level processing we employ a typical speaker-recognition set-up tailored to emotion classification: a GMM for classification and MFCCs plus speed and acceleration coefficients as features. We thereby also consider the use of multiple states, i.e. an HMM. In order to fuse this information with turn-based modeling, output scores are added to a super-vector combined with static acoustic features. A variety of Low-Level Descriptors and functionals covering prosodic, speech-quality, and articulatory aspects are thereby considered. Starting from 1.4k features, we select optimal configurations including and excluding GMM information. The final decision task is realized by use of an SVM. Extensive test-runs are carried out on two popular public databases, namely EMO-DB and SUSAS, to investigate acted and spontaneous data. As we face the current challenge of speaker-independent analysis, we also discuss benefits arising from speaker normalization. The results obtained clearly emphasize the superior power of integrating diverse time levels.
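
A minimal sketch of the frame-level/turn-level fusion described above, assuming librosa and scikit-learn as stand-ins for the original tooling; the audio paths, the static turn-level features, and all parameter choices are placeholders.

```python
# Sketch of the frame-level GMM plus turn-level fusion idea (assumptions:
# librosa for MFCC extraction, scikit-learn for GMMs and the SVM; the
# static turn-level features come from elsewhere).
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

def frame_features(path):
    """MFCCs plus speed (delta) and acceleration (delta-delta) coefficients."""
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    feats = np.vstack([mfcc,
                       librosa.feature.delta(mfcc),
                       librosa.feature.delta(mfcc, order=2)])
    return feats.T                                     # (frames, 39)

def train_gmms(paths_by_emotion, n_mix=32):
    """One GMM per emotion, trained on pooled frames of that emotion."""
    return {emotion: GaussianMixture(n_components=n_mix, covariance_type="diag")
                     .fit(np.vstack([frame_features(p) for p in paths]))
            for emotion, paths in paths_by_emotion.items()}

def super_vector(static_feats, gmms, path):
    """Append per-class GMM frame log-likelihoods to the static turn features."""
    x = frame_features(path)
    scores = [gmms[e].score(x) for e in sorted(gmms)]  # average log-likelihoods
    return np.concatenate([static_feats, scores])

# The fused super-vectors are then classified with an SVM, e.g.:
# clf = SVC(kernel="linear").fit(np.vstack(train_supers), train_labels)
```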


Künstliche Intelligenz | 2016

Companion-Technology for Cognitive Technical Systems

Susanne Biundo; Andreas Wendemuth

We introduce the Transregional Collaborative Research Centre “Companion-Technology for Cognitive Technical Systems”—a cross-disciplinary endeavor towards the development of an enabling technology for Companion-systems. These systems completely adjust their functionality and service to the individual user. They comply with his or her capabilities, preferences, requirements, and current needs and adapt to the individual’s emotional state and ambient conditions. Companion-like behavior of technical systems is achieved through the investigation and implementation of cognitive abilities and their well-orchestrated interplay.


IEEE Automatic Speech Recognition and Understanding Workshop | 2007

Comparing one and two-stage acoustic modeling in the recognition of emotion in speech

Björn W. Schuller; Bogdan Vlasenko; Ricardo Minguez; Gerhard Rigoll; Andreas Wendemuth

In the search for a standard unit for use in recognition of emotion in speech, a whole turn (that is, the full section of speech by one person in a conversation) is common. Within applications such turns often seem favorable. Yet, high effectiveness of sub-turn entities is known. In this respect a two-stage approach is investigated to provide higher temporal resolution: chunking of speech turns according to acoustic properties, and multi-instance learning for turn mapping after individual chunk analysis. For chunking, fast pre-segmentation into emotionally quasi-stationary segments is performed by one-pass Viterbi beam search with token passing based on MFCCs. Chunk analysis is realized by brute-force construction of a large feature space with subsequent subset selection, SVM classification, and speaker normalization. Extensive tests reveal differences compared to one-stage processing. Alternatively, syllables are used for chunking.
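
The chunking by one-pass Viterbi beam search is not reproduced here; the sketch below only illustrates the second stage, mapping per-chunk classifier outputs back to a single turn label. A simple score-averaging aggregation stands in for the paper's multi-instance learning, and the chunk classifier is assumed to exist.

```python
# Sketch of the turn-mapping stage only (assumption: a chunker and a
# chunk-level classifier already exist). Per-chunk scores are averaged
# and the best class is taken as the turn label.
import numpy as np

def turn_label(chunk_scores, classes):
    """
    chunk_scores: (n_chunks, n_classes) decision values or probabilities
                  produced by the chunk classifier for one speech turn.
    """
    mean_scores = np.asarray(chunk_scores).mean(axis=0)
    return classes[int(np.argmax(mean_scores))]

classes = ["neutral", "anger", "fear"]        # illustrative class set
scores = [[0.2, 0.7, 0.1],                    # chunk 1
          [0.1, 0.5, 0.4],                    # chunk 2
          [0.3, 0.6, 0.1]]                    # chunk 3
print(turn_label(scores, classes))            # -> anger
```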


International Conference on Multimedia and Expo | 2008

Combining speech recognition and acoustic word emotion models for robust text-independent emotion recognition

Björn W. Schuller; Bogdan Vlasenko; Dejan Arsic; Gerhard Rigoll; Andreas Wendemuth

Recognition of emotion in speech usually uses acoustic models that ignore the spoken content. Likewise, one general model per emotion is trained, independent of the phonetic structure. Given sufficient data, this approach seemingly works well enough. Yet, this paper tries to answer the question of whether acoustic emotion recognition strongly depends on phonetic content, and whether models tailored to the spoken unit can lead to higher accuracies. We therefore investigate phoneme- and word-models by use of a large prosodic, spectral, and voice-quality feature space and Support Vector Machines (SVM). Experiments also take the necessity of ASR into account to select appropriate unit models. Test-runs on the well-known EMO-DB database facing speaker independence demonstrate the superiority of word emotion models over today's common general models, provided sufficient occurrences exist in the training corpus.
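
A minimal sketch of the unit-model idea: a word-specific emotion classifier is selected according to the ASR hypothesis, falling back to a general model for rare or unseen words. The function and its arguments are hypothetical and assume scikit-learn-style classifiers.

```python
# Sketch of unit-specific emotion models (assumption: scikit-learn-style
# classifiers; word_models only contains words that occurred often enough
# in the training corpus, the general model covers everything else).
def predict_emotion(word, features, word_models, general_model):
    """
    word:          ASR hypothesis for the spoken unit
    features:      acoustic feature vector extracted for that unit
    word_models:   dict mapping frequent words to word-specific classifiers
    general_model: fallback classifier trained on all training data
    """
    model = word_models.get(word, general_model)
    return model.predict([features])[0]
```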


International Conference on Pattern Recognition | 2006

Mixture of Support Vector Machines for HMM based Speech Recognition

Sven E. Krüger; Martin Schafföner; Marcel Katz; Edin Andelic; Andreas Wendemuth

Speech recognition is usually based on hidden Markov models (HMMs), which represent the temporal dynamics of speech very efficiently, and Gaussian mixture models, which perform the classification of speech into single speech units (phonemes) non-optimally. In this paper we use parallel mixtures of support vector machines (SVMs) for classification by integrating this method into an HMM-based speech recognition system. SVMs are very appealing due to their association with statistical learning theory and have already shown good results in pattern recognition and in continuous speech recognition. They suffer, however, from the training effort, which scales at least quadratically with the number of training vectors. The SVM mixtures need only nearly linear training time, making it easier to deal with the large amount of speech data. In our hybrid system we use the SVM mixtures as acoustic models in an HMM-based decoder. We train and test the hybrid system on the DARPA Resource Management (RM1) corpus, showing better performance than an HMM-based decoder using Gaussian mixtures.
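
As a rough illustration of how a discriminative classifier can act as the acoustic model inside an HMM decoder, the sketch below converts classifier posteriors into scaled likelihoods. It uses a single scikit-learn SVC with probability estimates as a stand-in; the paper's parallel mixture of SVMs and its actual decoder integration are not reproduced.

```python
# Hybrid sketch (assumption, not the paper's system): an SVM supplies state
# posteriors P(state | frame), which are divided by the state priors to give
# scaled likelihoods usable in place of p(frame | state) in an HMM decoder.
import numpy as np
from sklearn.svm import SVC

class SvmAcousticModel:
    def __init__(self):
        self.svm = SVC(kernel="rbf", probability=True)
        self.priors = None

    def fit(self, frames, state_labels):
        self.svm.fit(frames, state_labels)
        labels = np.asarray(state_labels)
        # empirical state priors, ordered like svm.classes_
        self.priors = np.array([(labels == c).mean() for c in self.svm.classes_])
        return self

    def scaled_likelihoods(self, frames):
        """Per-frame scaled likelihoods, one column per HMM state."""
        posteriors = self.svm.predict_proba(frames)   # P(state | frame)
        return posteriors / self.priors               # proportional to p(frame | state)
```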


Journal on Multimodal User Interfaces | 2014

Inter-rater reliability for emotion annotation in human–computer interaction: comparison and methodological improvements

Ingo Siegert; Ronald Böck; Andreas Wendemuth

To enable naturalistic human–computer interaction, the recognition of emotions and intentions receives increased attention, and several modalities are combined to cover all human communication abilities. For this reason, naturalistic material is recorded, where the subjects are guided through an interaction with crucial points, but with the freedom to react individually. This material captures realistic user reactions but lacks clear labels, so a good transcription and annotation of the given material is essential. For that, the assignment of human annotators has become widely accepted. A good measure of the reliability of labelled material is the inter-rater agreement. In this paper we investigate the inter-rater agreement achieved for emotionally annotated interaction corpora, utilizing Krippendorff's alpha, and present methods to improve the reliability. We show that the reliabilities obtained with different methods do not differ much, so a choice could rely on other aspects. Furthermore, a multimodal presentation of the items in their natural order increases the reliability.
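
The agreement measure named above can be computed, for example, with the krippendorff package from PyPI; the annotator ratings in this sketch are invented for illustration.

```python
# Sketch: inter-rater agreement for nominal emotion labels via
# Krippendorff's alpha, using the 'krippendorff' PyPI package
# (pip install krippendorff). The ratings below are made up.
import numpy as np
import krippendorff

# Rows: annotators, columns: items; np.nan marks a missing rating.
ratings = np.array([
    [0, 1, 1, 2, 0, np.nan],   # annotator 1
    [0, 1, 2, 2, 0, 1],        # annotator 2
    [0, 1, 1, 2, 1, 1],        # annotator 3
], dtype=float)

alpha = krippendorff.alpha(reliability_data=ratings,
                           level_of_measurement="nominal")
print(f"Krippendorff's alpha = {alpha:.3f}")
```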


Computer Speech & Language | 2014

Modeling phonetic pattern variability in favor of the creation of robust emotion classifiers for real-life applications

Bogdan Vlasenko; Dmytro Prylipko; Ronald Böck; Andreas Wendemuth

The role of automatic emotion recognition from speech is growing continuously because of the accepted importance of reacting to the emotional state of the user in human-computer interaction. Most state-of-the-art emotion recognition methods are based on turn- and frame-level analysis independent of phonetic transcription. Here, we are interested in a phoneme-based classification of the level of arousal in acted and spontaneous emotions. To start, we show that our previously published classification technique, which achieved strong results in the Interspeech 2009 Emotion Challenge, cannot provide sufficiently good classification in cross-corpora evaluation (a condition close to real-life applications). To prove the robustness of our emotion classification techniques we use cross-corpora evaluation for a simplified two-class problem, namely high- and low-arousal emotions. We use emotion classes at the phoneme level for classification. We build our speaker-independent emotion classifier with HMMs, using GMM-based production probabilities and MFCC features. This classifier performs equally well when using a complete phoneme set as it does with a reduced set of indicative vowels (7 out of 39 phonemes in the German SAM-PA list). Afterwards we compare the emotion classification performance of the technique used in the Emotion Challenge with phoneme-based classification within the same experimental setup. With phoneme-level emotion classes we increase cross-corpora classification performance by about 3.15% absolute (4.69% relative) for models trained on acted emotions (EMO-DB dataset) and evaluated on spontaneous emotions (VAM dataset); under the reverse experimental conditions (trained on VAM, tested on EMO-DB) we obtain a 15.43% absolute (23.20% relative) improvement. We show that using phoneme-level emotion classes can improve classification performance even with comparably low speech recognition performance obtained with scant a priori knowledge about the language, implemented as a zero-gram for word-level modeling and a bi-gram for phoneme-level modeling. Finally we compare our results with the state-of-the-art cross-corpora evaluations on the VAM database. For training our models, we use an almost 15 times smaller training set, consisting of 456 utterances (210 low and 246 high arousal emotions) instead of 6820 utterances (4685 high and 2135 low arousal emotions). We are nevertheless able to increase cross-corpora classification performance by about 2.25% absolute (3.22% relative), from UA=69.7% obtained by Zhang et al. to UA=71.95%.
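
A small sketch of the vowel-restriction idea, assuming a forced phoneme alignment is available and per-class GMMs for low and high arousal have already been trained (scikit-learn style); the vowel list and alignment format are illustrative, not the paper's exact SAM-PA subset.

```python
# Sketch of restricting classification to indicative vowel frames
# (assumption: a forced alignment and per-class GMMs already exist).
import numpy as np

VOWELS = {"a:", "e:", "i:", "o:", "u:", "E", "O"}   # illustrative subset only

def vowel_frames(frame_feats, alignment, frame_rate=100):
    """
    frame_feats: (n_frames, dim) acoustic features of one utterance
    alignment:   list of (phoneme, start_sec, end_sec) from an aligner
    Returns only the frames whose aligned phoneme is in the vowel subset.
    """
    keep = np.zeros(len(frame_feats), dtype=bool)
    for phoneme, start, end in alignment:
        if phoneme in VOWELS:
            keep[int(start * frame_rate):int(end * frame_rate)] = True
    return frame_feats[keep]

def arousal_decision(frame_feats, alignment, gmm_low, gmm_high):
    """Two-class arousal decision from average frame log-likelihoods."""
    x = vowel_frames(frame_feats, alignment)
    return "high" if gmm_high.score(x) > gmm_low.score(x) else "low"
```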


Mediterranean Electrotechnical Conference | 2010

Determining optimal signal features and parameters for HMM-based emotion classification

Ronald Böck; David Hübner; Andreas Wendemuth

The recognition of emotions from speech is a challenging issue. Creating emotion recognisers requires well-defined signal features, parameter sets, and a large amount of data material, and it is influenced by several conditions. This paper proposes an optimal parameter set for an HMM-based recogniser. For this, we compared different signal features (MFCCs, LPCs, and PLPs) as well as several HMM architectures. Moreover, we evaluated our proposal on three databases (eNTERFACE, Emo-DB, and SmartKom). Different proposals for acted/naive emotion recognition are given, as well as recommendations for efficient and valid validation methods.
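
A rough sketch of the configuration search described above, using hmmlearn as a stand-in toolkit: one HMM per emotion, compared over signal features and state counts by held-out accuracy. The feature dictionaries and extraction step are placeholders.

```python
# Sketch of the parameter search (assumption: hmmlearn as toolkit).
# One Gaussian-emission HMM is trained per emotion; configurations are
# compared by held-out classification accuracy. Feature extraction
# (MFCC / LPC / PLP) is assumed to be done elsewhere.
import numpy as np
from hmmlearn.hmm import GaussianHMM

def train_class_hmms(utterances_by_emotion, n_states, n_iter=20):
    """utterances_by_emotion: dict emotion -> list of (frames, dim) arrays."""
    models = {}
    for emotion, utterances in utterances_by_emotion.items():
        X = np.vstack(utterances)
        lengths = [len(u) for u in utterances]
        models[emotion] = GaussianHMM(n_components=n_states,
                                      covariance_type="diag",
                                      n_iter=n_iter).fit(X, lengths)
    return models

def classify(models, utterance):
    return max(models, key=lambda emotion: models[emotion].score(utterance))

def accuracy(models, test_set):
    """test_set: list of (feature_matrix, true_emotion) pairs."""
    return sum(classify(models, x) == y for x, y in test_set) / len(test_set)

# Illustrative grid over signal features and HMM sizes:
# for feat in ("mfcc", "lpc", "plp"):
#     for n_states in (1, 3, 5):
#         models = train_class_hmms(train_data[feat], n_states)
#         print(feat, n_states, accuracy(models, test_data[feat]))
```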

Collaboration


Dive into Andreas Wendemuth's collaborations.

Top Co-Authors

Ingo Siegert (Otto-von-Guericke University Magdeburg)
Ronald Böck (Otto-von-Guericke University Magdeburg)
Bogdan Vlasenko (Otto-von-Guericke University Magdeburg)
Marcel Katz (Otto-von-Guericke University Magdeburg)
Martin Schafföner (Otto-von-Guericke University Magdeburg)
Sven E. Krüger (Otto-von-Guericke University Magdeburg)
Edin Andelic (Otto-von-Guericke University Magdeburg)
Stefan Glüge (Otto-von-Guericke University Magdeburg)
Dmytro Prylipko (Otto-von-Guericke University Magdeburg)