Publications


Featured research published by Ronald Böck.


International Conference on Human-Computer Interaction | 2011

Multimodal emotion classification in naturalistic user behavior

Steffen Walter; Stefan Scherer; Martin Schels; Michael Glodek; David Hrabal; Miriam Schmidt; Ronald Böck; Kerstin Limbrecht; Harald C. Traue; Friedhelm Schwenker

The design of intelligent personalized interactive systems that have knowledge about the user's state, desires, needs and wishes currently poses a great challenge to computer scientists. In this study we propose an information fusion approach that combines acoustic and biophysiological data from multiple sensors to classify emotional states. For this purpose a multimodal corpus has been created in which subjects undergo a controlled emotion-eliciting experiment, passing through several octants of the valence-arousal-dominance space. The temporal and decision-level fusion of the multiple modalities outperforms the single-modality classifiers and shows promising results.
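
As an illustration of the decision-level fusion idea, the sketch below combines the class posteriors of two hypothetical modality-specific classifiers by weighted averaging. It is not the fusion architecture used in the study; class names, weights and probabilities are placeholders.

```python
import numpy as np

# Minimal sketch of decision-level fusion (not the fusion scheme of the study):
# each modality-specific classifier contributes class posteriors, which are
# combined by a weighted average; the fused decision is the most probable class.

CLASSES = ["low_arousal", "high_arousal"]           # illustrative two-class setup

def fuse_decisions(posteriors_per_modality, weights=None):
    """posteriors_per_modality: list of arrays, each of shape (n_classes,)."""
    stacked = np.vstack(posteriors_per_modality)    # (n_modalities, n_classes)
    if weights is None:
        weights = np.ones(len(stacked)) / len(stacked)
    fused = np.average(stacked, axis=0, weights=weights)
    return CLASSES[int(np.argmax(fused))], fused

# Hypothetical outputs of an acoustic and a biophysiological classifier:
audio_posteriors = np.array([0.30, 0.70])
physio_posteriors = np.array([0.55, 0.45])
label, fused = fuse_decisions([audio_posteriors, physio_posteriors], weights=[0.6, 0.4])
print(label, fused)
```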


Journal on Multimodal User Interfaces | 2014

Inter-rater reliability for emotion annotation in human–computer interaction: comparison and methodological improvements

Ingo Siegert; Ronald Böck; Andreas Wendemuth

To enable naturalistic human–computer interaction, the recognition of emotions and intentions receives increasing attention, and several modalities are combined to cover all human communication abilities. For this reason, naturalistic material is recorded in which the subjects are guided through an interaction with crucial points but retain the freedom to react individually. This material captures realistic user reactions but lacks clear labels, so a good transcription and annotation of the material is essential. For this, the assignment of human annotators has become widely accepted. A good measure of the reliability of labelled material is the inter-rater agreement. In this paper we investigate the inter-rater agreement achieved with Krippendorff’s alpha for emotionally annotated interaction corpora and present methods to improve the reliability. We show that the reliabilities obtained with different methods do not differ much, so the choice can rely on other aspects. Furthermore, a multimodal presentation of the items in their natural order increases the reliability.
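
The sketch below is a bare-bones computation of nominal Krippendorff's alpha from a rater-by-item matrix, intended only to illustrate the reliability measure discussed above; it is not the authors' tooling, and the example ratings are invented.

```python
import numpy as np

def krippendorff_alpha_nominal(ratings):
    """Nominal Krippendorff's alpha.
    ratings: 2-D array (raters x items), np.nan marks a missing rating."""
    ratings = np.asarray(ratings, dtype=float)
    values = np.unique(ratings[~np.isnan(ratings)])
    idx = {v: i for i, v in enumerate(values)}
    k = len(values)
    o = np.zeros((k, k))                        # coincidence matrix
    for item in ratings.T:                      # iterate over items (units)
        vals = item[~np.isnan(item)]
        m = len(vals)
        if m < 2:
            continue                            # items with <2 raters add no pairs
        for a in range(m):
            for b in range(m):
                if a != b:
                    o[idx[vals[a]], idx[vals[b]]] += 1.0 / (m - 1)
    n_c = o.sum(axis=1)
    n = n_c.sum()
    d_o = o.sum() - np.trace(o)                 # observed disagreement (nominal delta)
    d_e = (np.outer(n_c, n_c).sum() - (n_c ** 2).sum()) / (n - 1)  # expected disagreement
    return 1.0 - d_o / d_e

# Hypothetical annotations: 3 raters, 5 items, categories 0/1/2, one missing value.
data = [[0, 1, 1, 2, np.nan],
        [0, 1, 2, 2, 1],
        [0, 1, 1, 2, 1]]
print(round(krippendorff_alpha_nominal(data), 3))
```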


Computer Speech & Language | 2014

Modeling phonetic pattern variability in favor of the creation of robust emotion classifiers for real-life applications

Bogdan Vlasenko; Dmytro Prylipko; Ronald Böck; Andreas Wendemuth

The role of automatic emotion recognition from speech is growing continuously because of the accepted importance of reacting to the emotional state of the user in human-computer interaction. Most state-of-the-art emotion recognition methods are based on turn- and frame-level analysis, independently of the phonetic transcription. Here, we are interested in a phoneme-based classification of the level of arousal in acted and spontaneous emotions. To start, we show that our previously published classification technique, which achieved strong results in the Interspeech 2009 Emotion Challenge, cannot provide sufficiently good classification in cross-corpora evaluation (a condition close to real-life applications). To assess the robustness of our emotion classification techniques we use cross-corpora evaluation for a simplified two-class problem, namely high- and low-arousal emotions, and we use phoneme-level emotion classes for classification. We build our speaker-independent emotion classifier with HMMs, using GMM-based production probabilities and MFCC features. This classifier performs equally well when using the complete phoneme set as it does with a reduced set of indicative vowels (7 out of 39 phonemes in the German SAM-PA list). Afterwards we compare the emotion classification performance of the technique used in the Emotion Challenge with phoneme-based classification within the same experimental setup. With phoneme-level emotion classes we increase cross-corpora classification performance by about 3.15% absolute (4.69% relative) for models trained on acted emotions (EMO-DB dataset) and evaluated on spontaneous emotions (VAM dataset); in the reverse conditions (trained on VAM, tested on EMO-DB) we obtain a 15.43% absolute (23.20% relative) improvement. We show that using phoneme-level emotion classes can improve classification performance even with comparably low speech recognition performance obtained with scant a priori knowledge about the language, implemented as a zero-gram for word-level modeling and a bi-gram for phoneme-level modeling. Finally we compare our results with the state-of-the-art cross-corpora evaluations on the VAM database. For training our models, we use an almost 15 times smaller training set, consisting of 456 utterances (210 low- and 246 high-arousal emotions) instead of 6820 utterances (4685 high- and 2135 low-arousal emotions). We are nevertheless able to increase cross-corpora classification performance by about 2.25% absolute (3.22% relative), from UA=69.7% obtained by Zhang et al. to UA=71.95%.
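
The following sketch illustrates only the core idea of class-wise acoustic models evaluated across corpora: one GMM per arousal class is trained on MFCC frames from (hypothetical) vowel segments of a training corpus, and utterances of a test corpus are assigned to the class with the higher average log-likelihood. It uses scikit-learn GMMs as a stand-in for the HMM/GMM setup described in the paper, and all data and names are placeholders.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Schematic sketch (not the authors' code): train one GMM per arousal class on
# MFCC frames taken from vowel segments of a training corpus, then classify
# utterances of a different corpus by average log-likelihood (cross-corpus setup).

def train_class_gmms(frames_by_class, n_components=8):
    """frames_by_class: dict label -> (n_frames, n_mfcc) array of vowel-segment MFCCs."""
    return {label: GaussianMixture(n_components=n_components,
                                   covariance_type="diag",
                                   random_state=0).fit(frames)
            for label, frames in frames_by_class.items()}

def classify_utterance(models, utterance_frames):
    """Pick the arousal class whose GMM yields the highest mean log-likelihood."""
    scores = {label: gmm.score(utterance_frames) for label, gmm in models.items()}
    return max(scores, key=scores.get)

# Synthetic frames standing in for the training corpus (e.g. acted emotions)
# and for one utterance of the test corpus (e.g. spontaneous emotions):
rng = np.random.default_rng(0)
train = {"low_arousal": rng.normal(0.0, 1.0, (500, 13)),
         "high_arousal": rng.normal(1.5, 1.0, (500, 13))}
models = train_class_gmms(train)
test_utterance = rng.normal(1.4, 1.0, (80, 13))
print(classify_utterance(models, test_utterance))
```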


Mediterranean Electrotechnical Conference | 2010

Determining optimal signal features and parameters for HMM-based emotion classification

Ronald Böck; David Hübner; Andreas Wendemuth

The recognition of emotions from speech is a challenging issue. Creating emotion recognisers requires well-defined signal features, parameter sets, and a large amount of data, and is influenced by several conditions. This paper proposes an optimal parameter set for an HMM-based recogniser. For this, we compared different signal features (MFCCs, LPCs, and PLPs) as well as several HMM architectures. Moreover, we evaluated our proposal on three databases (eNTERFACE, Emo-DB, and SmartKom). Different proposals for acted/naive emotion recognition are given, as well as recommendations for efficient and valid validation methods.
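
As a rough illustration of comparing HMM architectures, the sketch below trains one GaussianHMM per emotion class for several state counts and classifies a feature sequence by maximum log-likelihood. hmmlearn is an assumed dependency, the synthetic frames stand in for MFCC/LPC/PLP features, and nothing here reproduces the paper's actual setup.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM   # assumed dependency, not named in the paper

# Illustrative sketch: train one GaussianHMM per emotion class for several state
# counts ("architectures") and classify a feature sequence by the model with the
# highest log-likelihood. All data below are synthetic placeholders.

def train_models(sequences_by_class, n_states):
    models = {}
    for label, seqs in sequences_by_class.items():
        X = np.vstack(seqs)                    # concatenated frames of all sequences
        lengths = [len(s) for s in seqs]       # per-sequence frame counts
        models[label] = GaussianHMM(n_components=n_states,
                                    covariance_type="diag",
                                    n_iter=20).fit(X, lengths)
    return models

def classify(models, sequence):
    return max(models, key=lambda label: models[label].score(sequence))

rng = np.random.default_rng(1)
data = {"neutral": [rng.normal(0, 1, (60, 13)) for _ in range(20)],
        "anger":   [rng.normal(1, 1, (60, 13)) for _ in range(20)]}
test_sequence = rng.normal(1, 1, (60, 13))
for n_states in (3, 5, 7):                     # the HMM architecture being varied
    models = train_models(data, n_states)
    print(n_states, classify(models, test_sequence))
```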


Privacy, Security, Risk and Trust | 2012

How Do We React to Context? Annotation of Individual and Group Engagement in a Video Corpus

Francesca Bonin; Ronald Böck; Nick Campbell

One of the main challenges of recent years is to create socially intelligent machines: machines able not only to communicate but also to understand social signals and make sense of the various social contexts. In this paper we describe a new annotation method for the analysis of involvement, aiming at exploring the relation between the individual and the social context in terms of perceived involvement, starting from the idea that the group is more than the sum of its parts. We present a description of the annotation method and preliminary results of an analysis of a multi-party casual conversation extracted from a multimedia multimodal corpus. The work aims to explore the mechanisms by which we react to social context so that we can develop automatic dialogue systems that are able to adapt to their environment in a similar way.


International Conference on Multimedia and Expo | 2011

Vowels formants analysis allows straightforward detection of high arousal emotions

Bogdan Vlasenko; David Philippou-Hübner; Dmytro Prylipko; Ronald Böck; Ingo Siegert; Andreas Wendemuth

Recently, automatic emotion recognition from speech has attracted growing interest within the human-machine interaction research community. Most emotion recognition methods use context-independent frame-level analysis or turn-level analysis. In this article, we introduce context-dependent vowel-level analysis applied to emotion classification. The average first formant value extracted at the vowel level has been used as a one-dimensional acoustic feature vector, and the Neyman-Pearson criterion has been used for classification. Our classifier is able to detect high-arousal emotions with small error rates. Within our research we show that the smallest emotional unit should be the vowel rather than the word, and we find that vowel-level analysis can be an important ingredient in developing a robust emotion classifier. Our research can also be useful for developing robust affective speech recognition methods and high-quality emotional speech synthesis systems.
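
A minimal sketch of the Neyman-Pearson idea on a one-dimensional F1 feature: fix a tolerated false-alarm rate on the low-arousal class and place the decision threshold at the corresponding quantile of its F1 distribution. The formant values below are synthetic and the thresholding rule is an assumption, not the paper's exact procedure.

```python
import numpy as np

# Sketch of a Neyman-Pearson-style detector on a one-dimensional feature
# (average first formant F1 per vowel): fix the tolerated false-alarm rate on
# the low-arousal class and set the decision threshold at the matching quantile
# of the low-arousal F1 distribution. All values below are synthetic.

rng = np.random.default_rng(2)
f1_low  = rng.normal(500, 60, 1000)    # hypothetical F1 values, low-arousal vowels (Hz)
f1_high = rng.normal(620, 70, 1000)    # hypothetical F1 values, high-arousal vowels (Hz)

alpha = 0.05                            # tolerated false-alarm rate on low arousal
threshold = np.quantile(f1_low, 1 - alpha)

def is_high_arousal(f1_value):
    return f1_value > threshold         # decide "high arousal" above the threshold

false_alarms = np.mean([is_high_arousal(v) for v in f1_low])
hits = np.mean([is_high_arousal(v) for v in f1_high])
print(f"threshold={threshold:.1f} Hz, false-alarm rate={false_alarms:.3f}, hit rate={hits:.3f}")
```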


International Conference on Multimedia and Expo | 2011

Appropriate emotional labelling of non-acted speech using Basic Emotions, Geneva Emotion Wheel and Self Assessment Manikins

Ingo Siegert; Ronald Böck; Bogdan Vlasenko; David Philippou-Hübner; Andreas Wendemuth

In emotion recognition from speech, a good transcription and annotation of the given material is crucial. Moreover, the question of how to find good emotional labels for new data material is a basic issue. It is not only a question of which emotion labels to choose; it is also a matter of how labellers can cope with the annotation methods. In this paper, we present our investigations of emotional labelling with three different methods (Basic Emotions, Geneva Emotion Wheel and Self Assessment Manikins) and compare them in terms of emotion coverage and usability. We show that emotion labels derived from the Geneva Emotion Wheel or Self Assessment Manikins fulfill our requirements, whereas Basic Emotions are not feasible for emotion labelling of spontaneous speech.
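
Purely as an illustration of how Self Assessment Manikin ratings can be turned into discrete labels, the snippet below thresholds valence and arousal at the scale midpoint; the 9-point scale and quadrant names are assumptions and not part of the labelling schemes evaluated in the paper.

```python
# Illustrative only: one simple way to turn Self Assessment Manikin (SAM) ratings
# into discrete labels is to threshold the valence and arousal scales at their
# midpoint. The 9-point scale and quadrant names are assumptions.

def sam_to_quadrant(valence, arousal, midpoint=5):
    """Map 9-point SAM valence/arousal ratings to a valence-arousal quadrant."""
    v = "positive" if valence > midpoint else "negative"
    a = "high" if arousal > midpoint else "low"
    return f"{v}-valence / {a}-arousal"

print(sam_to_quadrant(valence=7, arousal=8))   # -> positive-valence / high-arousal
print(sam_to_quadrant(valence=3, arousal=2))   # -> negative-valence / low-arousal
```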


Cognitive Computation | 2014

Investigation of Speaker Group-Dependent Modelling for Recognition of Affective States from Speech

Ingo Siegert; David Philippou-Hübner; Kim Hartmann; Ronald Böck; Andreas Wendemuth

For successful human–computer interaction (HCI), not only the pure textual information but also the individual skills, preferences, and affective states of the user must be known. Therefore, as a starting point, the user’s actual affective state has to be recognized. In this work we investigated how additional knowledge, for example the age and gender of the user, can be used to improve recognition of the affective state. Two methods from automatic speech recognition are used to incorporate age and gender differences in the recognition of affective state: speaker group-dependent (SGD) modelling and vocal tract length normalisation (VTLN). The investigations were performed on four corpora with acted and natural affective speech. Different features and two methods of classification (Gaussian mixture models (GMMs) and multi-layer perceptrons (MLPs)) were used. In addition, the effects of channel compensation and contextual characteristics were analysed. The results are compared with our own baseline results and with results reported in the literature. Two hypotheses were tested: first, that incorporating age information further improves speaker group-dependent modelling; second, that acoustic normalisation does not achieve the same improvement as speaker group-dependent modelling, because the age and gender of a speaker affect the way emotions are expressed.
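
The sketch below shows the general shape of speaker group-dependent modelling: the training data are split by speaker group and one model set is trained per group, which is then selected at test time. scikit-learn GMMs stand in for the GMM/MLP classifiers of the paper, and the groups, classes and data are invented.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Sketch of speaker group-dependent (SGD) modelling (not the authors' code):
# split training data by speaker group (e.g. gender and age band), train one
# model set per group, and use the model set matching the speaker's group at
# test time. Groups, emotion classes and data below are illustrative.

def train_sgd_models(data):
    """data: dict group -> dict emotion -> (n_frames, n_features) array."""
    return {group: {emo: GaussianMixture(n_components=4, covariance_type="diag",
                                         random_state=0).fit(frames)
                    for emo, frames in by_emotion.items()}
            for group, by_emotion in data.items()}

def classify(models, group, frames):
    group_models = models[group]                       # pick the speaker's group
    return max(group_models, key=lambda emo: group_models[emo].score(frames))

rng = np.random.default_rng(3)
train = {"female_young": {"neutral": rng.normal(0, 1, (300, 13)),
                          "anger":   rng.normal(1, 1, (300, 13))},
         "male_adult":   {"neutral": rng.normal(0.5, 1, (300, 13)),
                          "anger":   rng.normal(1.8, 1, (300, 13))}}
models = train_sgd_models(train)
print(classify(models, "male_adult", rng.normal(1.7, 1, (80, 13))))
```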


Italian Workshop on Neural Nets | 2014

Investigating the Form-Function-Relation of the Discourse Particle “hm” in a Naturalistic Human-Computer Interaction

Ingo Siegert; Dmytro Prylipko; Kim Hartmann; Ronald Böck; Andreas Wendemuth

For a successful speech-controlled human-computer interaction (HCI) the pure textual information as well as the individual skills, preferences, and affective states of the user have to be known. However, verbal human interaction consists of several information layers: apart from the pure textual information, further details regarding the speaker’s feelings, beliefs, and social relations are transmitted. This additional information is encoded in the acoustics. In particular, the intonation reveals details about the speaker’s communicative relation and attitude towards the ongoing dialogue.
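
As an illustration of relating the form of a discourse particle to its prosody, the sketch below estimates the F0 contour of a short "hm"-like token with librosa's pYIN and labels it rising or falling from a linear fit. This is an assumed approach applied to a synthetic signal, not the analysis performed in the paper.

```python
import numpy as np
import librosa   # assumed dependency; the paper's own tooling is not described here

# Illustrative sketch: estimate the F0 contour of a short "hm"-like token and
# label its coarse form as rising or falling from the slope of a linear fit.
# The synthetic test signal and the rising/falling rule are assumptions.

def contour_form(y, sr):
    f0, voiced_flag, _ = librosa.pyin(y, fmin=60, fmax=400, sr=sr)
    voiced = f0[voiced_flag & ~np.isnan(f0)]
    if len(voiced) < 5:
        return "too short / unvoiced"
    slope = np.polyfit(np.arange(len(voiced)), voiced, 1)[0]   # Hz per frame
    return "rising" if slope > 0 else "falling"

# Synthetic stand-in for a recorded "hm": 0.5 s with F0 gliding from 120 to 180 Hz.
sr = 16000
t = np.linspace(0, 0.5, int(0.5 * sr), endpoint=False)
f0_track = np.linspace(120, 180, len(t))
y = 0.3 * np.sin(2 * np.pi * np.cumsum(f0_track) / sr)
print(contour_form(y, sr))   # expected: "rising"
```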


Affective Computing and Intelligent Interaction | 2013

Annotation and Classification of Changes of Involvement in Group Conversation

Ronald Böck; Stefan Glüge; Ingo Siegert; Andreas Wendemuth

The detection of involvement in a conversation is important to assess the level at which humans are participating in either a human-human or human-computer interaction. In particular, detecting changes in a group’s involvement in a multi-party interaction is of interest to distinguish several constellations within the group itself. This information can further be used in situations where technical support of meetings is favoured, for instance focusing a camera, switching microphones, etc. Moreover, this information could also help to improve the performance of technical systems applied in human-machine interaction. In this paper, we concentrate on video material from the Table Talk corpus. We introduce a way of annotating and classifying changes of involvement and discuss the reliability of the annotation. Further, we present classification results based on video features using Multi-Layer Networks.
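
A minimal sketch of the classification step, assuming per-window video feature vectors are already available: a small multi-layer perceptron (scikit-learn's MLPClassifier, as a stand-in for the Multi-Layer Networks used in the paper) separates "change of involvement" from "no change" on synthetic data.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

# Sketch only: an MLP classifying "involvement change" vs. "no change" from
# per-window video feature vectors. Feature dimension, network size and the
# synthetic data are assumptions; the Table Talk features are not reproduced.

rng = np.random.default_rng(4)
X_change    = rng.normal(1.0, 1.0, (200, 30))   # hypothetical windows with a change
X_no_change = rng.normal(0.0, 1.0, (200, 30))   # hypothetical windows without a change
X = np.vstack([X_change, X_no_change])
y = np.array([1] * 200 + [0] * 200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0)
clf.fit(X_tr, y_tr)
print(f"held-out accuracy: {clf.score(X_te, y_te):.2f}")
```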

Collaboration


Dive into Ronald Böck's collaborations.

Top Co-Authors

Andreas Wendemuth, Otto-von-Guericke University Magdeburg

Ingo Siegert, Otto-von-Guericke University Magdeburg

Stefan Glüge, Otto-von-Guericke University Magdeburg

Bogdan Vlasenko, Otto-von-Guericke University Magdeburg

Kim Hartmann, Otto-von-Guericke University Magdeburg

David Philippou-Hübner, Otto-von-Guericke University Magdeburg

Dmytro Prylipko, Otto-von-Guericke University Magdeburg