Publications


Featured research published by Ingo Siegert.


Journal on Multimodal User Interfaces | 2014

Inter-rater reliability for emotion annotation in human–computer interaction: comparison and methodological improvements

Ingo Siegert; Ronald Böck; Andreas Wendemuth

To enable naturalistic human–computer interaction, the recognition of emotions and intentions has received increased attention, and several modalities are combined to cover all human communication abilities. For this reason, naturalistic material is recorded in which subjects are guided through an interaction with crucial points but retain the freedom to react individually. This material captures realistic user reactions but lacks clear labels, so a good transcription and annotation of the material is essential. For this task, the assignment of human annotators has become widely accepted. A good measure of the reliability of labelled material is the inter-rater agreement. In this paper we investigate the inter-rater agreement achieved with Krippendorff’s alpha for emotionally annotated interaction corpora and present methods to improve reliability. We show that the reliabilities obtained with the different methods do not differ much, so the choice can rest on other aspects. Furthermore, a multimodal presentation of the items in their natural order increases reliability.
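
The paper’s agreement measure, Krippendorff’s alpha, can be computed directly from a rater-by-item label matrix. Below is a minimal sketch for nominal labels; the function and the toy annotation matrix are illustrative, not the authors’ code.

```python
import numpy as np

def krippendorff_alpha_nominal(ratings):
    """ratings: raters x items array; np.nan marks a missing rating."""
    ratings = np.asarray(ratings, dtype=float)
    values = np.unique(ratings[~np.isnan(ratings)])
    idx = {v: i for i, v in enumerate(values)}
    coincidence = np.zeros((len(values), len(values)))
    for item in ratings.T:                      # one column per item
        labels = item[~np.isnan(item)]
        m = len(labels)
        if m < 2:                               # single ratings carry no pairs
            continue
        for i, a in enumerate(labels):
            for j, b in enumerate(labels):
                if i != j:
                    coincidence[idx[a], idx[b]] += 1.0 / (m - 1)
    n_c = coincidence.sum(axis=0)
    n = n_c.sum()
    observed = coincidence.sum() - np.trace(coincidence)
    expected = (np.outer(n_c, n_c).sum() - (n_c ** 2).sum()) / (n - 1)
    return 1.0 - observed / expected            # alpha = 1 - D_o / D_e

# Three hypothetical annotators labelling six stimuli (0/1/2 = emotion codes).
labels = [[0, 1, 1, 0, 2, 2],
          [0, 1, 1, 0, 2, 1],
          [0, 1, 0, 0, 2, 2]]
print(f"alpha = {krippendorff_alpha_nominal(labels):.3f}")
```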


international conference on multimedia and expo | 2011

Vowels formants analysis allows straightforward detection of high arousal emotions

Bogdan Vlasenko; David Philippou-Hübner; Dmytro Prylipko; Ronald Böck; Ingo Siegert; Andreas Wendemuth

Recently, automatic emotion recognition from speech has attracted growing interest within the human-machine interaction research community. Most emotion recognition methods use context-independent frame-level or turn-level analysis. In this article, we introduce context-dependent vowel-level analysis for emotion classification. The average first formant value extracted at the vowel level is used as a one-dimensional acoustic feature, and the Neyman-Pearson criterion is used for classification. Our classifier is able to detect high-arousal emotions with small error rates. Our results indicate that the smallest emotional unit should be the vowel rather than the word, and that vowel-level analysis can be an important element in developing a robust emotion classifier. Our findings can also be useful for developing robust affective speech recognition methods and high-quality emotional speech synthesis systems.
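
A minimal sketch of the detection scheme the abstract describes: the vowel-averaged first formant F1 as a one-dimensional feature, with a Neyman-Pearson style threshold fixed by the tolerated false-alarm rate on the neutral class. The formant values here are synthetic; a real system would estimate F1 from audio (e.g. with Praat).

```python
import numpy as np

rng = np.random.default_rng(0)
f1_neutral = rng.normal(500.0, 60.0, 400)   # Hz, hypothetical neutral vowels
f1_aroused = rng.normal(620.0, 70.0, 400)   # high arousal tends to raise F1

# Neyman-Pearson: fix the tolerated false-alarm rate on the neutral class
# and set the decision threshold at the matching quantile of neutral F1.
false_alarm_rate = 0.05
threshold = np.quantile(f1_neutral, 1.0 - false_alarm_rate)

detections = f1_aroused > threshold
print(f"threshold = {threshold:.1f} Hz")
print(f"detection rate at {false_alarm_rate:.0%} false alarms: "
      f"{detections.mean():.2%}")
```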


international conference on multimedia and expo | 2011

Appropriate emotional labelling of non-acted speech using Basic Emotions, Geneva Emotion Wheel and Self Assessment Manikins

Ingo Siegert; Ronald Böck; Bogdan Vlasenko; David Philippou-Hübner; Andreas Wendemuth

In emotion recognition from speech, a good transcription and annotation of the given material is crucial. Moreover, how to find good emotional labels for new material is a basic issue: it is not only a question of which emotion labels to choose but also of how well labellers can cope with the annotation method. In this paper, we present our investigations of emotional labelling with three different methods (Basic Emotions, Geneva Emotion Wheel and Self Assessment Manikins) and compare them in terms of emotion coverage and usability. We show that emotion labels derived from the Geneva Emotion Wheel or Self Assessment Manikins fulfill our requirements, but Basic Emotions are not feasible for emotion labelling of spontaneous speech.
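
As an illustration of how dimensional SAM ratings can be turned into categorical labels (an assumed coarse mapping, not one from the paper), the sketch below bins 5-point valence/arousal ratings into quadrant labels:

```python
def sam_to_label(valence, arousal):
    """Map 1-5 SAM valence/arousal ratings to a coarse category
    (hypothetical quadrant scheme, midpoint 3 treated as neutral)."""
    if valence == 3 and arousal == 3:
        return "neutral"
    if valence > 3:
        return "positive-active" if arousal > 3 else "positive-calm"
    return "negative-active" if arousal > 3 else "negative-calm"

for v, a in [(5, 4), (2, 5), (3, 3), (4, 2)]:
    print(f"valence={v}, arousal={a} -> {sam_to_label(v, a)}")
```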


MPRSS'12 Proceedings of the First international conference on Multimodal Pattern Recognition of Social Signals in Human-Computer-Interaction | 2012

Fusion of fragmentary classifier decisions for affective state recognition

Gerald Krell; Michael Glodek; Axel Panning; Ingo Siegert; Bernd Michaelis; Andreas Wendemuth; Friedhelm Schwenker

Real human-computer interaction systems based on different modalities face the problem that not all information channels are available at every time step. Nevertheless, an estimate of the current user state is required at any time so that the system can react instantaneously based on the available modalities. A novel approach to decision fusion of fragmentary classifications is therefore proposed and empirically evaluated for audio and video signals from a corpus of non-acted user behavior. It is shown that visual and prosodic analysis successfully complement each other, leading to an outstanding performance of the fusion architecture.
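
A minimal sketch of the fusion problem the abstract poses (the weighting scheme is an assumption, not the paper’s architecture): per-modality class posteriors are combined with weights renormalised over whichever channels actually delivered a decision at this time step.

```python
import numpy as np

def fuse_fragmentary(posteriors, weights):
    """posteriors: modality name -> class-posterior vector, or None when
    the channel delivered no decision at this time step."""
    acc, total = None, 0.0
    for name, p in posteriors.items():
        if p is None:
            continue                            # channel unavailable: skip
        w = weights[name]
        p = np.asarray(p, dtype=float)
        acc = w * p if acc is None else acc + w * p
        total += w
    if acc is None:
        raise ValueError("no modality available at this time step")
    return acc / total                          # renormalised weighted mean

# Audio present, video dropped out (e.g. the face was not detected).
step = {"audio": [0.7, 0.2, 0.1], "video": None}
print(fuse_fragmentary(step, weights={"audio": 0.4, "video": 0.6}))
```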


Journal on Multimodal User Interfaces | 2014

Analysis of significant dialog events in realistic human–computer interaction

Dmytro Prylipko; Dietmar F. Rösner; Ingo Siegert; Stephan Günther; Rafael Friesen; Matthias Haase; Bogdan Vlasenko; Andreas Wendemuth

This paper addresses issues of automatically detecting significant dialog events (SDEs) in naturalistic HCI and of deducing trait-specific conclusions relevant for the design of spoken dialog systems. We perform our investigations on the multimodal LAST MINUTE corpus with recordings of naturalistic interactions. First, we use textual transcripts to analyse interaction styles and discourse structures; we find indications that younger subjects prefer a more technical style in communication with dialog systems. Next, we model the subject’s internal success state with a hidden Markov model trained on the observed sequences of system feedback. This reveals that younger subjects interact significantly more successfully with technical systems. Aiming at automatic detection of specific subjects’ reactions, we then semi-automatically annotate SDEs, i.e. phrases indicating irregular, not task-oriented subject behavior. We use both acoustic and linguistic features to build several trait-specific classifiers for dialog phases, which show markedly different accuracies across age and gender groups. The presented investigations coherently support the age-dependence of both expressiveness and problem-solving ability, which in turn induces design rules for future designated “companion” systems.
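
The success-state modelling step can be sketched with a two-state HMM over categorical feedback symbols. The use of hmmlearn, the feedback coding (0 = negative, 1 = neutral, 2 = positive) and the sequences below are all assumptions for illustration; the paper does not specify a toolkit.

```python
import numpy as np
from hmmlearn.hmm import CategoricalHMM

# One system-feedback sequence per subject (invented toy data).
sequences = [[2, 2, 1, 2, 0, 0, 1, 2],
             [1, 2, 2, 2, 2, 1, 2],
             [0, 1, 0, 0, 2, 0, 0, 1]]
X = np.concatenate(sequences).reshape(-1, 1)
lengths = [len(s) for s in sequences]

# Two hidden states standing in for the subject's success/failure state.
model = CategoricalHMM(n_components=2, n_iter=50, random_state=0)
model.fit(X, lengths)

# Decode the most likely state path for one subject's interaction.
states = model.predict(np.asarray(sequences[0]).reshape(-1, 1))
print("decoded states:", states)
```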


Cognitive Computation | 2014

Investigation of Speaker Group-Dependent Modelling for Recognition of Affective States from Speech

Ingo Siegert; David Philippou-Hübner; Kim Hartmann; Ronald Böck; Andreas Wendemuth

For successful human–computer interaction (HCI), not only the pure textual information but also the individual skills, preferences, and affective states of the user must be known. Therefore, as a starting point, the user’s current affective state has to be recognized. In this work we investigated how additional knowledge, for example the age and gender of the user, can be used to improve the recognition of affective states. Two methods from automatic speech recognition are used to incorporate age and gender differences: speaker group-dependent (SGD) modelling and vocal tract length normalisation (VTLN). The investigations were performed on four corpora with acted and naturally affected speech, using different features and two classification methods (Gaussian mixture models (GMMs) and multi-layer perceptrons (MLPs)). In addition, the effects of channel compensation and contextual characteristics were analysed. The results are compared with our own baseline results and with results reported in the literature. Two hypotheses were tested: first, that incorporating age information further improves speaker group-dependent modelling; second, that acoustic normalisation does not achieve the same improvement as speaker group-dependent modelling, because a speaker’s age and gender affect the way emotions are expressed.
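
A minimal sketch of speaker group-dependent modelling with scikit-learn (the toy features, groups and GMM configuration are assumptions; the paper’s acoustic features and setup differ): one GMM per affective class is trained for each speaker group, and a test utterance is scored only against the models of its own group.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)

def make_features(shift):
    """Synthetic stand-in for per-frame acoustic features (e.g. MFCCs)."""
    return rng.normal(shift, 1.0, size=(80, 6))

# group -> class -> training features (invented toy data).
train = {
    "female": {"neutral": make_features(0.0), "angry": make_features(1.5)},
    "male":   {"neutral": make_features(0.4), "angry": make_features(2.0)},
}

models = {group: {label: GaussianMixture(n_components=2, random_state=0).fit(X)
                  for label, X in classes.items()}
          for group, classes in train.items()}

def classify(features, group):
    """Score an utterance only against its own speaker group's models."""
    scores = {label: m.score(features) for label, m in models[group].items()}
    return max(scores, key=scores.get)   # maximum average log-likelihood

test_utterance = make_features(1.6)
print(classify(test_utterance, group="male"))   # expected: angry
```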


italian workshop on neural nets | 2014

Investigating the Form-Function-Relation of the Discourse Particle “hm” in a Naturalistic Human-Computer Interaction

Ingo Siegert; Dmytro Prylipko; Kim Hartmann; Ronald Böck; Andreas Wendemuth

For a successful speech-controlled human-computer interaction (HCI), the pure textual information as well as the individual skills, preferences, and affective states of the user have to be known. However, verbal human interaction consists of several information layers: apart from the pure textual information, further details regarding the speaker’s feelings, beliefs, and social relations are transmitted. This additional information is encoded in the acoustics. In particular, the intonation reveals details about the speaker’s communicative relation and attitude towards the ongoing dialogue.
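
A minimal sketch of the intonation idea (the slope test and thresholds are assumptions, not the paper’s method): classifying a discourse particle’s F0 contour as rising, falling or level, since these forms tend to carry different dialogue functions.

```python
import numpy as np

def contour_form(f0_hz, tol=0.5):
    """Classify an F0 track (Hz, voiced frames only) by its overall slope."""
    t = np.arange(len(f0_hz))
    slope = np.polyfit(t, np.asarray(f0_hz, dtype=float), 1)[0]  # Hz/frame
    if slope > tol:
        return "rising"
    if slope < -tol:
        return "falling"
    return "level"

print(contour_form([110, 112, 118, 125, 133]))   # rising "hm?" (query)
print(contour_form([140, 132, 124, 118, 110]))   # falling "hm." (agreement)
```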


affective computing and intelligent interaction | 2013

Annotation and Classification of Changes of Involvement in Group Conversation

Ronald Böck; Stefan Glüge; Ingo Siegert; Andreas Wendemuth

The detection of involvement in a conversation is important to assess the level at which humans are participating in either a human-human or a human-computer interaction. In particular, detecting changes in a group’s involvement in a multi-party interaction is of interest to distinguish several constellations within the group itself. This information can further be used in situations where technical support of meetings is favoured, for instance for focusing a camera, switching microphones, etc. Moreover, it could help to improve the performance of technical systems applied in human-machine interaction. In this paper, we concentrate on video material from the Table Talk corpus. We introduce a way of annotating and classifying changes of involvement and discuss the reliability of the annotation. Further, we present classification results based on video features using multi-layer networks.
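
A minimal sketch of the classification step with scikit-learn (an assumption; the paper’s network setup and video features are its own): an MLP trained on per-frame feature vectors to flag involvement changes, with synthetic placeholder data.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(600, 20))                 # e.g. motion/gaze descriptors
y = (X[:, :3].sum(axis=1) > 0).astype(int)     # 1 = change of involvement

clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000,
                    random_state=0).fit(X[:500], y[:500])
print(f"held-out accuracy: {clf.score(X[500:], y[500:]):.2f}")
```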


international conference on human-computer interaction | 2014

Discourse Particles and User Characteristics in Naturalistic Human-Computer Interaction

Ingo Siegert; Matthias Haase; Dmytro Prylipko; Andreas Wendemuth

In human-human interaction (HHI), the behaviour of the speaker is characterised, among other things, by semantic and prosodic cues. These short feedback signals minimally communicate certain dialogue functions such as attention, understanding, or other attitudinal reactions. Human-computer interaction (HCI) systems have so far failed to note and respond to these details, leaving users to cope with and adapt to the machine’s behaviour. In order to enhance HCI, an adaptation to the user’s behaviour and individual skills, and the integration of a general understanding of human behaviour, are indispensable. A further question is whether the usage of feedback signals is influenced by the user’s individuality. In this paper, we investigate the relation of specific feedback signals, known as discourse particles (DPs), to communication style and psychological characteristics within a naturalistic HCI. This investigation shows significant differences in DP usage between users with certain characteristics.
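
The kind of group comparison behind the final claim can be sketched with a rank-based significance test; the choice of a Mann-Whitney U test and the per-session counts below are assumptions for illustration, not the paper’s data.

```python
from scipy.stats import mannwhitneyu

dp_counts_group_a = [12, 9, 15, 11, 14, 8]   # DPs per session, group A
dp_counts_group_b = [4, 6, 3, 7, 5, 6]       # DPs per session, group B

stat, p = mannwhitneyu(dp_counts_group_a, dp_counts_group_b)
print(f"U = {stat}, p = {p:.4f}")            # small p -> usage differs
```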


international conference on signal processing | 2012

Multimodal affect recognition in spontaneous HCI environment

Axel Panning; Ingo Siegert; Ayoub Al-Hamadi; Andreas Wendemuth; Dietmar F. Rösner; Jörg Frommer; Gerald Krell; Bernd Michaelis

Human-computer interaction (HCI) is known to be a multimodal process. In this paper we show results of affect recognition experiments on non-acted, affective multimodal data from the new Last Minute Corpus (LMC). This corpus is closer to real HCI applications than other known data sets, in which affective behavior is elicited in ways untypical of HCI. We utilize features from three modalities: facial expressions, prosody, and gesture. The results show that even simple fusion architectures can reach respectable results compared to other approaches. Further, we show that probably not all features and modalities contribute substantially to the classification process; prosody and eye-blink frequency seem to contribute most in the analyzed dataset.
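
The last finding suggests a leave-one-modality-out ablation. Below is a minimal sketch of that idea (the classifier, feature blocks and synthetic labels are assumptions, not the paper’s pipeline): accuracy is compared with each modality removed in turn to estimate its contribution.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 400
blocks = {"face": 8, "prosody": 6, "gesture": 5}   # feature dims per modality
X = {m: rng.normal(size=(n, d)) for m, d in blocks.items()}
# Synthetic labels driven mostly by prosody, echoing the paper's finding.
y = (2.0 * X["prosody"][:, 0] + 0.3 * X["face"][:, 0]
     + rng.normal(size=n) > 0).astype(int)

def accuracy(modalities):
    """Train on the given modality subset, report held-out accuracy."""
    Z = np.hstack([X[m] for m in modalities])
    clf = LogisticRegression(max_iter=1000).fit(Z[:300], y[:300])
    return clf.score(Z[300:], y[300:])

full = accuracy(list(blocks))
for left_out in blocks:
    rest = [m for m in blocks if m != left_out]
    print(f"without {left_out:8s}: {accuracy(rest):.2f} (full: {full:.2f})")
```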

Collaboration


Top co-authors of Ingo Siegert (all Otto-von-Guericke University Magdeburg):

Andreas Wendemuth
Ronald Böck
Kim Hartmann
Alicia Flores Lotz
Olga Egorow
Bogdan Vlasenko
Dmytro Prylipko
David Philippou-Hübner
Gerald Krell