Markus Kächele
University of Ulm
Publications
Featured research published by Markus Kächele.
affective computing and intelligent interaction | 2011
Michael Glodek; Stephan Tschechne; Georg Layher; Martin Schels; Tobias Brosch; Stefan Scherer; Markus Kächele; Miriam Schmidt; Heiko Neumann; Günther Palm; Friedhelm Schwenker
Research activities in the field of human-computer interaction have increasingly addressed the integration of some form of emotional intelligence. Human emotions are expressed through different modalities such as speech, facial expressions, and hand or body gestures, and therefore the classification of human emotions should be considered a multimodal pattern recognition problem. The aim of our paper is to investigate multiple classifier systems utilizing audio and visual features to classify human emotional states. To that end, a variety of features have been derived. From the audio signal, the fundamental frequency, LPC and MFCC coefficients, and RASTA-PLP features have been used. In addition, two types of visual features have been computed, namely form and motion features of intermediate complexity. The numerical evaluation has been performed on the four emotional labels Arousal, Expectancy, Power, and Valence as defined in the AVEC data set. As classifier architectures, multiple classifier systems are applied; these have proven to be accurate and robust against missing and noisy data.
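As an illustration of the multiple classifier system idea, the following minimal Python sketch trains one classifier per modality and fuses their class posteriors by averaging. The feature dimensions, classifier choices and data are placeholders, not the actual setup of the paper.

```python
# Late-fusion sketch of a multiple classifier system over audio and visual features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 200
X_audio = rng.normal(size=(n, 40))   # placeholder for MFCC/RASTA-PLP statistics
X_video = rng.normal(size=(n, 60))   # placeholder for form/motion features
y = rng.integers(0, 2, size=n)       # binary label, e.g. high/low Arousal

# One classifier per modality.
clf_audio = SVC(probability=True).fit(X_audio, y)
clf_video = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_video, y)

# Late fusion: average the class posteriors of the modality-specific experts.
p = 0.5 * clf_audio.predict_proba(X_audio) + 0.5 * clf_video.predict_proba(X_video)
y_fused = p.argmax(axis=1)
print("fused training accuracy:", (y_fused == y).mean())
```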
international conference on pattern recognition applications and methods | 2014
Markus Kächele; Michael Glodek; Dimitrij Zharkov; Sascha Meudt; Friedhelm Schwenker
Reliable prediction of affective states in real-world scenarios is very challenging, and a significant amount of ongoing research is targeted towards improving existing systems. Major problems include the unreliability of labels, different realizations of the same affective states among different persons and in different modalities, as well as the presence of sensor noise in the signals. This work presents a framework for adaptive fusion of input modalities with variable degrees of certainty on different levels. Using a strategy that starts with ensembles of weak learners, the discriminative power of the system is improved gradually, level by level, by adaptively weighting favorable decisions while concurrently dismissing unfavorable ones. For the final decision fusion, the proposed system leverages a trained Kalman filter. Besides its ability to deal with missing and uncertain values, the Kalman filter is by nature a time series predictor and thus a suitable choice for matching input signals to a reference time series in the form of ground truth labels.
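The following minimal sketch shows how a Kalman filter can fuse noisy per-frame predictions from several modalities into one smoothed continuous estimate while tolerating missing values. The random-walk state model and the noise values are illustrative assumptions; the paper trains the filter from data.

```python
# Kalman-filter-based decision fusion over per-frame modality predictions.
import numpy as np

def kalman_fuse(predictions, r, q=1e-3):
    """predictions: (T, M) per-frame outputs of M modality regressors.
    r: (M,) measurement noise variance per modality; NaN marks a missing value."""
    T, M = predictions.shape
    x, p = 0.0, 1.0                     # state estimate and its variance
    fused = np.empty(T)
    for t in range(T):
        p += q                          # predict: random-walk state model
        for m in range(M):
            z = predictions[t, m]
            if np.isnan(z):             # missing modality: skip the update
                continue
            k = p / (p + r[m])          # Kalman gain for this observation
            x += k * (z - x)
            p *= (1.0 - k)
        fused[t] = x
    return fused

rng = np.random.default_rng(0)
truth = np.sin(np.linspace(0, 6, 300))                    # stand-in ground-truth trace
preds = truth[:, None] + rng.normal(scale=[0.3, 0.6], size=(300, 2))
print(kalman_fuse(preds, r=np.array([0.3**2, 0.6**2]))[:5])
```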
Proceedings of the 4th International Workshop on Audio/Visual Emotion Challenge | 2014
Markus Kächele; Martin Schels; Friedhelm Schwenker
This paper outlines our contribution to the 2014 edition of the AVEC competition. It comprises classification results and considerations for both the continuous affect recognition sub-challenge and the depression recognition sub-challenge. Rather than relying on statistical features that are normally extracted from the raw audio-visual data, we propose an approach based on abstract meta information about individual subjects as well as prototypical task- and label-dependent templates to infer the respective emotional states. The results of the approach that were submitted to both parts of the challenge significantly outperformed the baseline approaches. Further, we elaborate on several issues concerning the labeling of affective corpora and the choice of appropriate performance measures.
Journal on Multimodal User Interfaces | 2014
Martin Schels; Markus Kächele; Michael Glodek; David Hrabal; Steffen Walter; Friedhelm Schwenker
The individual nature of physiological measurements of human affective states makes it very difficult to transfer statistical classifiers from one subject to another. In this work, we propose an approach to incorporate unlabeled data into supervised classifier training in order to conduct emotion classification. The key idea of the method is to perform a density estimation on all available data (labeled and unlabeled) to create a new encoding of the problem, on which a supervised classifier is then constructed. Further, numerical evaluations on the EmoRec II corpus are given, examining to what extent additional data can improve classification and which parameters of the density estimation are optimal.
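A minimal sketch of the density-based encoding idea follows, assuming a Gaussian mixture is fitted on labeled and unlabeled data together and each labeled sample is then re-encoded by its component responsibilities before supervised training. The number of components, the classifier and the data are illustrative choices.

```python
# Semi-supervised density-based re-encoding followed by supervised training.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X_lab = rng.normal(size=(100, 8))         # labeled physiological feature vectors
y_lab = rng.integers(0, 2, size=100)
X_unlab = rng.normal(size=(400, 8))       # unlabeled recordings

# 1) density estimation on all available data
gmm = GaussianMixture(n_components=16, random_state=0)
gmm.fit(np.vstack([X_lab, X_unlab]))

# 2) re-encode labeled samples as posterior responsibilities of the mixture
Z_lab = gmm.predict_proba(X_lab)

# 3) train a supervised classifier on the new encoding
clf = SVC().fit(Z_lab, y_lab)
print("training accuracy on the encoded data:", clf.score(Z_lab, y_lab))
```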
international conference on multimodal interfaces | 2013
Sascha Meudt; Dimitri Zharkov; Markus Kächele; Friedhelm Schwenker
Systems for the recognition of psychological characteristics such as the emotional state in real-world scenarios have to deal with several difficulties. Among these are unconstrained environments and uncertainties in one or several input channels. An even more crucial aspect, however, is the content of the data itself. Psychological states are highly person-dependent, and often even humans are not able to determine the correct state a person is in. A successful recognition system thus has to deal with data that is not very discriminative and often simply misleading. In order to succeed, a critical view on features and decisions is essential in order to select only the most valuable ones. This work presents a comparison of a common multiple classifier system approach based on state-of-the-art features and a modified forward-backward feature selection algorithm with a long-term stopping criterion. The second approach also takes features of the voice quality family into account. Both approaches are based on the audio modality only. The dataset used in the challenge lies between real-world datasets, which are still very hard to handle, and over-acted datasets, which were popular in the past and are well understood today.
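The following toy version of forward-selection/backward-elimination with a long-term ("patience") stopping criterion illustrates the idea: the search only terminates after several consecutive steps without a new best score instead of stopping at the first non-improving step. The scoring function, the patience value and the data are assumptions for illustration.

```python
# Forward-backward feature selection with a long-term stopping criterion.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def select_features(X, y, patience=5):
    selected, best_subset, best_score, stale = [], [], -np.inf, 0
    remaining = list(range(X.shape[1]))
    while remaining and stale < patience:
        # forward step: add the single feature that helps most right now
        scores = [(cross_val_score(SVC(), X[:, selected + [f]], y, cv=3).mean(), f)
                  for f in remaining]
        score, f = max(scores)
        selected.append(f)
        remaining.remove(f)
        # backward step: drop a previously chosen feature if that improves the score
        if len(selected) > 2:
            for g in list(selected):
                trial = [s for s in selected if s != g]
                trial_score = cross_val_score(SVC(), X[:, trial], y, cv=3).mean()
                if trial_score > score:
                    selected, score = trial, trial_score
                    remaining.append(g)
                    break
        if score > best_score:
            best_score, best_subset, stale = score, list(selected), 0
        else:
            stale += 1
    return best_subset, best_score

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 10)); y = (X[:, 0] + X[:, 3] > 0).astype(int)
print(select_features(X, y))
```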
international conference on pattern recognition | 2014
Markus Kächele; Friedhelm Schwenker
Emotion recognition from facial expressions is a highly demanding task, especially in everyday-life scenarios. Different sources of artifacts have to be considered in order to successfully extract the intended emotional nuances of the face. The exact and robust detection and orientation of faces impeded by occlusions, inhomogeneous lighting and fast movements is only one difficulty. Another is the question of selecting suitable features for the application at hand. In the literature, a vast body of visual features, grouped into dynamic, spatial and textural families, has been proposed. Due to their inherent structure, these features exhibit different advantages and disadvantages over each other and thus capture complementary information, which is a promising vantage point for fusion architectures. To combine different feature sets and exploit their respective advantages, an adaptive multilevel fusion architecture is proposed. The cascaded approach integrates information on different levels and time scales, using artificial neural networks for adaptive weighting of propagated intermediate results. The performance of the proposed architecture is analysed on the GEMEP-FERA corpus as well as on a novel dataset obtained from an unconstrained, spontaneous human-computer interaction scenario. The obtained performance is superior to single channels and basic fusion techniques.
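A minimal two-level sketch of such a cascaded fusion follows: each feature family first gets its own classifier, and a small neural network then learns to weight and combine the stacked intermediate posteriors. The feature families, dimensions and classifiers are placeholders, not the actual architecture of the paper.

```python
# Two-level cascaded fusion: per-family classifiers, then a neural fusion stage.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 300
families = {"spatial": 30, "textural": 59, "dynamic": 20}
X = {name: rng.normal(size=(n, d)) for name, d in families.items()}
y = rng.integers(0, 5, size=n)                 # e.g. five emotion classes

# level 1: one classifier per feature family, emitting class posteriors
level1 = {name: SVC(probability=True).fit(X[name], y) for name in families}
inter = np.hstack([level1[name].predict_proba(X[name]) for name in families])

# level 2: a neural network adaptively weights the stacked intermediate outputs
fusion = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
fusion.fit(inter, y)
print("fused training accuracy:", fusion.score(inter, y))
```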
international conference on pattern recognition | 2014
Markus Kächele; Dimitrij Zharkov; Sascha Meudt; Friedhelm Schwenker
Emotion recognition from speech is an important field of research in human-machine interfaces and has begun to influence everyday life through its use in areas such as call centers or wearable companions in the form of smartphones. In the proposed classification architecture, different spectral, prosodic and the relatively novel voice quality features are extracted from the speech signals. These features are then used to represent long-term information of the speech, leading to utterance-wise suprasegmental features. The most promising of these features are selected using a forward-selection/backward-elimination algorithm with a novel long-term termination criterion for the selection. The overall system has been evaluated using recordings from the public Berlin emotion database. Utilizing the resulting features, a recognition rate of 88.97% has been achieved, which surpasses the performance of humans on this database and is comparable to the state-of-the-art performance on this dataset.
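The sketch below illustrates how frame-level descriptors can be turned into utterance-wise suprasegmental features by summarising them with statistics such as mean, standard deviation, extrema and range. The concrete descriptors and statistics used in the paper may differ; the data here are random placeholders.

```python
# Utterance-level suprasegmental features as statistics over frame-level descriptors.
import numpy as np

def suprasegmental(frames):
    """frames: (T, D) frame-level features of one utterance -> (5*D,) vector."""
    stats = [frames.mean(0), frames.std(0), frames.min(0), frames.max(0),
             frames.max(0) - frames.min(0)]
    return np.concatenate(stats)

rng = np.random.default_rng(0)
utterance = rng.normal(size=(250, 12))     # 250 frames, 12 frame-level descriptors
print(suprasegmental(utterance).shape)     # -> (60,)
```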
multiple classifier systems | 2015
Markus Kächele; Philipp Werner; Ayoub Al-Hamadi; Günther Palm; Steffen Walter; Friedhelm Schwenker
In this work, multi-modal fusion of video and biopotential signals is used to recognize pain in a person-independent scenario. For this purpose, participants were subjected to painful heat stimuli under controlled conditions. Subsequently, a multitude of features has been extracted from the available modalities. Experimental validation suggests that the cues that allow the successful recognition of pain are highly similar across different people and complementary across the analysed modalities, to an extent that fusion methods are able to achieve an improvement over single modalities. Different fusion approaches (early, late, trainable) are compared on a large set of state-of-the-art features for the biopotential and video channels in multiple classification experiments.
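The following minimal sketch contrasts the three fusion schemes on synthetic placeholder data: early fusion concatenates feature vectors, late fusion averages per-modality posteriors, and trainable fusion lets a meta-classifier learn the combination. The classifiers and dimensions are illustrative assumptions only.

```python
# Early, late and trainable fusion of video and biopotential features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 400
X_video, X_bio = rng.normal(size=(n, 50)), rng.normal(size=(n, 30))
y = rng.integers(0, 2, size=n)                     # pain vs. no pain

# early fusion: concatenate the feature vectors, train a single classifier
early = RandomForestClassifier(random_state=0).fit(np.hstack([X_video, X_bio]), y)

# late fusion: one classifier per modality, average the class posteriors
clf_v = RandomForestClassifier(random_state=0).fit(X_video, y)
clf_b = RandomForestClassifier(random_state=0).fit(X_bio, y)
late = 0.5 * clf_v.predict_proba(X_video) + 0.5 * clf_b.predict_proba(X_bio)

# trainable fusion: a meta-classifier learns how to combine the posteriors
stacked = np.hstack([clf_v.predict_proba(X_video), clf_b.predict_proba(X_bio)])
meta = LogisticRegression().fit(stacked, y)

print("early:", early.score(np.hstack([X_video, X_bio]), y),
      "late:", (late.argmax(1) == y).mean(),
      "trainable:", meta.score(stacked, y))
```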
acm multimedia | 2015
Markus Kächele; Patrick Thiam; Günther Palm; Friedhelm Schwenker; Martin Schels
In this paper, we present a multi-modal system based on audio, video and bio-physiological features for continuous recognition of human affect in unconstrained scenarios. We leverage the robustness of ensemble classifiers as base learners and refine the predictions using stochastic gradient descent based optimization on the desired loss function. Furthermore, we provide a discussion of pre- and post-processing steps that help to improve the robustness of the regression and consequently the prediction quality.
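As an illustration of the refinement step, the sketch below combines base-learner prediction streams linearly and updates the fusion weights by stochastic gradient descent on a squared-error loss; the loss actually optimized in the paper is the one demanded by the task and may differ. All data are synthetic.

```python
# SGD refinement of a linear fusion over base-learner predictions.
import numpy as np

rng = np.random.default_rng(0)
T, M = 1000, 4
truth = np.sin(np.linspace(0, 10, T))                      # continuous affect trace
base = truth[:, None] + rng.normal(scale=0.4, size=(T, M)) # base-learner outputs

w, b, lr = np.full(M, 1.0 / M), 0.0, 0.01
for epoch in range(20):
    for t in rng.permutation(T):                            # stochastic updates
        err = base[t] @ w + b - truth[t]
        w -= lr * err * base[t]                              # gradient of 0.5*err**2
        b -= lr * err
refined = base @ w + b
print("MSE mean baseline:", np.mean((base.mean(1) - truth) ** 2),
      "refined:", np.mean((refined - truth) ** 2))
```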
Journal on Multimodal User Interfaces | 2016
Markus Kächele; Martin Schels; Sascha Meudt; Günther Palm; Friedhelm Schwenker
The focus of this work is emotion recognition in the wild based on a multitude of different audio, visual and meta features. For this, a method is proposed to optimize multi-modal fusion architectures based on evolutionary computing. Extensive uni- and multi-modal experiments show the discriminative power of each computed feature set and fusion architecture. Furthermore, we summarize the EmotiW 2013/2014 challenges, review the conclusions that have been drawn, and compare our results with the state of the art on this dataset.
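A minimal sketch of the evolutionary idea follows: a simple (mu + lambda) evolution strategy searches for modality weights that maximise the accuracy of the fused decision. Population size, mutation scale and the synthetic posteriors are illustrative; the paper evolves whole fusion architectures, not just one weight vector.

```python
# Evolutionary search over fusion weights for combining per-modality posteriors.
import numpy as np

rng = np.random.default_rng(0)
n, C, M = 300, 7, 3                         # samples, classes, modalities
y = rng.integers(0, C, size=n)
posteriors = np.stack([np.eye(C)[y] * 0.5 + rng.random((n, C)) for _ in range(M)])

def fitness(w):
    fused = np.tensordot(w, posteriors, axes=1)       # weighted sum over modalities
    return (fused.argmax(1) == y).mean()

pop = rng.random((8, M))                              # initial parent population
for gen in range(30):
    children = np.clip(pop.repeat(4, axis=0)
                       + rng.normal(scale=0.1, size=(32, M)), 0, None)
    everyone = np.vstack([pop, children])
    pop = everyone[np.argsort([-fitness(w) for w in everyone])[:8]]   # keep the best
print("best weights:", pop[0], "accuracy:", fitness(pop[0]))
```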