Miriam Schmidt
University of Ulm
Publications
Featured research published by Miriam Schmidt.
affective computing and intelligent interaction | 2011
Michael Glodek; Stephan Tschechne; Georg Layher; Martin Schels; Tobias Brosch; Stefan Scherer; Markus Kächele; Miriam Schmidt; Heiko Neumann; Günther Palm; Friedhelm Schwenker
Research activities in the field of human-computer interaction have increasingly addressed the integration of some form of emotional intelligence. Human emotions are expressed through different modalities such as speech, facial expressions, and hand or body gestures, and the classification of human emotions should therefore be treated as a multimodal pattern recognition problem. The aim of our paper is to investigate multiple classifier systems utilizing audio and visual features to classify human emotional states. For that purpose, a variety of features have been derived: from the audio signal the fundamental frequency, LPC and MFCC coefficients, and RASTA-PLP features; in addition, two types of visual features have been computed, namely form and motion features of intermediate complexity. The numerical evaluation has been performed on the four emotional labels Arousal, Expectancy, Power, and Valence as defined in the AVEC data set. Multiple classifier systems are applied as classifier architectures, since they have proven to be accurate and robust against missing and noisy data.
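The following is a minimal sketch of the kind of multiple classifier system described above, with one classifier per modality and decision-level averaging of class posteriors. It assumes the audio and visual features (MFCC, RASTA-PLP, form/motion descriptors) are extracted elsewhere; the arrays, classifier choice (scikit-learn random forests), and label encoding are placeholders, not the AVEC setup.

```python
# Hedged sketch: two per-modality classifiers whose posteriors are averaged.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_audio = rng.normal(size=(200, 39))   # placeholder audio descriptors per clip
X_video = rng.normal(size=(200, 120))  # placeholder form/motion features per clip
y = rng.integers(0, 2, size=200)       # binary label for one affective dimension

clf_audio = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_audio, y)
clf_video = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_video, y)

def fuse(p_audio, p_video):
    """Average the class posteriors of the available modalities.

    If one modality is missing (None), the decision falls back to the other,
    which is one simple way such ensembles stay robust to missing data."""
    posteriors = [p for p in (p_audio, p_video) if p is not None]
    return np.mean(posteriors, axis=0).argmax(axis=1)

pred = fuse(clf_audio.predict_proba(X_audio), clf_video.predict_proba(X_video))
```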
international conference on human computer interaction | 2011
Steffen Walter; Stefan Scherer; Martin Schels; Michael Glodek; David Hrabal; Miriam Schmidt; Ronald Böck; Kerstin Limbrecht; Harald C. Traue; Friedhelm Schwenker
The design of intelligent personalized interactive systems, having knowledge about the user's state, desires, needs, and wishes, currently poses a great challenge to computer scientists. In this study we propose an information fusion approach combining acoustic and biophysiological data from multiple sensors to classify emotional states. For this purpose a multimodal corpus has been created in which subjects undergo a controlled emotion-eliciting experiment, passing through several octants of the valence-arousal-dominance space. The temporal and decision-level fusion of the multiple modalities outperforms the single-modality classifiers and shows promising results.
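To illustrate the two fusion stages mentioned in the abstract, here is a small sketch of temporal fusion (smoothing per-frame class posteriors over a window) followed by a weighted decision-level combination of two channels. The window length, weighting, and random posteriors are illustrative assumptions, not values from the study.

```python
# Illustrative temporal plus decision-level fusion for two channels.
import numpy as np

def temporal_fusion(frame_posteriors, window=10):
    """Average class posteriors over a sliding window of frames."""
    kernel = np.ones(window) / window
    return np.stack([np.convolve(frame_posteriors[:, c], kernel, mode="same")
                     for c in range(frame_posteriors.shape[1])], axis=1)

def decision_fusion(p_audio, p_bio, w_audio=0.5):
    """Weighted sum rule over the temporally smoothed posteriors."""
    fused = w_audio * p_audio + (1.0 - w_audio) * p_bio
    return fused.argmax(axis=1)

# p_audio, p_bio: (n_frames, n_classes) posteriors from per-channel classifiers
rng = np.random.default_rng(1)
p_audio = rng.dirichlet(np.ones(4), size=300)
p_bio = rng.dirichlet(np.ones(4), size=300)
labels = decision_fusion(temporal_fusion(p_audio), temporal_fusion(p_bio))
```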
artificial neural networks in pattern recognition | 2010
Miriam Schmidt; Martin Schels; Friedhelm Schwenker
One of the important properties of hidden Markov models is their ability to model sequential dependencies. In this study the applicability of hidden Markov models to emotion recognition in image sequences, i.e. to the temporal aspects of facial expressions, is investigated. The underlying image sequences were taken from the Cohn-Kanade database. Three different features (principal component analysis, orientation histograms and optical flow estimation) were extracted from four facial regions of interest (face, mouth, right and left eye). The resulting twelve combinations of feature and region were used to evaluate hidden Markov models. The best single model, using principal component analysis features of the whole face region, achieved a detection rate of 76.4 %. To improve these results further, two different fusion approaches were evaluated; the best fusion detection rate in this study was 86.1 %.
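A common way to use HMMs for such sequence classification is to train one model per emotion class and assign a test sequence to the class whose model gives the highest log-likelihood. The sketch below, based on the hmmlearn package, assumes precomputed per-frame feature vectors (e.g. PCA coefficients of the face region); the number of hidden states and covariance type are guesses, not the paper's settings.

```python
# Sketch: one Gaussian-emission HMM per emotion class, maximum-likelihood decision.
import numpy as np
from hmmlearn.hmm import GaussianHMM

def train_class_hmms(sequences_by_class, n_states=4):
    """Fit one HMM per class; sequences_by_class maps label -> list of (T, d) arrays."""
    models = {}
    for label, seqs in sequences_by_class.items():
        X = np.concatenate(seqs)             # stack frames of all sequences
        lengths = [len(s) for s in seqs]     # sequence boundaries for hmmlearn
        models[label] = GaussianHMM(n_components=n_states,
                                    covariance_type="diag").fit(X, lengths)
    return models

def classify(models, sequence):
    """Assign the label whose HMM yields the highest log-likelihood."""
    return max(models, key=lambda label: models[label].score(sequence))
```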
international conference on multiple classifier systems | 2010
Friedhelm Schwenker; Stefan Scherer; Miriam Schmidt; Martin Schels; Michael Glodek
Research in the area of human-computer interaction (HCI) has increasingly addressed the aspect of integrating some type of emotional intelligence into the system. Such systems must be able to recognize, interpret and create emotions. Although human emotions are expressed through different modalities such as speech, facial expressions, and hand or body gestures, most of the research in affective computing has been done on unimodal emotion recognition. Basically, a multimodal approach to emotion recognition should be more accurate and robust against missing or noisy data. In this study we consider multiple classifier systems for the classification of facial expressions, and additionally present a prototype of an audio-visual laughter detection system. Finally, a novel implementation of a Java process engine for pattern recognition and information fusion is described.
international conference on multimodal interfaces | 2014
Felix Schüssel; Frank Honold; Miriam Schmidt; Nikola Bubalo; Anke Huckauf; Michael Weber
Multimodal systems still tend to ignore the individual input behavior of users and, at the same time, suffer from erroneous sensor inputs. Although many researchers have described user behavior in specific settings and tasks, little to nothing is known about the applicability of such information when it comes to increasing the robustness of a system to multimodal inputs. We conducted a gamified experimental study to investigate individual user behavior and the error types found in an actually running system. It is shown that previous ways of describing input behavior by a simple classification scheme (such as simultaneous vs. sequential) are not suited to building up an individual interaction history. Instead, we propose to use temporal distributions of different metrics derived from multimodal event timings. We identify the major errors that can occur in multimodal interactions and finally show how such an interaction history can practically be applied for error detection and recovery. Applying the proposed approach to the experimental data, the initial error rate is reduced from 4.9% to a minimum of 1.2%.
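The following sketch illustrates the general idea of an individual interaction history built from event timings: keep the observed time offsets between a user's paired multimodal events (e.g. gaze and touch) and flag a new input pair whose offset is implausible under that user's own empirical distribution. The percentile bounds and minimum sample count are illustrative assumptions, not the metrics used in the paper.

```python
# Hedged sketch of a per-user interaction history for error detection.
import numpy as np

class InteractionHistory:
    def __init__(self, lower_pct=2.5, upper_pct=97.5):
        self.offsets = []                 # seconds between paired modality events
        self.lower_pct, self.upper_pct = lower_pct, upper_pct

    def update(self, offset):
        """Record the offset of a confirmed, correct input pair."""
        self.offsets.append(offset)

    def is_suspicious(self, offset, min_samples=20):
        """True if the offset lies outside this user's typical range."""
        if len(self.offsets) < min_samples:
            return False                  # not enough history yet
        lo, hi = np.percentile(self.offsets, [self.lower_pct, self.upper_pct])
        return not (lo <= offset <= hi)
```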
artificial neural networks in pattern recognition | 2014
Friedhelm Schwenker; Markus Frey; Michael Glodek; Markus Kächele; Sascha Meudt; Martin Schels; Miriam Schmidt
In this paper a novel approach to fuzzy support vector machines (SVM) for multi-class classification problems is presented. The proposed algorithm has the property to benefit from fuzzy labeled data in the training phase and can determine fuzzy memberships for input data. The algorithm can be considered as an extension of the traditional multi-class SVM for crisp labeled data, and it also extends the fuzzy SVM approach for fuzzy labeled training data in the two-class classification setting. Its behavior is demonstrated on three benchmark data sets. The achieved results motivate the inclusion of fuzzy labeled data into the training set for various tasks in pattern recognition and machine learning, such as the design of aggregation rules in multiple classifier systems, or in partially supervised learning.
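This is not the algorithm from the paper, only a minimal illustration of the general idea of exploiting fuzzy (soft) labels in SVM training: one-vs-rest scikit-learn SVMs in which the membership degrees act as per-sample weights, with normalized probability outputs serving as fuzzy memberships for new inputs. It assumes each class dominates at least a few training samples.

```python
# Illustrative fuzzy-label one-vs-rest SVMs (a simplified stand-in, not the paper's method).
import numpy as np
from sklearn.svm import SVC

def fit_fuzzy_ovr_svms(X, memberships, C=1.0):
    """memberships: (n_samples, n_classes) degrees in [0, 1], rows summing to 1."""
    models = []
    for c in range(memberships.shape[1]):
        y = (memberships.argmax(axis=1) == c).astype(int)   # crisp one-vs-rest target
        w = np.where(y == 1, memberships[:, c], 1.0 - memberships[:, c])
        svm = SVC(C=C, probability=True)
        svm.fit(X, y, sample_weight=np.clip(w, 1e-3, None))
        models.append(svm)
    return models

def fuzzy_memberships(models, X):
    """Normalized per-class probabilities serve as output memberships."""
    scores = np.column_stack([m.predict_proba(X)[:, 1] for m in models])
    return scores / scores.sum(axis=1, keepdims=True)
```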
GbRPR'11 Proceedings of the 8th international conference on Graph-based representations in pattern recognition | 2011
Miriam Schmidt; Friedhelm Schwenker
In this paper, the classification of human activities based on sequences of camera images utilizing hidden Markov models is investigated. In the first step of the proposed data processing procedure, the locations of the person's body parts (hand, head, etc.) and of the objects (table, cup, etc.) relevant for the classification of the person's activity have to be estimated for each camera image. In the next processing step, the distances between all pairs of detected objects are computed and the eigenvalues of this Euclidean distance matrix are calculated. This set of eigenvalues forms the input for a single camera image and serves as the input to Gaussian mixture models, which are utilized to estimate the emission probabilities of hidden Markov models. It could be demonstrated that the eigenvalues are powerful features which are invariant with respect to the labeling of the nodes (if they are utilized sorted by size) and can also deal with graphs that differ in the number of their nodes.
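A small sketch of this feature construction: the sorted eigenvalues of the Euclidean distance matrix between the detected points do not change when the points are relabeled, since a permutation yields a similar matrix with the same spectrum. The fixed feature length and the magnitude-based sorting are choices made for the example.

```python
# Sketch: eigenvalues of the Euclidean distance matrix as labeling-invariant features.
import numpy as np

def distance_matrix_eigenvalues(points, n_keep=10):
    """points: (n, d) coordinates of detected body parts / objects in one frame.

    Returns the n_keep largest eigenvalues (by magnitude), zero-padded so that
    frames with different numbers of detections yield fixed-length features."""
    diff = points[:, None, :] - points[None, :, :]
    D = np.linalg.norm(diff, axis=-1)             # symmetric distance matrix
    eig = np.linalg.eigvalsh(D)                   # real eigenvalues
    eig = eig[np.argsort(-np.abs(eig))][:n_keep]  # sort by magnitude, truncate
    return np.pad(eig, (0, max(0, n_keep - len(eig))))

# Relabeling (permuting) the points leaves the eigenvalue features unchanged:
pts = np.random.default_rng(2).normal(size=(6, 2))
assert np.allclose(distance_matrix_eigenvalues(pts),
                   distance_matrix_eigenvalues(pts[np.random.permutation(6)]))
```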
Companion Technology | 2017
Ingo Siegert; Felix Schüssel; Miriam Schmidt; Stephan Reuter; Sascha Meudt; Georg Layher; Gerald Krell; Thilo Hörnle; Sebastian Handrich; Ayoub Al-Hamadi; Klaus Dietmayer; Heiko Neumann; Günther Palm; Friedhelm Schwenker; Andreas Wendemuth
We demonstrate a successful multimodal dynamic human-computer interaction (HCI) in which the system adapts to the current situation and the user's state, using the scenario of purchasing a train ticket. This scenario demonstrates that Companion Systems are facing the challenge of analyzing and interpreting explicit and implicit observations obtained from sensors under changing environmental conditions. In a dedicated experimental setup, a wide range of sensors was used to capture the situative context and the user, comprising video and audio capturing devices, laser scanners, a touch screen, and a depth sensor. Explicit signals describe a user's direct interaction with the system, such as interaction gestures, speech and touch input. Implicit signals are not directly addressed to the system; they comprise the user's situative context, his or her gestures, speech, body pose, facial expressions and prosody. Both the multimodally fused explicit signals and the interpreted information from implicit signals steer the application component, which was kept deliberately robust. The application offers stepwise dialogs gathering the most relevant information for purchasing a train ticket, where the dialog steps are sensitive and adaptable at processing time to the interpreted signals and data. We further highlight the system's potential for a fast-track ticket purchase when several pieces of information indicate a hurried user.
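Purely as an illustration of the fast-track idea at the end of the abstract, the sketch below lets several implicit cues vote on whether the user appears hurried and switches to a condensed dialog once enough evidence accumulates. The cue names, thresholds, and voting rule are invented for the example and are not taken from the demonstrator.

```python
# Hypothetical cue-voting rule for switching between stepwise and fast-track dialogs.
from dataclasses import dataclass

@dataclass
class ImplicitCues:
    walking_speed: float   # m/s, e.g. from person tracking
    speech_rate: float     # syllables/s, e.g. from prosody analysis
    gaze_on_screen: float  # fraction of time looking at the terminal

def hurried_score(cues: ImplicitCues) -> float:
    """Combine binary cue votes into a single score in [0, 1]."""
    votes = [cues.walking_speed > 1.4,
             cues.speech_rate > 5.0,
             cues.gaze_on_screen < 0.4]
    return sum(votes) / len(votes)

def select_dialog(cues: ImplicitCues) -> str:
    return "fast_track" if hurried_score(cues) >= 2 / 3 else "stepwise"
```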
artificial neural networks in pattern recognition | 2012
Miriam Schmidt; Günther Palm; Friedhelm Schwenker
In this paper, the classification power of the eigenvalues of six graph-associated matrices is investigated and evaluated on a benchmark dataset for optical character recognition. The extracted eigenvalues were utilized as feature vectors for multi-class classification using support vector machines. Each graph-associated matrix contains a certain type of geometric/spatial information, which may be important for the classification process. Classification results are presented for all six feature types, as well as for classifier combinations at decision level. For the decision-level combination, probabilistic-output support vector machines have been applied. The eigenvalues of the weighted adjacency matrix provided the best classification rate of 89.9 %. Here, almost half of the misclassified letters are confusion pairs, such as I-L and N-Z. This classification performance can be increased by decision fusion, using the sum rule, to 92.4 %.
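The sketch below outlines this per-matrix spectral feature pipeline with sum-rule decision fusion. Only two of the six graph-associated matrices (the weighted adjacency matrix and the unnormalized Laplacian derived from it) are shown, and the padding length is an arbitrary choice for the example; the scikit-learn SVMs stand in for the probabilistic-output SVMs of the paper.

```python
# Sketch: eigenvalue features per graph matrix, one SVM each, sum-rule fusion.
import numpy as np
from sklearn.svm import SVC

def spectrum(matrix, n_keep=20):
    """Sorted eigenvalues, zero-padded to a fixed-length feature vector."""
    eig = np.sort(np.linalg.eigvalsh(matrix))[::-1][:n_keep]
    return np.pad(eig, (0, max(0, n_keep - len(eig))))

def laplacian(adjacency):
    return np.diag(adjacency.sum(axis=1)) - adjacency

def sum_rule(svms_and_features):
    """Decision-level combination: add the class posteriors of all SVMs."""
    total = sum(svm.predict_proba(X) for svm, X in svms_and_features)
    return total.argmax(axis=1)

# Usage outline (graphs: list of weighted adjacency matrices, y: letter labels):
# X_adj = np.array([spectrum(A) for A in graphs])
# X_lap = np.array([spectrum(laplacian(A)) for A in graphs])
# svm_adj = SVC(probability=True).fit(X_adj, y)
# svm_lap = SVC(probability=True).fit(X_lap, y)
# labels = sum_rule([(svm_adj, X_adj_test), (svm_lap, X_lap_test)])
```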
Journal on Multimodal User Interfaces | 2012
Stefan Scherer; Michael Glodek; Georg Layher; Martin Schels; Miriam Schmidt; Tobias Brosch; Stephan Tschechne; Friedhelm Schwenker; Heiko Neumann; Günther Palm