Publication


Featured research published by Rainer Stiefelhagen.


EURASIP Journal on Image and Video Processing | 2008

Evaluating multiple object tracking performance: the CLEAR MOT metrics

Keni Bernardin; Rainer Stiefelhagen

Simultaneous tracking of multiple persons in real-world environments is an active research field and several approaches have been proposed, based on a variety of features and algorithms. Recently, there has been a growing interest in organizing systematic evaluations to compare the various techniques. Unfortunately, the lack of common metrics for measuring the performance of multiple object trackers still makes it hard to compare their results. In this work, we introduce two intuitive and general metrics to allow for objective comparison of tracker characteristics, focusing on their precision in estimating object locations, their accuracy in recognizing object configurations and their ability to consistently label objects over time. These metrics have been extensively used in two large-scale international evaluations, the 2006 and 2007 CLEAR evaluations, to measure and compare the performance of multiple object trackers for a wide variety of tracking tasks. Selected performance results are presented and the advantages and drawbacks of the presented metrics are discussed based on the experience gained during the evaluations.
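
As defined in the paper, MOTP averages the localization error over all matched object-hypothesis pairs, while MOTA folds misses, false positives and identity mismatches into a single error ratio. A minimal sketch of the bookkeeping, with illustrative field names rather than any released evaluation code:

```python
# Minimal sketch of the two CLEAR MOT metrics computed from per-frame counts.
# Field names are illustrative, not taken from the paper's evaluation software.

def clear_mot(frames):
    """frames: list of dicts with per-frame counts:
       'dist_sum'   - summed distance error over all matched object-hypothesis pairs
       'matches'    - number of matched pairs
       'misses'     - ground-truth objects left without a matching hypothesis
       'false_pos'  - hypotheses that match no object
       'mismatches' - identity switches on tracked objects
       'gt'         - number of ground-truth objects present in the frame
    """
    total_dist = sum(f['dist_sum'] for f in frames)
    total_matches = sum(f['matches'] for f in frames)
    total_errors = sum(f['misses'] + f['false_pos'] + f['mismatches'] for f in frames)
    total_gt = sum(f['gt'] for f in frames)

    motp = total_dist / total_matches if total_matches else float('nan')
    mota = 1.0 - total_errors / total_gt if total_gt else float('nan')
    return motp, mota
```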


Second International Workshop (MLMI 2005) | 2006

Machine Learning for Multimodal Interaction

Rainer Stiefelhagen

Table of contents:
Invited Paper: Robust Real Time Face Tracking for the Analysis of Human Behaviour.
Multimodal Processing: Conditional Sequence Model for Context-Based Recognition of Gaze Aversion; Meeting State Recognition from Visual and Aural Labels; Object Category Recognition Using Probabilistic Fusion of Speech and Image Classifiers.
HCI, User Studies and Applications: Automatic Annotation of Dialogue Structure from Simple User Interaction; Interactive Pattern Recognition; User Specific Training of a Music Search Engine; An Ego-Centric and Tangible Approach to Meeting Indexing and Browsing; Integrating Semantics into Multimodal Interaction Patterns; Towards an Objective Test for Meeting Browsers: The BET4TQB Pilot Experiment.
Image and Video Processing: Face Recognition in Smart Rooms; Gaussian Process Latent Variable Models for Human Pose Estimation.
Discourse and Dialogue Processing: Automatic Labeling Inconsistencies Detection and Correction for Sentence Unit Segmentation in Conversational Speech; Term-Weighting for Summarization of Multi-party Spoken Dialogues; Automatic Decision Detection in Meeting Speech; Czech Text-to-Sign Speech Synthesizer.
Speech and Audio Processing: Using Prosodic Features in Language Models for Meetings; Posterior-Based Features and Distances in Template Matching for Speech Recognition; A Study of Phoneme and Grapheme Based Context-Dependent ASR Systems; Transfer Learning for Tandem ASR Feature Extraction; Spoken Term Detection System Based on Combination of LVCSR and Phonetic Search; Frequency Domain Linear Prediction for QMF Sub-bands and Applications to Audio Coding; Modeling Vocal Interaction for Segmentation in Meeting Recognition; Binaural Speech Separation Using Recurrent Timing Neural Networks for Joint F0-Localisation Estimation; PASCAL Speech Separation Challenge II; To Separate Speech; Microphone Array Beamforming Approach to Blind Speech Separation.


Human Factors in Computing Systems | 2002

Head orientation and gaze direction in meetings

Rainer Stiefelhagen; Jie Zhu

Detecting who is looking at whom during multiparty interaction is useful for various tasks such as meeting analysis. Two factors determine where a person is looking: head orientation and eye orientation. In this poster, we present an experiment that evaluates the potential of head orientation estimation for detecting who is looking at whom, because head orientation can be estimated accurately and robustly with non-intrusive methods while eye orientation cannot. Experimental results show that head orientation contributes 68.9% on average to the overall gaze direction, and that focus-of-attention estimation based on head orientation alone achieves an average accuracy of 88.7% in a meeting scenario with four participants. We conclude that head orientation is a good indicator of focus of attention in human-computer interaction applications.
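
As a rough illustration of how a head-orientation estimate can be mapped to a focus-of-attention label, the sketch below assigns the estimated head pan angle to the angularly closest participant; the function and participant names are hypothetical and not taken from the paper.

```python
# Illustrative sketch (not the authors' code): pick the participant whose
# direction is angularly closest to the estimated head pan angle.

def focus_of_attention(head_pan_deg, target_directions_deg):
    """head_pan_deg: estimated horizontal head orientation of the observer.
       target_directions_deg: dict mapping participant -> direction (degrees)
       of that participant as seen from the observer."""
    def angular_diff(a, b):
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)
    return min(target_directions_deg,
               key=lambda name: angular_diff(head_pan_deg, target_directions_deg[name]))

# With participants at -60, 0 and +60 degrees, a head pan of -45 degrees
# is assigned to the participant at -60 degrees.
print(focus_of_attention(-45.0, {'A': -60.0, 'B': 0.0, 'C': 60.0}))  # -> 'A'
```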


International Conference on Multimodal Interfaces | 2003

Pointing gesture recognition based on 3D-tracking of face, hands and head orientation

Kai Nickel; Rainer Stiefelhagen

In this paper, we present a system capable of visually detecting pointing gestures and estimating the 3D pointing direction in real time. In order to acquire input features for gesture recognition, we track the positions of a person's face and hands in image sequences provided by a stereo camera. Hidden Markov Models (HMMs), trained on different phases of sample pointing gestures, are used to classify the 3D trajectories in order to detect the occurrence of a gesture. When analyzing sample pointing gestures, we noticed that humans tend to look at the pointing target while performing the gesture. In order to utilize this behavior, we additionally measured head orientation by means of a magnetic sensor in a similar scenario. By using head orientation as an additional feature, we observed significant gains in both recall and precision of pointing gestures. Moreover, the percentage of correctly identified pointing targets improved significantly, from 65% to 83%. For estimating the pointing direction, we comparatively used three approaches: 1) the line of sight between head and hand, 2) the forearm orientation, and 3) the head orientation.
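
The first of the three compared approaches, the line of sight between head and hand, can be sketched as casting a ray from the tracked head position through the hand position and selecting the candidate target closest to that ray. The helper below is an illustrative assumption, not the authors' implementation.

```python
# Illustrative sketch: estimate the pointing target as the candidate closest
# to the head-hand ray. 3D positions are assumed to come from the stereo tracker.
import numpy as np

def pointing_target(head_pos, hand_pos, targets):
    """head_pos, hand_pos: 3D points (metres); targets: dict name -> 3D point."""
    origin = np.asarray(hand_pos, dtype=float)
    direction = origin - np.asarray(head_pos, dtype=float)
    direction /= np.linalg.norm(direction)

    def ray_distance(point):
        v = np.asarray(point, dtype=float) - origin
        t = max(np.dot(v, direction), 0.0)         # only points in front of the hand
        return np.linalg.norm(v - t * direction)   # perpendicular distance to the ray

    return min(targets, key=lambda name: ray_distance(targets[name]))
```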


International Conference on Multimodal Interfaces | 2004

Identifying the addressee in human-human-robot interactions based on head pose and speech

Michael Katzenmaier; Rainer Stiefelhagen; Tanja Schultz

In this work we investigate the power of acoustic and visual cues, and their combination, to identify the addressee in a human-human-robot interaction. Based on eighteen audio-visual recordings of two human beings and a (simulated) robot, we discriminate the interaction of the two humans from the interaction of one human with the robot. The paper compares the results of three approaches. The first approach uses purely acoustic cues to find the addressee. Low-level, feature-based cues as well as higher-level cues are examined. In the second approach we test whether the human's head pose is a suitable cue. Our results show that visually estimated head pose is a more reliable cue for the identification of the addressee in human-human-robot interaction. In the third approach we combine the acoustic and visual cues, which results in significant improvements.
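
A minimal sketch of one plausible late-fusion rule, assuming each modality produces a per-utterance score in [0, 1] for "addressed to the robot"; the weight and threshold below are illustrative and not published in the paper.

```python
# Hypothetical late fusion of acoustic and head-pose scores for addressee detection.
# w_visual > 0.5 reflects the finding that head pose was the more reliable cue.

def addressee(acoustic_score, headpose_score, w_visual=0.6, threshold=0.5):
    """Both scores are probabilities that the utterance is addressed to the robot."""
    fused = w_visual * headpose_score + (1.0 - w_visual) * acoustic_score
    return 'robot' if fused >= threshold else 'human'
```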


Intelligent Robots and Systems | 2004

Natural human-robot interaction using speech, head pose and gestures

Rainer Stiefelhagen; Christian Fügen; R. Gieselmann; Hartwig Holzapfel; Kai Nickel; Alex Waibel

In this paper we present our ongoing work in building technologies for natural multimodal human-robot interaction. We present our systems for spontaneous speech recognition, multimodal dialogue processing and visual perception of a user, which includes the recognition of pointing gestures as well as the recognition of a person's head orientation. Each of the components is described in the paper and experimental results are presented. In order to demonstrate and measure the usefulness of such technologies for human-robot interaction, all components have been integrated on a mobile robot platform and have been used for real-time human-robot interaction in a kitchen scenario.


IEEE Transactions on Robotics | 2007

Enabling Multimodal Human–Robot Interaction for the Karlsruhe Humanoid Robot

Rainer Stiefelhagen; Hazim Kemal Ekenel; Christian Fügen; Petra Gieselmann; Hartwig Holzapfel; Florian Kraft; Kai Nickel; Michael Voit; Alex Waibel

In this paper, we present our work in building technologies for natural multimodal human-robot interaction. We present our systems for spontaneous speech recognition, multimodal dialogue processing, and visual perception of a user, which includes localization, tracking, and identification of the user, recognition of pointing gestures, as well as the recognition of a person's head orientation. Each of the components is described in the paper and experimental results are presented. We also present several experiments on multimodal human-robot interaction, such as interaction using speech and gestures, the automatic determination of the addressee during human-human-robot interaction, and interactive learning of dialogue strategies. The work and the components presented here constitute the core building blocks for audiovisual perception of humans and multimodal human-robot interaction used for the humanoid robot developed within the German research project (Sonderforschungsbereich) on humanoid cooperative robots.


IEEE International Conference on Automatic Face and Gesture Recognition | 2004

Head pose estimation using stereo vision for human-robot interaction

Edgar Seemann; Kai Nickel; Rainer Stiefelhagen

We present a method for estimating a person's head pose with a stereo camera. Our approach focuses on the application of human-robot interaction, where people may be further away from the camera and move freely around in a room. We show that depth information acquired from a stereo camera not only helps improve the accuracy of the pose estimation, but also improves the robustness of the system when lighting conditions change. The estimation is based on neural networks, which are trained to compute the head pose from grayscale and disparity images of the stereo camera. It can handle pan and tilt rotations from -90° to +90°. Our system does not require any manual initialization and does not suffer from drift during an image sequence. Moreover, the system is capable of real-time processing.
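
To make the input/output structure concrete, the sketch below maps a grayscale head crop stacked with its aligned disparity map to pan and tilt angles in [-90°, +90°]. The single hidden layer and plain NumPy forward pass are assumptions for illustration, not the authors' architecture.

```python
# Illustrative forward pass of a head-pose regressor over grayscale + disparity input.
# Layer sizes and weights (W1, b1, W2, b2) are placeholders, not the trained network.
import numpy as np

def head_pose_forward(gray_patch, disparity_patch, W1, b1, W2, b2):
    """gray_patch, disparity_patch: equally sized 2D arrays for one head crop.
       Returns (pan, tilt) in degrees, each constrained to [-90, +90]."""
    x = np.concatenate([gray_patch.ravel(), disparity_patch.ravel()])
    h = np.tanh(W1 @ x + b1)                   # single hidden layer
    pan, tilt = 90.0 * np.tanh(W2 @ h + b2)    # squash the two outputs into [-90, 90]
    return pan, tilt
```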


CLEAR | 2006

The CLEAR 2006 evaluation

Rainer Stiefelhagen; Keni Bernardin; Rachel Bowers; John S. Garofolo; Djamel Mostefa; Padmanabhan Soundararajan

This paper is a summary of the first CLEAR evaluation on CLassification of Events, Activities and Relationships - which took place in early 2006 and concluded with a two day evaluation workshop in April 2006. CLEAR is an international effort to evaluate systems for the multimodal perception of people, their activities and interactions. It provides a new international evaluation framework for such technologies. It aims to support the definition of common evaluation tasks and metrics, to coordinate and leverage the production of necessary multimodal corpora and to provide a possibility for comparing different algorithms and approaches on common benchmarks, which will result in faster progress in the research community. This paper describes the evaluation tasks, including metrics and databases used, that were conducted in CLEAR 2006, and provides an overview of the results. The evaluation tasks in CLEAR 2006 included person tracking, face detection and tracking, person identification, head pose estimation, vehicle tracking as well as acoustic scene analysis. Overall, more than 20 subtasks were conducted, which included acoustic, visual and audio-visual analysis for many of the main tasks, as well as different data domains and evaluation conditions.


International Conference on Computer Vision | 2007

Video-based Face Recognition on Real-World Data

Johannes Stallkamp; Hazim Kemal Ekenel; Rainer Stiefelhagen

In this paper, we present the classification sub-system of a real-time video-based face identification system which recognizes people entering through the door of a laboratory. Since the subjects are not asked to cooperate with the system but are allowed to behave naturally, this application scenario poses many challenges. Continuous, uncontrolled variations of facial appearance due to illumination, pose, expression, and occlusion need to be handled to allow for successful recognition. Faces are classified by a local appearance-based face recognition algorithm. The obtained confidence scores from each classification are progressively combined to provide the identity estimate of the entire sequence. We introduce three different measures to weight the contribution of each individual frame to the overall classification decision. They are distance-to-model (DTM), distance-to-second-closest (DT2ND), and their combination. Both a k-nearest neighbor approach and a set of Gaussian mixtures are evaluated to produce individual frame scores. We have conducted closed set and open set identification experiments on a database of 41 subjects. The experimental results show that the proposed system is able to reach high correct recognition rates in a difficult scenario.
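
A rough sketch of the progressive score combination, assuming each frame yields a distance per enrolled identity; the DTM- and DT2ND-style weights below are plausible stand-ins rather than the paper's exact definitions.

```python
# Hypothetical sketch: accumulate weighted per-frame votes into a sequence-level identity.
from collections import defaultdict

def identify_sequence(frame_results):
    """frame_results: list of per-frame dicts mapping identity -> distance
       (smaller distance = better match); each frame must contain >= 2 identities."""
    scores = defaultdict(float)
    for dists in frame_results:
        ranked = sorted(dists.items(), key=lambda kv: kv[1])
        (best_id, d1), (_, d2) = ranked[0], ranked[1]
        dtm = 1.0 / (1.0 + d1)            # confident when the best match is close
        dt2nd = (d2 - d1) / (d2 + 1e-9)   # confident when the best clearly beats the runner-up
        scores[best_id] += dtm * dt2nd    # weighted vote of this frame
    return max(scores, key=scores.get)
```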

Collaboration


Dive into Rainer Stiefelhagen's collaborations.

Top Co-Authors

Hazim Kemal Ekenel (Istanbul Technical University)
Alex Waibel (Karlsruhe Institute of Technology)
Kai Nickel (Karlsruhe Institute of Technology)
Boris Schauerte (Karlsruhe Institute of Technology)
Keni Bernardin (Karlsruhe Institute of Technology)
Michael Voit (Karlsruhe Institute of Technology)
Jie Yang (Carnegie Mellon University)
Makarand Tapaswi (Karlsruhe Institute of Technology)
Florian van de Camp (Indian Institute of Technology Bombay)
Martin Bäuml (Karlsruhe Institute of Technology)