
Publication


Featured research published by Kai Nickel.


International Conference on Multimodal Interfaces | 2003

Pointing gesture recognition based on 3D-tracking of face, hands and head orientation

Kai Nickel; Rainer Stiefelhagen

In this paper, we present a system capable of visually detecting pointing gestures and estimating the 3D pointing direction in real time. In order to acquire input features for gesture recognition, we track the positions of a person's face and hands in image sequences provided by a stereo camera. Hidden Markov Models (HMMs), trained on different phases of sample pointing gestures, are used to classify the 3D trajectories and detect the occurrence of a gesture. When analyzing sample pointing gestures, we noticed that humans tend to look at the pointing target while performing the gesture. In order to exploit this behavior, we additionally measured head orientation by means of a magnetic sensor in a similar scenario. Using head orientation as an additional feature, we observed significant gains in both recall and precision of pointing gestures. Moreover, the percentage of correctly identified pointing targets improved significantly, from 65% to 83%. For estimating the pointing direction, we compared three approaches: 1) the line of sight between head and hand, 2) the forearm orientation, and 3) the head orientation.
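
To make the detection step concrete, here is a minimal sketch of scoring a window of 3D hand-trajectory features with a Gaussian-emission HMM via the forward algorithm. All model parameters (three gesture phases, the transition matrix, the emission statistics) are made-up placeholders, not values from the paper; in a real system, a gesture would be detected when this score exceeds that of a competing null model.

```python
# Sketch: score a 3D hand-trajectory window with a Gaussian-emission HMM
# (illustrative only; all parameters are made up, not from the paper).
import numpy as np

def log_gauss(x, mean, var):
    """Log density of a diagonal Gaussian."""
    return -0.5 * np.sum((x - mean) ** 2 / var + np.log(2 * np.pi * var))

def hmm_log_likelihood(obs, pi, A, means, vars_):
    """Forward algorithm in log space for a Gaussian-emission HMM."""
    n_states = len(pi)
    log_alpha = np.log(pi) + np.array(
        [log_gauss(obs[0], means[k], vars_[k]) for k in range(n_states)])
    for t in range(1, len(obs)):
        emit = np.array([log_gauss(obs[t], means[k], vars_[k])
                         for k in range(n_states)])
        m = log_alpha.max()  # log-sum-exp over previous states
        log_alpha = emit + m + np.log(np.exp(log_alpha - m) @ A)
    m = log_alpha.max()
    return m + np.log(np.sum(np.exp(log_alpha - m)))

# Three states for hypothetical gesture phases: begin / hold / retract.
pi = np.array([0.8, 0.1, 0.1])
A = np.array([[0.7, 0.3, 0.0],
              [0.0, 0.7, 0.3],
              [0.1, 0.0, 0.9]])
means = np.array([[0.1, 0.0, 0.2], [0.5, 0.1, 0.6], [0.2, 0.0, 0.3]])
vars_ = np.full((3, 3), 0.05)

window = np.random.default_rng(0).normal(0.3, 0.1, size=(20, 3))  # fake features
print(hmm_log_likelihood(window, pi, A, means, vars_))
```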


Intelligent Robots and Systems | 2004

Natural human-robot interaction using speech, head pose and gestures

Rainer Stiefelhagen; Christian Fügen; R. Gieselmann; Hartwig Holzapfel; Kai Nickel; Alex Waibel

In this paper we present our ongoing work in building technologies for natural multimodal human-robot interaction. We present our systems for spontaneous speech recognition, multimodal dialogue processing and visual perception of a user, which includes the recognition of pointing gestures as well as the recognition of a person's head orientation. Each of the components is described in the paper and experimental results are presented. In order to demonstrate and measure the usefulness of such technologies for human-robot interaction, all components have been integrated on a mobile robot platform and used for real-time human-robot interaction in a kitchen scenario.


IEEE Transactions on Robotics | 2007

Enabling Multimodal Human–Robot Interaction for the Karlsruhe Humanoid Robot

Rainer Stiefelhagen; Hazim Kemal Ekenel; Christian Fügen; Petra Gieselmann; Hartwig Holzapfel; Florian Kraft; Kai Nickel; Michael Voit; Alex Waibel

In this paper, we present our work in building technologies for natural multimodal human-robot interaction. We present our systems for spontaneous speech recognition, multimodal dialogue processing, and visual perception of a user, which includes localization, tracking, and identification of the user, recognition of pointing gestures, as well as recognition of a person's head orientation. Each of the components is described in the paper and experimental results are presented. We also present several experiments on multimodal human-robot interaction, such as interaction using speech and gestures, the automatic determination of the addressee during human-human-robot interaction, as well as interactive learning of dialogue strategies. The work and the components presented here constitute the core building blocks for audiovisual perception of humans and multimodal human-robot interaction used for the humanoid robot developed within the German research project (Sonderforschungsbereich) on humanoid cooperative robots.


IEEE International Conference on Automatic Face and Gesture Recognition | 2004

Head pose estimation using stereo vision for human-robot interaction

Edgar Seemann; Kai Nickel; Rainer Stiefelhagen

We present a method for estimating a person's head pose with a stereo camera. Our approach focuses on the application of human-robot interaction, where people may be further away from the camera and move freely around a room. We show that depth information acquired from a stereo camera not only helps improve the accuracy of the pose estimation, but also improves the robustness of the system when lighting conditions change. The estimation is based on neural networks, which are trained to compute the head pose from grayscale and disparity images of the stereo camera. It can handle pan and tilt rotations from -90° to +90°. Our system does not require any manual initialization and does not suffer from drift during an image sequence. Moreover, the system is capable of real-time processing.
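
As an illustration of the regression idea, the sketch below trains a small neural network to map concatenated grayscale and disparity patches to pan/tilt angles. The patch size, network shape and the synthetic training data are assumptions made for the sake of a runnable example, not the paper's configuration.

```python
# Sketch: regress pan/tilt from concatenated grayscale + disparity patches
# with a small neural network (synthetic data; all sizes are assumptions).
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n_samples, patch = 500, 16 * 16           # hypothetical 16x16 head crops
gray = rng.random((n_samples, patch))     # normalized grayscale pixels
disp = rng.random((n_samples, patch))     # normalized disparity pixels
X = np.hstack([gray, disp])               # stereo cue: appearance + depth
y = rng.uniform(-90, 90, (n_samples, 2))  # pan, tilt in degrees

net = MLPRegressor(hidden_layer_sizes=(64,), max_iter=300, random_state=0)
net.fit(X, y)
pan, tilt = net.predict(X[:1])[0]
print(f"pan={pan:.1f} deg, tilt={tilt:.1f} deg")
```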


International Conference on Multimodal Interfaces | 2004

Implementation and evaluation of a constraint-based multimodal fusion system for speech and 3D pointing gestures

Hartwig Holzapfel; Kai Nickel; Rainer Stiefelhagen

This paper presents an architecture for the fusion of multimodal input streams for natural interaction with a humanoid robot, as well as results from a user study with our system. The presented fusion architecture consists of an application-independent parser of input events and application-specific rules. In the user study, people could interact with a robot in a kitchen scenario using speech and gesture input. We observed that our fusion approach is very tolerant of falsely detected pointing gestures, because we use speech as the main modality and pointing gestures mainly for the disambiguation of objects. In the paper we also report on the temporal correlation of speech and gesture events as observed in the user study.
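
A minimal sketch of such a speech-first fusion rule follows: speech fixes the object type, and the pointing ray is consulted only when several objects of that type remain. The event format, the objects and the ray-distance rule are illustrative assumptions, not the paper's grammar or rule set.

```python
# Sketch: speech-first fusion where a pointing gesture only disambiguates an
# object reference (event format and rule are illustrative assumptions).
import numpy as np

def fuse(speech_event, pointing_event, objects):
    """Resolve 'that/this <type>' in speech via the pointing direction."""
    candidates = [o for o in objects if o["type"] == speech_event["object_type"]]
    if len(candidates) == 1 or pointing_event is None:
        return candidates[0] if candidates else None  # speech alone suffices
    # Otherwise pick the candidate closest to the pointing ray.
    origin, direction = pointing_event["origin"], pointing_event["direction"]
    def ray_distance(obj):
        v = np.asarray(obj["pos"]) - origin
        return np.linalg.norm(v - (v @ direction) * direction)
    return min(candidates, key=ray_distance)

objects = [{"type": "cup", "pos": [1.0, 0.0, 1.0]},
           {"type": "cup", "pos": [0.0, 1.0, 1.0]}]
speech = {"object_type": "cup"}            # "bring me that cup"
pointing = {"origin": np.zeros(3),
            "direction": np.array([0.0, 1.0, 1.0]) / np.sqrt(2)}
print(fuse(speech, pointing, objects))     # -> the cup at [0, 1, 1]
```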


Computer Vision and Pattern Recognition | 2006

Tracking of the Articulated Upper Body on Multi-View Stereo Image Sequences

Julius Ziegler; Kai Nickel; Rainer Stiefelhagen

We propose a novel method for tracking an articulated model in a 3D point cloud. The tracking problem is formulated as the registration of two point sets, one of them parameterised by the model's state vector and the other acquired from a 3D sensor system. Finding the correct parameter vector is posed as a linear estimation problem, which is solved by means of a scaled unscented Kalman filter. Our method draws on concepts from the widely used iterative closest point registration algorithm (ICP), basing the measurement model on point correspondences established between the synthesised model point cloud and the measured 3D data. We apply the algorithm to kinematically track a model of the human upper body on a point cloud obtained through stereo image processing from one or more stereo cameras. We determine torso position and orientation as well as the joint angles of shoulders and elbows. The algorithm has been successfully tested on thousands of frames of real image data. Challenging sequences of several minutes' length were tracked correctly. The complete processing time remains below one second per frame.
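
The registration step can be sketched in a much-simplified form: closest-point correspondences (as in ICP) between the synthesised model cloud and the measured cloud drive an estimate update. In the sketch below the state is reduced to a 3D translation and the update is plain least squares, not the scaled unscented Kalman filter over the full kinematic state used in the paper.

```python
# Much-simplified sketch of ICP-style registration: closest-point
# correspondences drive an estimate update (translation-only toy state).
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(1)
model_points = rng.random((200, 3))                 # synthesised model cloud
true_shift = np.array([0.10, -0.05, 0.02])
measured = model_points + true_shift + rng.normal(0, 0.005, (200, 3))

estimate = np.zeros(3)                              # current state guess
for _ in range(5):                                  # a few ICP iterations
    tree = cKDTree(measured)
    _, idx = tree.query(model_points + estimate)    # point correspondences
    residual = measured[idx] - (model_points + estimate)
    estimate += residual.mean(axis=0)               # LS update for translation
print(estimate, "vs true", true_shift)
```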


Workshop on Applications of Signal Processing to Audio and Acoustics | 2005

Kalman filters for audio-video source localization

Tobias Gehrig; Kai Nickel; Hazim Kemal Ekenel; Ulrich Klee; John W. McDonough

In prior work, we proposed using an extended Kalman filter to directly update position estimates in a speaker localization system based on time delays of arrival. We found that such a scheme provided superior tracking quality compared with conventional closed-form approximation methods. In this work, we enhance our audio localizer with video information. We propose an algorithm to incorporate detected face positions from different camera views into the Kalman filter without any explicit triangulation. This approach yields a robust source localizer that functions reliably both for segments in which the speaker is silent, which would be detrimental for an audio-only tracker, and in which many faces appear, which would confuse a video-only tracker. We tested our algorithm on a data set consisting of seminars held by actual speakers. Our experiments revealed that the audio-video localizer functioned better than a localizer based solely on audio or solely on video features.
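
The sketch below shows how a detected 2D face position can correct a 3D position estimate through an EKF-style update with a pinhole projection as the measurement model, i.e., without triangulating across views; a time-delay-of-arrival audio measurement would enter through the same predict/correct pattern with a different measurement function. The camera model and all numbers are illustrative assumptions.

```python
# Sketch: fold a detected 2D face position into a 3D position estimate via an
# EKF-style update. One pinhole camera at the origin looking down +z.
import numpy as np

f = 500.0                                   # focal length in pixels (assumed)

def project(x):
    """Pinhole projection of a 3D point to pixel coordinates."""
    return f * np.array([x[0] / x[2], x[1] / x[2]])

def jacobian(x):
    """Jacobian of the projection w.r.t. the 3D position."""
    return np.array([[f / x[2], 0.0, -f * x[0] / x[2] ** 2],
                     [0.0, f / x[2], -f * x[1] / x[2] ** 2]])

x = np.array([0.2, 0.1, 3.0])               # prior position estimate (m)
P = np.eye(3) * 0.04                        # prior covariance
R = np.eye(2) * 4.0                         # pixel noise covariance
z = np.array([45.0, 25.0])                  # detected face centre (pixels)

H = jacobian(x)
S = H @ P @ H.T + R                         # innovation covariance
K = P @ H.T @ np.linalg.inv(S)              # Kalman gain
x = x + K @ (z - project(x))                # corrected 3D position
P = (np.eye(3) - K @ H) @ P
print(x)
```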


International Conference on Multisensor Fusion and Integration for Intelligent Systems | 2006

Activity Recognition and Room-Level Tracking in an Office Environment

Christian Wojek; Kai Nickel; Rainer Stiefelhagen

We present an approach for multi-person activity recognition in an office environment with simultaneous room-level tracking of users. Audio as well as video features, gathered from a simple setup, are used within a multilevel hidden Markov model (HMM) framework. For activity recognition, an evaluation is presented on unconstrained real-world data recorded over several days in five offices with one camera and one microphone per room. We track the users with a distributed camera network, which has to cope with blind gaps between different camera views. For location estimation, we apply a Bayesian filter on top of the activity recognition results. Results on a dedicated tracking sequence of one hour's length show the algorithm's performance.
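
For the location-estimation layer, a discrete Bayes filter over rooms could look like the sketch below: a transition model predicts room changes, and per-room activity detections supply the correction step. The rooms, transition probabilities and detection likelihoods are made up for illustration.

```python
# Sketch: a discrete Bayes filter over rooms, driven by per-room activity
# detections (rooms, transition model and likelihoods are made up).
import numpy as np

rooms = ["office1", "office2", "corridor"]
# Transition model: mostly stay put, occasionally move via the corridor.
T = np.array([[0.90, 0.00, 0.10],
              [0.00, 0.90, 0.10],
              [0.15, 0.15, 0.70]])
belief = np.full(3, 1 / 3)

def update(belief, likelihood):
    belief = T.T @ belief            # predict: one step of room transitions
    belief = belief * likelihood     # correct: weight by activity evidence
    return belief / belief.sum()

# "Activity detected in office1" -> high likelihood there, with a floor
# elsewhere that tolerates blind gaps between camera views.
for _ in range(3):
    belief = update(belief, np.array([0.8, 0.1, 0.1]))
print(dict(zip(rooms, belief.round(3))))
```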


European Conference on Computer Vision | 2004

Real-Time Person Tracking and Pointing Gesture Recognition for Human-Robot Interaction

Kai Nickel; Rainer Stiefelhagen

In this paper, we present our approach for visual tracking of head, hands and head orientation. Given the images provided by a calibrated stereo camera, color and disparity information are integrated into a multi-hypothesis tracking framework in order to find the 3D positions of the respective body parts. Based on the hands' motion, an HMM-based approach is applied to recognize pointing gestures. We show experimentally that the gesture recognition performance can be improved significantly by using visually gained information about head orientation as an additional feature. Our system aims at applications in the field of human-robot interaction, where it is important to perform run-on recognition in real time, to allow for the robot's egomotion, and not to rely on manual initialization.
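
The multi-hypothesis idea can be sketched as scoring candidate 3D positions by the product of a color cue and a disparity cue and propagating only the best-supported hypotheses; both cue models below are toy stand-ins, not the paper's likelihoods.

```python
# Sketch: multi-hypothesis scoring that combines colour and disparity cues
# (cue models and numbers are illustrative, not the paper's).
import numpy as np

rng = np.random.default_rng(2)
hypotheses = rng.uniform([-1, 0, 1], [1, 2, 3], size=(50, 3))  # candidate 3D positions

def color_score(p):
    """Stand-in for a skin-colour support score around the projected position."""
    return np.exp(-np.linalg.norm(p - np.array([0.0, 1.5, 2.0])) ** 2)

def disparity_score(p):
    """Stand-in for agreement between hypothesis depth and stereo disparity."""
    return np.exp(-abs(p[2] - 2.0))

scores = np.array([color_score(p) * disparity_score(p) for p in hypotheses])
keep = np.argsort(scores)[-5:]             # propagate only the best few hypotheses
print(hypotheses[keep[-1]], scores[keep[-1]])
```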


Signal Processing | 2006

Audio-visual perception of a lecturer in a smart seminar room

Rainer Stiefelhagen; Keni Bernardin; Hazim Kemal Ekenel; John W. McDonough; Kai Nickel; Michael Voit; Matthias Wölfel

In this paper we present our work on audio-visual perception of a lecturer in a smart seminar room, which is equipped with various cameras and microphones. We present a novel approach to track the lecturer based on visual and acoustic observations in a particle filter framework. This approach does not require explicit triangulation of observations in order to estimate the 3D location of the lecturer, thus allowing for fast audio-visual tracking. We also show how automatic recognition of the lecturer's speech from far-field microphones can be improved using his or her tracked location in the room. Based on the tracked location of the lecturer, we can also detect his or her face in the various camera views for further analysis, such as head orientation and identity. The paper describes the overall system and its components (tracking, speech recognition, head orientation, identification) in detail and presents results on several multimodal recordings of seminars.
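
A toy version of the triangulation-free fusion: every particle is a 3D position hypothesis, and each sensor reweights the particles with its own likelihood, a viewing-ray likelihood for a camera detection and a time-delay likelihood for a microphone pair. All geometry and noise models here are illustrative assumptions.

```python
# Sketch: particle-filter fusion where each sensor weights 3D particles with
# its own likelihood, so no explicit triangulation is needed (toy models).
import numpy as np

rng = np.random.default_rng(3)
particles = rng.uniform([0, 0, 0], [8, 6, 2.5], size=(500, 3))  # room in metres
weights = np.full(500, 1 / 500)

def visual_likelihood(p, cam=np.array([0.0, 0.0, 2.0]),
                      bearing=np.array([0.8, 0.6, 0.0])):
    """Toy likelihood: particles close to a detected viewing ray score high."""
    v = p - cam
    v = v / np.linalg.norm(v)
    return np.exp(-8.0 * np.linalg.norm(v - bearing) ** 2)

def audio_likelihood(p, measured_delay=0.0):
    """Toy stand-in for a time-delay-of-arrival likelihood for one mic pair."""
    mics = np.array([[2.0, 0.0, 1.5], [6.0, 0.0, 1.5]])
    delay = np.linalg.norm(p - mics[0]) - np.linalg.norm(p - mics[1])
    return np.exp(-2.0 * (delay - measured_delay) ** 2)

for i, p in enumerate(particles):
    weights[i] *= visual_likelihood(p) * audio_likelihood(p)
weights /= weights.sum()
print("estimated lecturer position:", weights @ particles)
```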

Collaboration


Dive into Kai Nickel's collaborations.

Top Co-Authors

Rainer Stiefelhagen (Karlsruhe Institute of Technology)
Michael Voit (Karlsruhe Institute of Technology)
John W. McDonough (Carnegie Mellon University)
Hartwig Holzapfel (Karlsruhe Institute of Technology)
Hazim Kemal Ekenel (Istanbul Technical University)
Alex Waibel (Karlsruhe Institute of Technology)
Matthias Wölfel (Karlsruhe Institute of Technology)
Petra Gieselmann (Karlsruhe Institute of Technology)
Tobias Gehrig (Karlsruhe Institute of Technology)
Christian Fügen (Karlsruhe Institute of Technology)