Martin Haker
University of Lübeck
Publications
Featured research published by Martin Haker.
International Symposium on Signals, Circuits and Systems | 2007
Martin Haker; Martin Böhme; Thomas Martinetz; Erhardt Barth
This paper presents a very simple feature-based nose detector that operates on combined range and amplitude data obtained by a 3D time-of-flight camera. The robust localization of image attributes, such as the nose, can be used for accurate object tracking. We use geometric features that are related to the intrinsic dimensionality of surfaces. To find a nose in the image, the features are computed per pixel; pixels whose feature values lie inside a certain bounding box in feature space are classified as nose pixels, and all other pixels are classified as non-nose pixels. The extent of the bounding box is learned on a labeled training set. Despite its simplicity, this procedure generalizes well; that is, a bounding box determined for one group of subjects accurately detects the noses of other subjects. The performance of the detector is demonstrated by robustly identifying the nose of a person over a wide range of head orientations. An important result is that the combination of range and amplitude data dramatically improves accuracy in comparison to the use of a single type of data. This is reflected in the equal error rates (EER) obtained on a database of head poses. Using only the range data, we detect noses with an EER of 0.66. Results on the amplitude data are better, with an EER of 0.42. The combination of both types of data yields a substantially improved EER of 0.03.
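As a rough illustration of the classification step described above, the following sketch learns an axis-aligned bounding box in feature space from labeled nose pixels and applies it per pixel. The feature extraction itself (the intrinsic-dimensionality features on range and amplitude data) is not reproduced; the array shapes and the min/max fitting rule are assumptions for illustration.

```python
import numpy as np

def fit_bounding_box(nose_features):
    """Learn an axis-aligned box in feature space from labeled nose pixels.

    nose_features: (N, D) feature vectors of pixels labeled as "nose".
    Returns per-dimension (lower, upper) bounds of the box.
    """
    return nose_features.min(axis=0), nose_features.max(axis=0)

def classify_pixels(features, lower, upper):
    """Classify every pixel: inside the box -> nose, outside -> non-nose.

    features: (H, W, D) per-pixel feature vectors from range and amplitude.
    Returns an (H, W) boolean nose mask.
    """
    inside = (features >= lower) & (features <= upper)
    return inside.all(axis=-1)
```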
Dyn3D '09 Proceedings of the DAGM 2009 Workshop on Dynamic 3D Imaging | 2009
Martin Haker; Martin Böhme; Thomas Martinetz; Erhardt Barth
We describe a technique for estimating human pose from an image sequence captured by a time-of-flight camera. The pose estimate is derived from a simple model of the human body that we fit to the data in 3D space. The model is represented by a graph consisting of 44 vertices for the upper torso, head, and arms. The anatomy of these body parts is encoded by the edges, i.e., an arm is represented by a chain of pairwise connected vertices, whereas the torso consists of a 2-dimensional grid. The model can easily be extended to represent the legs by adding further chains of pairwise connected vertices to the lower torso. The model is fit to the data in 3D space by employing an iterative update rule common to self-organizing maps. Despite its simplicity, the model captures the human pose robustly and can thus be used for tracking the major body parts, such as arms, hands, and head. The accuracy of the tracking is around 5–6 cm root mean square (RMS) for the shoulders and around 2 cm RMS for the head. The implementation of the procedure is straightforward and real-time capable.
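The fitting step lends itself to a compact sketch. Below is a minimal version of a SOM-style update that pulls the best-matching model vertex (and, more weakly, its graph neighbors) toward each observed 3D point; the learning rates, epoch count, and neighbor handling are assumptions, not the paper's exact schedule.

```python
import numpy as np

def fit_body_model(points, vertices, neighbors, epochs=10,
                   lr=0.1, lr_neighbor=0.05):
    """Fit a graph body model to a 3D point cloud with SOM-style updates.

    points:    (N, 3) 3D points measured by the TOF camera.
    vertices:  (V, 3) current positions of the model's vertices (modified).
    neighbors: neighbors[v] lists the vertex indices adjacent to vertex v.
    """
    for _ in range(epochs):
        for i in np.random.permutation(len(points)):
            x = points[i]
            # Best-matching unit: the model vertex closest to the point.
            bmu = int(np.argmin(np.linalg.norm(vertices - x, axis=1)))
            vertices[bmu] += lr * (x - vertices[bmu])
            # Pull the topological neighbors along with a smaller step,
            # which keeps the graph structure coherent.
            for n in neighbors[bmu]:
                vertices[n] += lr_neighbor * (x - vertices[n])
    return vertices
```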
Computer Vision and Pattern Recognition | 2008
Martin Haker; Martin Böhme; Thomas Martinetz; Erhardt Barth
We describe a technique for computing scale-invariant features on range maps produced by a range sensor, such as a time-of-flight camera. Scale invariance is achieved by computing the features on the reconstructed three-dimensional surface of the object. The technique is general and can be applied to a wide range of operators. Features are computed in the frequency domain; the transform from the irregularly sampled mesh to the frequency domain uses the Nonequispaced Fast Fourier Transform. We demonstrate the technique on a facial feature detection task. On a dataset containing faces at various distances from the camera, the equal error rate (EER) for the case of scale-invariant features is halved compared to features computed on the range map in the conventional way. When the scale-invariant range features are combined with intensity features, the error rate on the test set reduces to zero.
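The step that buys scale invariance is computing features on the metric 3D surface rather than on the pixel grid. A minimal sketch of that back-projection under a pinhole model is shown below; the intrinsics (fx, fy, cx, cy) are hypothetical, the range values are treated as depth along the optical axis for simplicity, and the NFFT-based frequency-domain feature computation itself is not reproduced.

```python
import numpy as np

def backproject(range_map, fx, fy, cx, cy):
    """Back-project a range map to 3D points (pinhole camera model).

    Features computed on this metric surface no longer depend on the
    object's distance from the camera, which is the source of the
    scale invariance discussed above.
    """
    h, w = range_map.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = range_map                      # treated as depth along the optical axis
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)  # (H, W, 3) irregularly sampled surface
```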
International Symposium on Signals, Circuits and Systems | 2009
Martin Böhme; Martin Haker; Thomas Martinetz; Erhardt Barth
We present a facial feature detector for time-of-flight (TOF) cameras that extends previous work by combining a nose detector based on geometric features with a face detector. The goal is to prevent false detections outside the area of the face. To detect the nose in the image, we first compute the geometric features per pixel. We then augment these geometric features with two additional features: the horizontal and vertical distance to the most likely face detected by a cascade-of-boosted-ensembles face detector. We use a very simple classifier based on an axis-aligned bounding box in feature space; pixels whose feature values fall within the box are classified as nose pixels, and all other pixels are classified as “non-nose”. The extent of the bounding box is learned on a labeled training set. Despite its simplicity, this detector already delivers satisfactory results on the geometric features alone; adding the face detector improves the equal error rate (EER) from 22.2% (without face detector) to 10.4% (with face detector). (Note when comparing with our previous results from [1] and [2] that, in contrast to this paper, the test data used there did not contain scale variations.)
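Given a face box from any cascade detector, the feature augmentation reduces to appending two per-pixel distances, after which the same bounding-box classifier applies unchanged. A sketch, with the array shapes and face-box convention as assumptions:

```python
import numpy as np

def augment_with_face_distance(features, face_box):
    """Append horizontal/vertical distance to the detected face per pixel.

    features: (H, W, D) geometric features.
    face_box: (x, y, w, h) of the most likely detected face.
    Returns (H, W, D + 2) augmented features.
    """
    h, w = features.shape[:2]
    face_cx = face_box[0] + face_box[2] / 2.0
    face_cy = face_box[1] + face_box[3] / 2.0
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    dx = (u - face_cx)[..., np.newaxis]
    dy = (v - face_cy)[..., np.newaxis]
    return np.concatenate([features, dx, dy], axis=-1)
```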
GW'09 Proceedings of the 8th international conference on Gesture in Embodied Communication and Human-Computer Interaction | 2009
Martin Haker; Martin Böhme; Thomas Martinetz; Erhardt Barth
We present a robust detector for deictic gestures based on a time-of-flight (TOF) camera, a combined range and intensity image sensor. The pointing direction is used to determine whether the gesture is intended for the system at all and to assign different meanings to the same gesture depending on where the user points. We use the gestures to control a slideshow presentation: making a “thumbs-up” gesture while pointing to the left or right of the screen switches to the previous or next slide, and pointing at the screen causes a “virtual laser pointer” to appear. Since the pointing direction is estimated in 3D, the user can move freely within the field of view of the camera once the system has been calibrated. The pointing direction is measured with an absolute accuracy of 0.6 degrees and a measurement noise of 0.9 degrees near the center of the screen.
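The “virtual laser pointer” amounts to intersecting the estimated pointing ray with the calibrated screen plane. A geometric sketch is given below; the choice of ray origin (e.g., the head) and the plane parametrization are assumptions, not necessarily the paper's exact construction.

```python
import numpy as np

def laser_pointer_position(origin, hand, plane_point, plane_normal):
    """Intersect a pointing ray with the screen plane (all in 3D).

    origin: body reference point the ray emanates from (e.g., the head).
    hand:   3D hand position; origin -> hand defines the pointing direction.
    plane_point, plane_normal: the calibrated screen plane.
    Returns the 3D intersection point, or None if the user points away.
    """
    direction = hand - origin
    direction = direction / np.linalg.norm(direction)
    denom = np.dot(plane_normal, direction)
    if abs(denom) < 1e-9:
        return None                       # ray parallel to the screen
    t = np.dot(plane_normal, plane_point - origin) / denom
    return origin + t * direction if t > 0 else None
```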
Dyn3D '09 Proceedings of the DAGM 2009 Workshop on Dynamic 3D Imaging | 2009
Martin Böhme; Martin Haker; Kolja Riemer; Thomas Martinetz; Erhardt Barth
We adapt the well-known face detection algorithm of Viola and Jones to work on the range and intensity data from a time-of-flight camera. The detector trained on the combined data has a higher detection rate (95.3%) than detectors trained on either type of data alone (intensity: 93.8%, range: 91.2%). Additionally, the combined detector uses fewer image features and hence has a shorter running time (5.15 ms per frame) than the detectors trained on intensity or range individually (intensity: 10.69 ms, range: 5.51 ms).
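The mechanism that makes Viola-Jones fast, and that carries over unchanged to range data, is evaluating Haar-like rectangle features in constant time via integral images. A sketch of that building block follows; the boosted training loop that selects features from both channels is not shown.

```python
import numpy as np

def integral_image(channel):
    """Summed-area table of one channel (intensity or range)."""
    return channel.cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, x, y, w, h):
    """Sum of pixels in the rectangle [x, x+w) x [y, y+h) via O(1) lookups."""
    a = ii[y - 1, x - 1] if x > 0 and y > 0 else 0
    b = ii[y - 1, x + w - 1] if y > 0 else 0
    c = ii[y + h - 1, x - 1] if x > 0 else 0
    d = ii[y + h - 1, x + w - 1]
    return d - b - c + a

def two_rect_feature(ii, x, y, w, h):
    """A vertical two-rectangle Haar-like feature: top half minus bottom half."""
    return box_sum(ii, x, y, w, h // 2) - box_sum(ii, x, y + h // 2, w, h // 2)
```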
International Journal of Intelligent Systems Technologies and Applications | 2008
Martin Böhme; Martin Haker; Thomas Martinetz; Erhardt Barth
We describe a facial feature tracker based on the combined range and amplitude data provided by a 3D time-of-flight camera. We use this tracker to implement a head mouse, an alternative input device for people who have limited use of their hands. The facial feature tracker is based on geometric features that are related to the intrinsic dimensionality of multidimensional signals. We show how the position of the nose in the image can be determined robustly using a very simple bounding-box classifier, trained on a set of labeled sample images. Despite its simplicity, the classifier generalizes well to subjects that it was not trained on. An important result is that the combination of range and amplitude data dramatically improves robustness compared to a single type of data. The tracker runs in real time at around 30 frames per second. We demonstrate its potential as an input device by using it to control Dasher, an alternative text input tool.
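One plausible way to turn the tracked nose position into a head mouse is a relative mapping with gain and smoothing, sketched below; the abstract does not specify the actual transfer function, so the gain, smoothing factor, and update scheme here are assumptions.

```python
class HeadMouse:
    """Map nose motion to cursor motion (relative mode, with smoothing)."""

    def __init__(self, gain=8.0, alpha=0.6):
        self.gain = gain      # cursor pixels per pixel of nose travel
        self.alpha = alpha    # smoothing factor against detector jitter
        self.prev = None
        self.velocity = (0.0, 0.0)
        self.cursor = (0.0, 0.0)

    def update(self, nose_xy):
        """Feed one tracked nose position per frame; returns the cursor."""
        if self.prev is not None:
            dx = nose_xy[0] - self.prev[0]
            dy = nose_xy[1] - self.prev[1]
            # Exponentially smooth the nose velocity before applying gain.
            self.velocity = (self.alpha * self.velocity[0] + (1 - self.alpha) * dx,
                             self.alpha * self.velocity[1] + (1 - self.alpha) * dy)
            self.cursor = (self.cursor[0] + self.gain * self.velocity[0],
                           self.cursor[1] + self.gain * self.velocity[1])
        self.prev = nose_xy
        return self.cursor
```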
Computer Vision and Pattern Recognition | 2008
Martin Böhme; Martin Haker; Thomas Martinetz; Erhardt Barth
We describe a technique for improving the accuracy of range maps measured by time-of-flight (TOF) cameras. The technique is based on the observation that the range map and intensity image measured by a TOF camera are not independent but are linked by the shading constraint: If the reflectance properties of the surface are known, a certain range map implies a corresponding intensity image. We impose the shading constraint using a probabilistic model of image formation and find a maximum a posteriori estimate for the true range map. We present results on both synthetic and real TOF camera images that demonstrate the robust shape estimates achieved by the algorithm.
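To make the shading constraint concrete, the sketch below predicts the intensity image implied by a range map under strongly simplified assumptions (Lambertian surface, unit albedo, light co-located with the camera, orthographic geometry); a MAP estimate of the true range map would then minimize the mismatch between this prediction and the observed intensity plus a prior, e.g. by gradient descent (not shown).

```python
import numpy as np

def predicted_intensity(range_map):
    """Intensity image implied by a range map via Lambertian shading.

    Simplifications (assumptions, not the paper's full image-formation
    model): unit albedo, orthographic projection, illumination along the
    optical axis. Brighter pixels correspond to surface patches facing
    the camera more directly.
    """
    dzdx = np.gradient(range_map, axis=1)
    dzdy = np.gradient(range_map, axis=0)
    # The unnormalized surface normal of the height field is
    # (-dz/dx, -dz/dy, 1); the cosine to the viewing/lighting direction
    # is its z component after normalization.
    return 1.0 / np.sqrt(dzdx ** 2 + dzdy ** 2 + 1.0)
```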
Robot Soccer World Cup | 2002
Martin Haker; André Meyer; Daniel Polani; Thomas Martinetz
We describe an approach for incorporating new evidence into an existing world model. The method, Evidence-based World State Estimation, has been used in the RoboCup soccer simulation scenario to obtain high-precision estimates of player and ball positions.
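The abstract does not spell out the update rule, but a generic, confidence-weighted way to fold a new observation into an existing estimate, in the same spirit, looks as follows (a one-dimensional sketch; the whole construction is an assumption for illustration):

```python
def fuse_evidence(estimate, variance, observation, obs_variance):
    """Precision-weighted fusion of new evidence into an existing estimate.

    The more reliable the new observation (small obs_variance), the more
    it shifts the estimate; the fused variance always shrinks.
    """
    k = variance / (variance + obs_variance)   # weight of the new evidence
    fused_estimate = estimate + k * (observation - estimate)
    fused_variance = (1.0 - k) * variance
    return fused_estimate, fused_variance
```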
International Conference on Artificial Neural Networks | 2009
Martin Haker; Thomas Martinetz; Erhardt Barth
In this paper, the sparse coding principle is employed for the representation of multimodal image data, i.e., image intensity and range. We estimate an image basis for frontal face images taken with a time-of-flight (TOF) camera to obtain a sparse representation of facial features, such as the nose. These features are then evaluated in an object detection scenario where we estimate the position of the nose by template matching, followed by the application of appropriate thresholds that are estimated from a labeled training set. The main contribution of this work is to show that the templates can be learned simultaneously on both intensity and range data based on the sparse coding principle, and that these multimodal templates significantly outperform both templates generated by averaging over a set of aligned image patches containing the facial feature of interest and multimodal templates computed via Principal Component Analysis (PCA). The system achieves an average detection rate of 96.4% with a false positive rate of 3.7%.
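The detection stage described above, matching a learned multimodal template pair against both channels and thresholding the responses, might look roughly like the sketch below. The joint scoring and threshold scheme are assumptions; how the templates themselves are learned (sparse coding) is not reproduced.

```python
import numpy as np
from scipy.signal import correlate2d

def detect_nose(intensity, range_map, t_int, t_rng, thr_int, thr_rng):
    """Locate the nose by multimodal template matching.

    t_int, t_rng: learned intensity/range templates (e.g. from sparse coding).
    thr_int, thr_rng: per-channel response thresholds from a training set.
    Returns (x, y) of the detection, or None if the thresholds reject it.
    """
    s_int = correlate2d(intensity, t_int - t_int.mean(), mode="same")
    s_rng = correlate2d(range_map, t_rng - t_rng.mean(), mode="same")
    score = s_int + s_rng                  # joint response over both channels
    y, x = np.unravel_index(np.argmax(score), score.shape)
    if s_int[y, x] > thr_int and s_rng[y, x] > thr_rng:
        return x, y
    return None
```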