Juergen Gall | Researchain

Publication

Featured research published by Juergen Gall.

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2011

Hough Forests for Object Detection, Tracking, and Action Recognition

Juergen Gall; Angela Yao; Nima Razavi; L. Van Gool; Victor S. Lempitsky

The paper introduces Hough forests, which are random forests adapted to perform a generalized Hough transform in an efficient way. Compared to previous Hough-based systems such as implicit shape models, Hough forests improve the performance of the generalized Hough transform for object detection on a categorical level. At the same time, their flexibility permits extensions of the Hough transform to new domains such as object tracking and action recognition. Hough forests can be regarded as task-adapted codebooks of local appearance that allow fast supervised training and fast matching at test time. They achieve high detection accuracy since the entries of such codebooks are optimized to cast Hough votes with small variance and since their efficiency permits dense sampling of local image patches or video cuboids during detection. The efficacy of Hough forests for a set of computer vision tasks is validated through experiments on a large set of publicly available benchmark data sets and comparisons with the state of the art.
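The idea of codebook entries "optimized to cast Hough votes with small variance" can be illustrated with a minimal Python sketch of scoring a candidate node split. It assumes that each object patch stores a 2D offset to the object centroid and that a binary test sends each patch left or right. The offsets passed in are assumed to come from object patches only, since background patches have no meaningful centroid offset. The function names are hypothetical, and this is not the authors' implementation.

```python
def offset_uncertainty(offsets):
    """Sum of squared deviations of 2D centroid offsets from their mean."""
    if not offsets:
        return 0.0
    n = len(offsets)
    mean_dx = sum(dx for dx, _ in offsets) / n
    mean_dy = sum(dy for _, dy in offsets) / n
    return sum((dx - mean_dx) ** 2 + (dy - mean_dy) ** 2 for dx, dy in offsets)


def split_offset_uncertainty(offsets, goes_left):
    """Score a binary split: lower means the children cast tighter votes."""
    left = [o for o, g in zip(offsets, goes_left) if g]
    right = [o for o, g in zip(offsets, goes_left) if not g]
    return offset_uncertainty(left) + offset_uncertainty(right)
```

A split that separates two clusters of offsets scores lower (better) than one that mixes them. For example, with offsets `(0, 0), (2, 0), (10, 10), (12, 10)`, grouping the first two together yields 4.0, while alternating them yields 200.0.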

Computer Vision and Pattern Recognition | 2009

Class-specific Hough forests for object detection

Juergen Gall; Victor S. Lempitsky

We present a method for the detection of instances of an object class, such as cars or pedestrians, in natural images. Similarly to some previous works, this is accomplished via generalized Hough transform, where the detections of individual object parts cast probabilistic votes for possible locations of the centroid of the whole object; the detection hypotheses then correspond to the maxima of the Hough image that accumulates the votes from all parts. However, whereas the previous methods detect object parts using generative codebooks of part appearances, we take a more discriminative approach to object part detection. Towards this end, we train a class-specific Hough forest, which is a random forest that directly maps the image patch appearance to the probabilistic vote about the possible location of the object centroid. We demonstrate that Hough forests improve the results of the Hough-transform object detection significantly and achieve state-of-the-art performance for several classes and datasets.
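The voting step described above can be sketched in a few lines of Python. Each part detection adds its vote weight at the predicted centroid location, which is the patch position plus the predicted offset. The strongest hypothesis is then the maximum of the accumulated Hough image. The vote tuple layout and function names are assumptions for illustration. A practical detector would typically also smooth the Hough image and extract several local maxima, not just the global one.

```python
def accumulate_votes(votes, width, height):
    """Build a Hough image from votes of the form (px, py, dx, dy, weight).

    (px, py) is the patch position and (dx, dy) the predicted offset to the
    object centroid; votes falling outside the image are ignored.
    """
    hough = [[0.0] * width for _ in range(height)]
    for px, py, dx, dy, weight in votes:
        cx, cy = px + dx, py + dy
        if 0 <= cx < width and 0 <= cy < height:
            hough[cy][cx] += weight
    return hough


def strongest_hypothesis(hough):
    """Return (x, y, score) of the global maximum of the Hough image."""
    score, x, y = max(
        (value, x, y) for y, row in enumerate(hough) for x, value in enumerate(row)
    )
    return x, y, score
```

Three patches whose offsets all point to `(5, 5)` with weight 0.5 each produce a peak of 1.5 there. This peak outranks a single stronger but isolated vote elsewhere.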

Computer Vision and Pattern Recognition | 2012

Real-time facial feature detection using conditional regression forests

Matthias Dantone; Juergen Gall; Gabriele Fanelli; Luc Van Gool

Although facial feature detection from 2D images is a well-studied field, there is a lack of real-time methods that estimate feature points even on low-quality images. Here we propose conditional regression forests for this task. While regression forests learn the relations between facial image patches and the location of feature points from the entire set of faces, conditional regression forests learn the relations conditional to global face properties. In our experiments, we use the head pose as a global property and demonstrate that conditional regression forests outperform regression forests for facial feature detection. We have evaluated the method on the challenging Labeled Faces in the Wild [20] database, where close-to-human accuracy is achieved while processing images in real time.
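One simple way to make a forest's prediction conditional on head pose can be sketched under several assumptions. The pose is discretized into a few classes, one forest is trained per class, and at test time trees are drawn from each pose-specific forest in proportion to the estimated pose probability. The sketch below only computes that tree budget. The pose labels, the function name, and the largest-remainder rounding are illustrative assumptions, not details taken from the paper.

```python
import math


def allocate_trees(pose_probs, total_trees):
    """Split a budget of trees across pose-conditional forests.

    pose_probs maps a (hypothetical) pose class to its estimated probability.
    Each class gets floor(p * total_trees) trees; the remaining trees go to
    the classes with the largest fractional parts (largest-remainder rounding).
    """
    raw = {pose: p * total_trees for pose, p in pose_probs.items()}
    counts = {pose: math.floor(r) for pose, r in raw.items()}
    leftover = total_trees - sum(counts.values())
    by_remainder = sorted(raw, key=lambda pose: raw[pose] - counts[pose], reverse=True)
    for pose in by_remainder[:leftover]:
        counts[pose] += 1
    return counts
```

With pose probabilities of 0.2, 0.5, and 0.3 and a budget of 10 trees, the sketch draws 2, 5, and 3 trees from the respective forests.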

Computer Vision and Pattern Recognition | 2011

Real time head pose estimation with random regression forests

Gabriele Fanelli; Juergen Gall; Luc Van Gool

Fast and reliable algorithms for estimating the head pose are essential for many applications and higher-level face analysis tasks. We address the problem of head pose estimation from depth data, which can be captured using the ever more affordable 3D sensing technologies available today. To achieve robustness, we formulate pose estimation as a regression problem. While detecting specific face parts like the nose is sensitive to occlusions, learning the regression on rather generic surface patches requires an enormous amount of training data in order to achieve accurate estimates. We propose to use random regression forests for the task at hand, given their capability to handle large training datasets. Moreover, we synthesize a great amount of annotated training data using a statistical model of the human face. In our experiments, we show that our approach can handle real data presenting large pose changes, partial occlusions, and facial expressions, even though it is trained only on synthetic neutral face data. We have thoroughly evaluated our system on a publicly available database on which we achieve state-of-the-art performance without having to resort to the graphics card.
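The training-data synthesis step can be illustrated by assuming that the "statistical model of the human face" is a linear PCA shape model: a mean shape plus principal components with per-component standard deviations. The sketch draws one random face shape. A complete pipeline would also render that shape to a depth image at a random head pose, which is omitted here. All names and the clipping heuristic are hypothetical.

```python
import random


def sample_face_shape(mean_shape, components, std_devs, rng, clip=3.0):
    """Draw a random face shape from a linear PCA model.

    shape = mean + sum_i alpha_i * std_i * component_i, with alpha_i ~ N(0, 1)
    clipped to [-clip, clip] to avoid implausible faces. Shapes are flattened
    vertex coordinates [x0, y0, z0, x1, ...].
    """
    shape = list(mean_shape)
    for component, std in zip(components, std_devs):
        alpha = max(-clip, min(clip, rng.gauss(0.0, 1.0)))
        for j, value in enumerate(component):
            shape[j] += alpha * std * value
    return shape
```

Passing a seeded `random.Random` keeps the synthetic dataset reproducible. Clipping coefficients at three standard deviations is a common heuristic, chosen here only for illustration.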

Computer Vision and Pattern Recognition | 2009

Motion capture using joint skeleton tracking and surface estimation

Juergen Gall; Carsten Stoll; Edilson de Aguiar; Christian Theobalt; Bodo Rosenhahn; Hans-Peter Seidel

This paper proposes a method for capturing the performance of a human or an animal from a multi-view video sequence. Given an articulated template model and silhouettes from a multi-view image sequence, our approach recovers not only the movement of the skeleton, but also the possibly non-rigid temporal deformation of the 3D surface. While large-scale deformations or fast movements are captured by the skeleton pose and approximate surface skinning, true small-scale deformations or non-rigid garment motion are captured by fitting the surface to the silhouette. We further propose a novel optimization scheme for skeleton-based pose estimation that exploits the skeleton's tree structure to split the optimization problem into a local one and a lower-dimensional global one. We show on various sequences that our approach can capture the 3D motion of animals and humans accurately even in the case of rapid movements and wide apparel like skirts.

International Journal of Computer Vision | 2013

Random Forests for Real Time 3D Face Analysis

Gabriele Fanelli; Matthias Dantone; Juergen Gall; Andrea Fossati; Luc Van Gool

We present a random forest-based framework for real time head pose estimation from depth images and extend it to localize a set of facial features in 3D. Our algorithm takes a voting approach, where each patch extracted from the depth image can directly cast a vote for the head pose or each of the facial features. Our system proves capable of handling large rotations, partial occlusions, and the noisy depth data acquired using commercial sensors. Moreover, the algorithm works on each frame independently and achieves real time performance without resorting to parallel computations on a GPU. We present extensive experiments on publicly available, challenging datasets and present a new annotated head pose database recorded using a Microsoft Kinect.
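The voting approach can be sketched in simplified form. Each patch that reaches a leaf contributes a vote for the head center and orientation. Votes from leaves whose training targets were widely spread are discarded, and the remaining votes are averaged. The `Vote` structure, the variance threshold, and plain averaging are assumptions for illustration. A full system would typically also cluster the votes (for example with mean shift) to reject outliers, and averaging angles is only reasonable when the estimates lie in a narrow range.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Vote:
    center: Vec3          # head-center estimate from one patch
    angles: Vec3          # (yaw, pitch, roll) estimate in degrees
    leaf_variance: float  # spread of the training targets stored in the leaf


def aggregate_votes(votes: List[Vote], max_variance: float) -> Optional[Tuple[Vec3, Vec3]]:
    """Average the votes of low-variance leaves; None if no vote survives."""
    kept = [v for v in votes if v.leaf_variance <= max_variance]
    if not kept:
        return None
    center = tuple(fmean(v.center[i] for v in kept) for i in range(3))
    angles = tuple(fmean(v.angles[i] for v in kept) for i in range(3))
    return center, angles
```

Because each vote is computed per frame, this aggregation works frame by frame without temporal tracking, consistent with the frame-independent processing described above.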

International Conference on Computer Vision | 2013

Towards Understanding Action Recognition

Hueihan Jhuang; Juergen Gall; Silvia Zuffi; Cordelia Schmid; Michael J. Black

Although action recognition in videos is widely studied, current methods often fail on real-world datasets. Many recent approaches improve accuracy and robustness to cope with challenging video sequences, but it is often unclear what affects the results most. This paper attempts to provide insights based on a systematic performance evaluation using thoroughly annotated data of human actions. We annotate human joints for the HMDB dataset (J-HMDB). This annotation can be used to derive ground truth optical flow and segmentation. We evaluate current methods using this dataset and systematically replace the output of various algorithms with ground truth. This enables us to discover what is important: for example, should we work on improving flow algorithms, estimating human bounding boxes, or enabling pose estimation? In summary, we find that high-level pose features greatly outperform low/mid-level features; in particular, pose over time is critical, but current pose estimation algorithms are not yet reliable enough to provide this information. We also find that the accuracy of a top-performing action recognition framework can be greatly increased by refining the underlying low/mid-level features; this suggests it is important to improve optical flow and human detection algorithms. Our analysis and J-HMDB dataset should facilitate a deeper understanding of action recognition algorithms.

International Conference on Pattern Recognition | 2011

Real time head pose estimation from consumer depth cameras

Gabriele Fanelli; Thibaut Weise; Juergen Gall; Luc Van Gool

We present a system for estimating the location and orientation of a person's head from depth data acquired by a low-quality device. Our approach is based on discriminative random regression forests: ensembles of random trees trained by splitting each node so as to simultaneously reduce the entropy of the class-label distribution and the variance of the head position and orientation. We evaluate three different approaches to jointly take classification and regression performance into account during training. For evaluation, we acquired a new dataset and propose a method for its automatic annotation.
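The node-splitting objective can be illustrated with one possible combination of the two terms: a weighted sum of child class-label entropy and child regression variance. The variance is computed only over head patches (label 1) and uses the trace of the covariance as a simple stand-in. The weight `alpha`, the label encoding, and the function names are hypothetical. The abstract notes that three combination strategies were compared, and this sketch does not claim to reproduce any of them exactly.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def total_variance(vectors):
    """Trace of the covariance: sum of per-dimension variances."""
    n = len(vectors)
    if n == 0:
        return 0.0
    dims = len(vectors[0])
    means = [sum(v[d] for v in vectors) / n for d in range(dims)]
    return sum((v[d] - means[d]) ** 2 for v in vectors for d in range(dims)) / n


def split_cost(labels, targets, goes_left, alpha=1.0):
    """Weighted child entropy plus alpha * weighted head-target variance (lower is better)."""
    n = len(labels)
    cost = 0.0
    for side in (True, False):
        idx = [i for i in range(n) if goes_left[i] == side]
        if not idx:
            continue
        child_labels = [labels[i] for i in idx]
        # Regression targets (e.g. position/orientation) only matter for head patches
        head_targets = [targets[i] for i in idx if labels[i] == 1]
        cost += len(idx) / n * (entropy(child_labels) + alpha * total_variance(head_targets))
    return cost
```

A split that separates head from non-head patches scores better than one that mixes them, even when it leaves some spread in the head targets.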

International Journal of Computer Vision | 2010

Optimization and Filtering for Human Motion Capture

Juergen Gall; Bodo Rosenhahn; Thomas Brox; Hans-Peter Seidel

Local optimization and filtering have been widely applied to model-based 3D human motion capture. Global stochastic optimization has recently been proposed as a promising alternative for tracking and initialization. In order to benefit from optimization and filtering, we introduce a multi-layer framework that combines stochastic optimization, filtering, and local optimization. While the first layer relies on interacting simulated annealing and some weak prior information on physical constraints, the second layer refines the estimates by filtering and local optimization such that the accuracy is increased and ambiguities are resolved over time without imposing restrictions on the dynamics. In our experimental evaluation, we demonstrate the significant improvements of the multi-layer framework and provide quantitative 3D pose tracking results for the complete HumanEva-II dataset. The paper further comprises a comparison of global stochastic optimization with particle filtering, annealed particle filtering, and local optimization.
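The first layer's stochastic search can be illustrated with a simplified, one-dimensional particle scheme in the spirit of interacting simulated annealing. Particles are weighted with a Boltzmann-Gibbs factor exp(-β·E) and resampled in proportion to those weights. They are then diffused with a shrinking Gaussian while β grows each iteration. The schedule values and the function name are assumptions. The paper's actual selection kernel, pose energy, and physical constraints are not reproduced here.

```python
import math
import random


def annealed_particle_minimize(energy, init_particles, n_iters=30, beta0=1.0,
                               beta_growth=1.3, sigma0=1.0, sigma_decay=0.85, seed=0):
    """Minimize a 1D energy with weighting, resampling, and shrinking diffusion."""
    rng = random.Random(seed)
    particles = list(init_particles)
    beta, sigma = beta0, sigma0
    for _ in range(n_iters):
        energies = [energy(x) for x in particles]
        e_min = min(energies)
        # Boltzmann-Gibbs weights; subtracting e_min avoids underflow
        weights = [math.exp(-beta * (e - e_min)) for e in energies]
        # Selection: resample particles in proportion to their weights
        particles = rng.choices(particles, weights=weights, k=len(particles))
        # Mutation: Gaussian diffusion whose spread shrinks over time
        particles = [x + rng.gauss(0.0, sigma) for x in particles]
        beta *= beta_growth
        sigma *= sigma_decay
    return min(particles, key=energy)
```

For example, starting from 200 particles spread uniformly over [-10, 10], minimizing `(x - 2) ** 2` concentrates the particles around 2. Increasing β gradually lets the search explore broadly at first and sharpen later. This is the property that makes such schemes attractive for initialization.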

Computer Vision and Pattern Recognition | 2011

What makes a chair a chair

Helmut Grabner; Juergen Gall; Luc Van Gool

Many object classes are primarily defined by their functions. However, this fact has been left largely unexploited by visual object categorization or detection systems. We propose a method to learn an affordance detector. It identifies locations in the 3D space which “support” the particular function. Our novel approach “imagines” an actor performing an action typical for the target object class, instead of relying purely on the visual object appearance. Thus, function is handled as a cue complementary to appearance, rather than being a consideration after appearance-based detection. Experimental results are given for the functional category “sitting”. Such affordance is tested on a 3D representation of the scene, as can be realistically obtained through SfM or depth cameras. In contrast to appearance-based object detectors, affordance detection requires only very few training examples and generalizes very well to other sittable objects like benches or sofas when trained on a few chairs.
