Jürgen Gall
ETH Zurich
Publications
Featured research published by Jürgen Gall.
Computer Vision and Pattern Recognition | 2010
Angela Yao; Jürgen Gall; Luc Van Gool
We present a method to classify and localize human actions in video using a Hough transform voting framework. Random trees are trained to learn a mapping between densely sampled feature patches and their corresponding votes in a spatio-temporal-action Hough space. The leaves of the trees form a discriminative multi-class codebook that shares features between the action classes and votes for action centers in a probabilistic manner. Using low-level features such as gradients and optical flow, we demonstrate that Hough voting can achieve state-of-the-art performance on several datasets covering a wide range of action-recognition scenarios.
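A minimal sketch of the voting step, assuming a trained codebook object whose `lookup` method returns (class, displacement, weight) votes for a patch descriptor; the names are illustrative, not the authors' API:

```python
import numpy as np

def hough_vote(patches, codebook, volume_shape, n_classes):
    """Accumulate probabilistic votes in a spatio-temporal Hough
    space, one (x, y, t) volume per action class.  `codebook.lookup`
    stands in for the mapping learned by the random trees and is
    assumed to return (class_id, displacement, weight) triples."""
    hough = np.zeros((n_classes,) + tuple(volume_shape))
    for pos, descriptor in patches:          # pos = (x, y, t) of a patch
        for class_id, disp, weight in codebook.lookup(descriptor):
            center = tuple(int(p + d) for p, d in zip(pos, disp))
            if all(0 <= c < s for c, s in zip(center, volume_shape)):
                hough[class_id][center] += weight
    return hough  # local maxima give action class, location, and time
```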
European Conference on Computer Vision | 2012
Luca Ballan; Aparna Taneja; Jürgen Gall; Luc Van Gool; Marc Pollefeys
Capturing the motion of two hands interacting with an object is a very challenging task due to the large number of degrees of freedom, self-occlusions, and the similarity between the fingers, even when multiple cameras observe the scene. In this paper, we propose to use discriminatively learned salient points on the fingers and to estimate the associations between fingers and salient points simultaneously with the hand pose. We introduce a differentiable objective function that also takes edges, optical flow, and collisions into account. Our qualitative and quantitative evaluations show that the proposed approach achieves very accurate results for several challenging sequences containing hands and objects in action.
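A minimal sketch of such a composite objective and a naive finite-difference minimizer, under the assumption that the individual cost terms are given as callables; this is an illustration, not the authors' implementation:

```python
import numpy as np

def objective(pose, terms, weights):
    """Weighted sum of differentiable cost terms (salient-point
    associations, edges, optical flow, collisions).  The individual
    term functions are assumed to be given."""
    return sum(weights[name] * term(pose) for name, term in terms.items())

def minimize(pose, terms, weights, lr=1e-3, steps=200, eps=1e-5):
    """Plain gradient descent with finite-difference gradients; a
    stand-in for whatever optimizer is actually used."""
    pose = np.asarray(pose, dtype=float).copy()
    for _ in range(steps):
        grad = np.zeros_like(pose)
        for i in range(pose.size):
            e = np.zeros_like(pose)
            e[i] = eps
            grad[i] = (objective(pose + e, terms, weights) -
                       objective(pose - e, terms, weights)) / (2 * eps)
        pose -= lr * grad
    return pose
```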
IEEE Transactions on Multimedia | 2010
Gabriele Fanelli; Jürgen Gall; Harald Romsdorfer; Thibaut Weise; Luc Van Gool
Communication between humans relies deeply on the ability to express and recognize feelings. For this reason, research on human-machine interaction needs to focus on the recognition and simulation of emotional states, a prerequisite of which is the collection of affective corpora. Currently available datasets remain a bottleneck because of the difficulties that arise during the acquisition and labeling of affective data. In this work, we present a new audio-visual corpus covering what are possibly the two most important modalities humans use to communicate their emotional states: speech and facial expression, the latter in the form of dense dynamic 3-D face geometries. We acquire high-quality data by working in a controlled environment and resort to video clips to induce affective states. The annotation of the speech signal includes transcription of the corpus text into a phonological representation, accurate phone segmentation, fundamental frequency extraction, and signal intensity estimation. We employ a real-time 3-D scanner to acquire dense dynamic facial geometries and track the faces throughout the sequences, achieving full spatial and temporal correspondences. The corpus is a valuable tool for applications like affective visual speech synthesis or view-independent facial expression recognition.
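As a rough illustration of one annotation step, a toy autocorrelation-based fundamental-frequency estimator; the corpus's actual extraction tool is not specified here, and `fmin`/`fmax` are assumed search bounds:

```python
import numpy as np

def estimate_f0(frame, sr, fmin=75.0, fmax=400.0):
    """Toy per-frame fundamental-frequency estimate via
    autocorrelation (the frame is assumed longer than sr / fmin
    samples).  It only illustrates the kind of annotation the
    corpus provides."""
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag if ac[lag] > 0 else 0.0  # 0.0 marks an unvoiced frame
```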
Computer Vision and Pattern Recognition | 2010
Henning Hamer; Jürgen Gall; Thibaut Weise; Luc Van Gool
In this paper, we propose a prior for hand pose estimation that integrates the direct relation between a manipulating hand and a 3D object. This is of particular interest for a variety of applications, since many tasks performed by humans require hand-object interaction. Inspired by the ability of humans to learn the handling of an object from a single example, our focus lies on very sparse training data. We express estimated hand poses in local object coordinates and extract, for each individual hand segment, the relative position and orientation as well as the contact points on the object. The prior is then modeled as a spatial distribution conditioned on the object. Given a new object of the same object class and new hand dimensions, we can transfer the prior by a procedure involving a geometric warp. In our experiments, we demonstrate that the prior can be used to improve the robustness of a 3D hand tracker and to synthesize a new hand grasping a new object. For this, we integrate the prior into a unified belief propagation framework for tracking and synthesis.
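A minimal sketch of the coordinate change that underlies such a prior, assuming the object's world pose is given as a rotation `R_obj` and translation `t_obj`; names and conventions are illustrative:

```python
import numpy as np

def to_object_coords(point_world, R_obj, t_obj):
    """Express a 3D point (e.g. a hand-segment position or contact
    point) in the local frame of an object whose world pose is the
    rigid transform (R_obj, t_obj).  Storing the prior in these
    coordinates is what makes it transferable to a new object of
    the same class after a geometric warp."""
    return R_obj.T @ (np.asarray(point_world, float) - np.asarray(t_obj, float))
```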
British Machine Vision Conference | 2009
Gabriele Fanelli; Jürgen Gall; Luc Van Gool
We present a novel method for mouth localization in the context of multimodal speech recognition, where audio and visual cues are fused to improve the speech recognition accuracy. While facial feature points like mouth corners or lip contours are commonly used to estimate at least the scale, position, and orientation of the mouth, we propose a method based on the Hough transform. Instead of relying on a predefined sparse subset of mouth features, it casts probabilistic votes for the mouth center from several patches in the neighborhood and accumulates the votes in a Hough image. This makes the localization more robust, as it does not rely on the detection of a single feature. In addition, we exploit the different shape properties of eyes and mouth to localize the mouth more efficiently. Using the rotation-invariant representation of the iris, scale and orientation can be efficiently inferred from the localized eye positions. The superior accuracy of our method and quantitative improvements in audio-visual speech recognition over monomodal approaches are demonstrated on two datasets.
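A minimal sketch of how scale and orientation might be inferred from the two localized eye centers; `ref_dist` is an assumed training-time inter-ocular distance, not a value from the paper:

```python
import numpy as np

def scale_and_orientation_from_eyes(left_eye, right_eye, ref_dist=100.0):
    """Infer in-plane scale and orientation of the face from the two
    localized eye centers, so that votes for the mouth center can be
    cast at the right scale and rotation.  `ref_dist` (pixels) is an
    illustrative constant."""
    d = np.asarray(right_eye, float) - np.asarray(left_eye, float)
    scale = np.linalg.norm(d) / ref_dist
    angle = np.arctan2(d[1], d[0])  # rotation of the inter-ocular axis
    return scale, angle
```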
Workshop on Applications of Computer Vision | 2009
Mohammed Shaheen; Jürgen Gall; Robert Strzodka; Luc Van Gool; Hans-Peter Seidel
This work addresses the problem of tracking humans with skeleton-based shape models when video footage is acquired by multiple cameras. Since the shape deformations are parameterized by the skeleton, the position, orientation, and configuration of the human skeleton are estimated such that the deformed shape model best explains the image data. Several algorithms have been proposed for this problem over the last few years, usually relying on filtering, local optimization, or global optimization. The global optimization algorithms can be further divided into single-hypothesis optimization (SHO) and multiple-hypothesis optimization (MHO). We briefly compare the underlying mathematical models and evaluate the performance of one representative algorithm for each class. Furthermore, we compare several likelihoods and parameter settings with respect to accuracy and computational cost. A thorough evaluation is performed on two sequences with uncontrolled lighting conditions and a non-static background. In addition, we demonstrate the impact of the likelihood on the HumanEva benchmark. Our results provide guidance on algorithm design for different applications related to human motion capture.
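A schematic contrast of the two global-optimization classes, with `optimize` and `likelihood` standing in for a concrete optimizer and image likelihood; this only illustrates the SHO/MHO distinction, not any of the evaluated algorithms:

```python
def sho(init_pose, likelihood, optimize):
    """Single-hypothesis optimization: refine one pose per frame."""
    return optimize(init_pose, likelihood)

def mho(hypotheses, likelihood, optimize):
    """Multiple-hypothesis optimization: refine several candidate
    poses and keep the one the image data explains best.  Both
    functions are schematic sketches."""
    refined = [optimize(h, likelihood) for h in hypotheses]
    return max(refined, key=likelihood)
```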
Untitled Event | 2009
Jürgen Gall; Victor S. Lempitsky
Untitled Event | 2009
Jürgen Gall; Carsten Stoll; Edilson de Aguiar; Christian Theobalt; Bodo Rosenhahn; Hans-Peter Seidel
Untitled Event | 2006
Jürgen Gall; Bodo Rosenhahn; Thomas Brox; Hans-Peter Seidel
International Conference on Computer Vision | 2012
Stefano Pellegrini; Jürgen Gall; Leonid Sigal; Luc Van Gool