Gabriele Fanelli | Researchain

Publication

Featured research published by Gabriele Fanelli.

computer vision and pattern recognition | 2012

Real-time facial feature detection using conditional regression forests

Matthias Dantone; Juergen Gall; Gabriele Fanelli; Luc Van Gool

Although facial feature detection from 2D images is a well-studied field, there is a lack of real-time methods that estimate feature points even on low-quality images. Here we propose conditional regression forests for this task. While regression forests learn the relations between facial image patches and the location of feature points from the entire set of faces, conditional regression forests learn the relations conditional on global face properties. In our experiments, we use the head pose as a global property and demonstrate that conditional regression forests outperform regression forests for facial feature detection. We have evaluated the method on the challenging Labeled Faces in the Wild [20] database, where close-to-human accuracy is achieved while processing images in real time.
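The idea of conditioning on a global property can be illustrated with a small sketch. It assumes one already-trained predictor per head-pose bin and an estimated probability for each bin, and blends the pose-specific predictions by those probabilities. The function name `conditional_predict`, the pose labels, and the stand-in predictors are all hypothetical; the abstract does not state how the paper combines the conditional forests at test time, so this is only one plausible reading.

```python
def conditional_predict(pose_probs, pose_predictors, patch):
    """Blend pose-specific feature-point predictions.

    pose_probs: dict mapping a pose label to its estimated probability.
    pose_predictors: dict mapping the same labels to callables that take a
        patch and return an (x, y) feature-point estimate. These stand in
        for regression forests trained on faces of one pose (assumption).
    """
    total = sum(pose_probs.values())
    if total <= 0:
        raise ValueError("pose probabilities must sum to a positive value")
    x = 0.0
    y = 0.0
    for pose, prob in pose_probs.items():
        pred_x, pred_y = pose_predictors[pose](patch)
        weight = prob / total  # normalize in case the inputs do not sum to 1
        x += weight * pred_x
        y += weight * pred_y
    return x, y


# Toy usage: two pose bins with constant stand-in predictors.
predictors = {
    "left": lambda patch: (0.0, 4.0),
    "front": lambda patch: (4.0, 0.0),
}
estimate = conditional_predict({"left": 0.25, "front": 0.75}, predictors, patch=None)
```

With a pose estimate that strongly favours one bin, the result approaches what that bin's predictor would return on its own, which is the intended benefit over a single forest trained on all poses.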

computer vision and pattern recognition | 2011

Real time head pose estimation with random regression forests

Gabriele Fanelli; Juergen Gall; Luc Van Gool

Fast and reliable algorithms for estimating the head pose are essential for many applications and higher-level face analysis tasks. We address the problem of head pose estimation from depth data, which can be captured using the ever more affordable 3D sensing technologies available today. To achieve robustness, we formulate pose estimation as a regression problem. While detecting specific face parts like the nose is sensitive to occlusions, learning the regression on rather generic surface patches requires an enormous amount of training data in order to achieve accurate estimates. We propose to use random regression forests for the task at hand, given their capability to handle large training datasets. Moreover, we synthesize a large amount of annotated training data using a statistical model of the human face. In our experiments, we show that our approach can handle real data presenting large pose changes, partial occlusions, and facial expressions, even though it is trained only on synthetic neutral face data. We have thoroughly evaluated our system on a publicly available database on which we achieve state-of-the-art performance without having to resort to the graphics card.

International Journal of Computer Vision | 2013

Random Forests for Real Time 3D Face Analysis

Gabriele Fanelli; Matthias Dantone; Juergen Gall; Andrea Fossati; Luc Van Gool

We present a random forest-based framework for real-time head pose estimation from depth images and extend it to localize a set of facial features in 3D. Our algorithm takes a voting approach, where each patch extracted from the depth image can directly cast a vote for the head pose or each of the facial features. Our system proves capable of handling large rotations, partial occlusions, and the noisy depth data acquired using commercial sensors. Moreover, the algorithm works on each frame independently and achieves real-time performance without resorting to parallel computations on a GPU. We report extensive experiments on publicly available, challenging datasets and introduce a new annotated head pose database recorded using a Microsoft Kinect.

international conference on pattern recognition | 2011

Real time head pose estimation from consumer depth cameras

Gabriele Fanelli; Thibaut Weise; Juergen Gall; Luc Van Gool

We present a system for estimating the location and orientation of a person's head from depth data acquired by a low-quality device. Our approach is based on discriminative random regression forests: ensembles of random trees trained by splitting each node so as to simultaneously reduce the entropy of the class-label distribution and the variance of the head position and orientation. We evaluate three different approaches to jointly take classification and regression performance into account during training. For evaluation, we acquired a new dataset and propose a method for its automatic annotation.
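A node split that rewards both purer class labels and tighter pose targets can be sketched as follows. The score here is a fixed weighted sum of entropy reduction and variance reduction; the weight `alpha` and all function names are hypothetical, and the abstract does not say which three combination schemes the paper evaluates, so this should not be read as any of them.

```python
import math
from collections import Counter


def class_entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def total_variance(targets):
    """Sum of per-dimension variances of equal-length target vectors."""
    n = len(targets)
    if n == 0:
        return 0.0
    total = 0.0
    for d in range(len(targets[0])):
        mean = sum(t[d] for t in targets) / n
        total += sum((t[d] - mean) ** 2 for t in targets) / n
    return total


def split_gain(labels, targets, go_left, alpha=0.5):
    """Score a candidate split by combined entropy and variance reduction.

    go_left: one boolean per sample, True if the sample goes to the left child.
    alpha: hypothetical weight between classification and regression gain.
    """
    n = len(labels)
    left = [i for i in range(n) if go_left[i]]
    right = [i for i in range(n) if not go_left[i]]
    if not left or not right:
        return 0.0  # a split that sends everything one way gains nothing

    def children(measure, values):
        # Size-weighted impurity of the two children.
        return sum(len(idx) / n * measure([values[i] for i in idx])
                   for idx in (left, right))

    entropy_gain = class_entropy(labels) - children(class_entropy, labels)
    variance_gain = total_variance(targets) - children(total_variance, targets)
    return alpha * entropy_gain + (1 - alpha) * variance_gain


# Toy data: class 1 = head patch, class 0 = background, one pose dimension.
labels = [1, 1, 0, 0]
targets = [(0.0,), (0.0,), (10.0,), (10.0,)]
good = split_gain(labels, targets, [True, True, False, False])
bad = split_gain(labels, targets, [True, False, True, False])
```

Entropy and variance live on different scales, so a practical version would normalize the two terms before mixing them; the sketch leaves that out to keep the structure visible.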

british machine vision conference | 2011

Does human action recognition benefit from pose estimation?

Angela Yao; Juergen Gall; Gabriele Fanelli; Luc Van Gool

Early works on human action recognition focused on tracking and classifying articulated body motions. Such methods required accurate localisation of body parts, which is a difficult task, particularly under realistic imaging conditions. As such, recent trends have shifted towards the use of more abstract, low-level appearance features such as spatio-temporal interest points. Motivated by the recent progress in pose estimation, we feel that pose-based action recognition systems warrant a second look. In this paper, we address the question of whether pose estimation is useful for action recognition or if it is better to train a classifier only on low-level appearance features drawn from video data. We compare pose-based, appearance-based and combined pose and appearance features for action recognition in a home-monitoring scenario. Our experiments show that pose-based features outperform low-level appearance features, even when heavily corrupted by noise, suggesting that pose estimation is beneficial for the action recognition task.

IEEE Transactions on Multimedia | 2010

A 3-D Audio-Visual Corpus of Affective Communication

Gabriele Fanelli; Jürgen Gall; Harald Romsdorfer; Thibaut Weise; L. Van Gool

Communication between humans deeply relies on the capability of expressing and recognizing feelings. For this reason, research on human-machine interaction needs to focus on the recognition and simulation of emotional states, a prerequisite of which is the collection of affective corpora. Currently available datasets still represent a bottleneck because of the difficulties arising during the acquisition and labeling of affective data. In this work, we present a new audio-visual corpus for possibly the two most important modalities used by humans to communicate their emotional states, namely speech and facial expression in the form of dense dynamic 3-D face geometries. We acquire high-quality data by working in a controlled environment and resort to video clips to induce affective states. The annotation of the speech signal includes: transcription of the corpus text into the phonological representation, accurate phone segmentation, fundamental frequency extraction, and signal intensity estimation of the speech signals. We employ a real-time 3-D scanner to acquire dense dynamic facial geometries and track the faces throughout the sequences, achieving full spatial and temporal correspondences. The corpus is a valuable tool for applications like affective visual speech synthesis or view-independent facial expression recognition.

ieee international conference on automatic face gesture recognition | 2013

Real time 3D face alignment with Random Forests-based Active Appearance Models

Gabriele Fanelli; Matthias Dantone; Luc Van Gool

Many desirable applications dealing with automatic face analysis rely on robust facial feature localization. While extensive research has been carried out on standard 2D imagery, recent technological advances have made the acquisition of 3D data both accurate and affordable, opening the way to more accurate and robust algorithms. We present a model-based approach to real-time face alignment, fitting a 3D model to depth and intensity images of unseen expressive faces. We use random regression forests to drive the fitting in an Active Appearance Model framework. We thoroughly evaluated the proposed approach on publicly available datasets and show how adding the depth channel boosts the robustness and accuracy of the algorithm.

international symposium on communications, control and signal processing | 2012

Real time 3D head pose estimation: Recent achievements and future challenges

Gabriele Fanelli; Juergen Gall; Luc Van Gool

Most automatic face recognition algorithms try to normalize facial images in order to remove variations caused by anything but the identity of the person. With lighting conditions less problematic since the introduction of reliable and affordable depth sensors, head pose is the other great source of undesired variations in facial images. In this paper, we describe recent state-of-the-art methods for real-time head pose estimation from depth data, present available databases, and discuss open problems to be addressed by future research.

british machine vision conference | 2009

Hough transform-based mouth localization for audio-visual speech recognition

Gabriele Fanelli; Jürgen Gall; Luc Van Gool

We present a novel method for mouth localization in the context of multimodal speech recognition where audio and visual cues are fused to improve the speech recognition accuracy. While facial feature points like mouth corners or lip contours are commonly used to estimate at least scale, position, and orientation of the mouth, we propose a Hough transform-based method. Instead of relying on a predefined sparse subset of mouth features, it casts probabilistic votes for the mouth center from several patches in the neighborhood and accumulates the votes in a Hough image. This makes the localization more robust as it does not rely on the detection of a single feature. In addition, we exploit the different shape properties of eyes and mouth in order to localize the mouth more efficiently. Using the rotation invariant representation of the iris, scale and orientation can be efficiently inferred from the localized eye positions. The superior accuracy of our method and quantitative improvements for audio-visual speech recognition over monomodal approaches are demonstrated on two datasets.
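The vote accumulation described above can be sketched in a few lines. Each patch is assumed to carry its own position, a learned offset to the mouth center, and a probabilistic weight; the tuple layout and the function names `accumulate_votes` and `find_peak` are hypothetical and only illustrate the general Hough voting scheme, not the paper's implementation.

```python
def accumulate_votes(height, width, patches):
    """Build a Hough image from patch votes.

    patches: iterable of (row, col, offset_row, offset_col, weight) tuples,
        where (row + offset_row, col + offset_col) is the mouth center the
        patch votes for and weight is the confidence of that vote.
    """
    hough = [[0.0] * width for _ in range(height)]
    for row, col, offset_row, offset_col, weight in patches:
        vote_row = row + offset_row
        vote_col = col + offset_col
        # Votes that land outside the image are ignored.
        if 0 <= vote_row < height and 0 <= vote_col < width:
            hough[vote_row][vote_col] += weight
    return hough


def find_peak(hough):
    """Return the (row, col) of the strongest accumulated vote."""
    best_value = float("-inf")
    best_position = (0, 0)
    for r, row in enumerate(hough):
        for c, value in enumerate(row):
            if value > best_value:
                best_value = value
                best_position = (r, c)
    return best_position


# Toy usage: three patches agree on (3, 3), one stray patch votes for (0, 0).
patches = [
    (2, 2, 1, 1, 0.6),
    (5, 4, -2, -1, 0.5),
    (0, 0, 0, 0, 0.3),
    (3, 3, 0, 0, 0.2),
]
hough_image = accumulate_votes(8, 8, patches)
mouth_center = find_peak(hough_image)
```

Because the peak is the sum of many weak votes, a single misleading patch (the stray vote in the toy data) does not move the estimate, which matches the robustness argument made in the abstract.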

european conference on computer vision | 2010