
Publications


Featured research published by Grégory Rogez.


Computer Vision and Pattern Recognition | 2008

Randomized trees for human pose detection

Grégory Rogez; Jonathan Rihan; Srikumar Ramalingam; Carlos Orrite; Philip H. S. Torr

This paper addresses human pose recognition from video sequences by formulating it as a classification problem. Unlike much previous work, we do not make any assumptions about the availability of clean segmentation. The first step of this work consists of a novel method for aligning the training images using 3D Mocap data. Next, we define classes by discretizing a 2D manifold whose two dimensions are camera viewpoint and action. Our main contribution is a pose detection algorithm based on random forests. A bottom-up approach is followed to build a decision tree by recursively clustering and merging the classes at each level. For each node of the decision tree, we build a list of potentially discriminative features using the alignment of training images; in this paper we consider Histograms of Oriented Gradients (HOG). We finally grow an ensemble of trees by randomly sampling one of the selected HOG blocks at each node. Our proposed approach gives promising results with both fixed and moving cameras.
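As a rough sketch of the overall recipe, the snippet below trains an off-the-shelf random forest on HOG descriptors of aligned training crops. It is illustrative only: the paper builds each tree bottom-up by clustering and merging classes and samples one selected HOG block per node, which a generic library forest does not do, and all data, image sizes, and class counts here are hypothetical.

```python
# Illustrative sketch: pose classes predicted by a random forest over
# HOG features. Not the paper's bottom-up tree construction.
import numpy as np
from skimage.feature import hog
from sklearn.ensemble import RandomForestClassifier

def hog_features(images):
    # images: iterable of aligned grayscale crops (H x W arrays)
    return np.stack([
        hog(img, orientations=9, pixels_per_cell=(8, 8),
            cells_per_block=(2, 2))
        for img in images
    ])

# Hypothetical data: pose classes indexed over (viewpoint, action) bins.
train_imgs = [np.random.rand(128, 64) for _ in range(100)]
train_labels = np.random.randint(0, 12, size=100)  # 12 (view, action) bins

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(hog_features(train_imgs), train_labels)
pred = clf.predict(hog_features([np.random.rand(128, 64)]))
```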


International Conference on Computer Vision | 2015

Depth-Based Hand Pose Estimation: Data, Methods, and Challenges

James Steven Supancic; Grégory Rogez; Yi Yang; Jamie Shotton; Deva Ramanan

Hand pose estimation has matured rapidly in recent years. The introduction of commodity depth sensors and a multitude of practical applications have spurred new advances. We provide an extensive analysis of the state-of-the-art, focusing on hand pose estimation from a single depth frame. To do so, we have implemented a considerable number of systems, and will release all software and evaluation code. We summarize important conclusions here: (1) Pose estimation appears roughly solved for scenes with isolated hands. However, methods still struggle to analyze cluttered scenes where hands may be interacting with nearby objects and surfaces. To spur further progress we introduce a challenging new dataset with diverse, cluttered scenes. (2) Many methods evaluate themselves with disparate criteria, making comparisons difficult. We define a consistent evaluation criterion, rigorously motivated by human experiments. (3) We introduce a simple nearest-neighbor baseline that outperforms most existing systems. This implies that most systems do not generalize beyond their training sets. This also reinforces the under-appreciated point that training data is as important as the model itself. We conclude with directions for future progress.
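The nearest-neighbor baseline mentioned in point (3) can be pictured in a few lines. This is a minimal sketch with hypothetical array shapes; the paper's actual baseline may use a different distance function and preprocessing.

```python
# Sketch: predict the pose of the closest training depth image.
import numpy as np
from sklearn.neighbors import NearestNeighbors

train_depth = np.random.rand(500, 64 * 64)  # flattened depth crops
train_poses = np.random.rand(500, 21 * 3)   # 21 joints x 3D per image

nn = NearestNeighbors(n_neighbors=1).fit(train_depth)
_, idx = nn.kneighbors(np.random.rand(1, 64 * 64))  # query depth crop
predicted_pose = train_poses[idx[0, 0]]  # copy the neighbor's pose
```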


International Conference on Machine Learning | 2008

Ambiguity Modeling in Latent Spaces

Carl Henrik Ek; Jonathan Rihan; Philip H. S. Torr; Grégory Rogez; Neil D. Lawrence

We are interested in the situation where we have two or more representations of an underlying phenomenon. In particular, we are interested in the scenario where the representations are complementary. This implies that a single individual representation is not sufficient to fully discriminate a specific instance of the underlying phenomenon; it also means that each representation is an ambiguous representation of the other, complementary spaces. In this paper we present a latent variable model capable of consolidating multiple complementary representations. Our method extends canonical correlation analysis by introducing additional latent spaces that are specific to the different representations, thereby explaining the full variance of the observations. These additional spaces, which explain representation-specific variance, separately model the variance in one representation that is ambiguous to the other. We develop a spectral algorithm for fast computation of the embeddings and a probabilistic model (based on Gaussian processes) for validation and inference. The proposed model has several potential application areas; we demonstrate its use for multi-modal regression on a benchmark human pose estimation data set.
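For orientation, the sketch below runs plain CCA on two synthetic representations, i.e. the model the paper extends. The paper's contribution, the additional representation-specific latent spaces and the GP-based probabilistic model, is not captured here; all data are synthetic.

```python
# Baseline sketch: shared latent coordinates via plain CCA.
import numpy as np
from sklearn.cross_decomposition import CCA

X = np.random.rand(200, 30)  # e.g. an image-feature representation
Y = np.random.rand(200, 60)  # e.g. a pose representation
cca = CCA(n_components=5)
X_shared, Y_shared = cca.fit_transform(X, Y)  # shared latent coordinates
```

The paper's model would add private latent spaces for X and Y on top of this shared space, so that the full variance of each observation space is explained.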


European Conference on Computer Vision | 2014

3D Hand Pose Detection in Egocentric RGB-D Images

Grégory Rogez; Maryam Khademi; James Steven Supancic; J. M. M. Montiel; Deva Ramanan

We focus on the task of hand pose estimation from egocentric viewpoints. For this problem specification, we show that depth sensors are particularly informative for extracting near-field interactions of the camera wearer with his/her environment. Despite recent advances in full-body pose estimation using Kinect-like sensors, reliable monocular hand pose estimation in RGB-D images is still an unsolved problem. The problem is exacerbated when considering a wearable sensor and a first-person camera viewpoint: the occlusions inherent to this camera view and its limited field of view make the problem even more difficult. We propose to use task- and viewpoint-specific synthetic training exemplars in a discriminative detection framework. We also exploit depth features for sparser and faster detection. We evaluate our approach on a real-world annotated dataset and propose a novel annotation technique for accurate 3D hand labelling even in cases of partial occlusion.
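As a toy illustration of why depth enables sparser detection, the sketch below prunes candidate windows to the near field in front of the wearer. The window size, stride, and distance threshold are invented for illustration and are not the paper's parameters.

```python
# Toy sketch: keep only detection windows whose median depth is near.
import numpy as np

def near_field_windows(depth, win=64, stride=32, max_dist_m=0.8):
    """Yield top-left corners of windows with near-field median depth."""
    h, w = depth.shape
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            valid = depth[y:y + win, x:x + win]
            valid = valid[valid > 0]  # ignore missing depth readings
            if valid.size and np.median(valid) < max_dist_m:
                yield (y, x)

depth = np.random.rand(240, 320) * 3.0        # placeholder depth map (m)
candidates = list(near_field_windows(depth))  # windows worth classifying
```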


Computer Vision and Pattern Recognition | 2015

First-person pose recognition using egocentric workspaces

Grégory Rogez; James Steven Supancic; Deva Ramanan

We tackle the problem of estimating the 3D pose of an individual's upper limbs (arms+hands) from a chest-mounted depth camera. Importantly, we consider pose estimation during everyday interactions with objects. Past work shows that strong pose+viewpoint priors and depth-based features are crucial for robust performance. In egocentric views, hands and arms are observable within a well-defined volume in front of the camera. We call this volume an egocentric workspace. A notable property is that hand appearance correlates with workspace location. To exploit this correlation, we classify arm+hand configurations in a global egocentric coordinate frame, rather than a local scanning window. This greatly simplifies the architecture and improves performance. We propose an efficient pipeline which 1) generates synthetic workspace exemplars for training using a virtual chest-mounted camera whose intrinsic parameters match our physical camera, 2) computes perspective-aware depth features on this entire volume, and 3) recognizes discrete arm+hand pose classes through a sparse multi-class SVM. We achieve state-of-the-art hand pose recognition performance from egocentric RGB-D images in real time.
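A minimal sketch of the "classify the whole workspace rather than a scanning window" idea is given below, with a naive average-pooled depth feature standing in for the paper's perspective-aware depth features; all shapes and class counts are hypothetical.

```python
# Sketch: one global feature per frame, one multi-class linear SVM.
import numpy as np
from sklearn.svm import LinearSVC

def workspace_feature(depth, grid=(16, 16)):
    # Coarse average-pooled depth over the full frame; a stand-in for
    # the paper's perspective-aware volumetric depth features.
    h, w = depth.shape
    gy, gx = grid
    return depth[:h - h % gy, :w - w % gx] \
        .reshape(gy, h // gy, gx, w // gx).mean(axis=(1, 3)).ravel()

X = np.stack([workspace_feature(np.random.rand(240, 320))
              for _ in range(50)])
y = np.random.randint(0, 10, size=50)  # 10 discrete arm+hand pose classes
clf = LinearSVC().fit(X, y)
```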


Pattern Recognition | 2008

A spatio-temporal 2D-models framework for human pose recovery in monocular sequences

Grégory Rogez; Jesús Martínez-del-Rincón

This paper addresses the pose recovery problem for a particular articulated object: the human body. In this model-based approach, the 2D shape is associated with the corresponding stick figure, allowing joint segmentation and pose recovery of the subject observed in the scene. The main disadvantage of 2D models is their restriction to a single viewpoint. To cope with this limitation, local spatio-temporal 2D models corresponding to many views of the same sequences are trained, concatenated, and sorted in a global framework. Temporal and spatial constraints are then considered to build the probabilistic transition matrix (PTM), which gives a frame-to-frame estimate of the most probable local models to use during the fitting procedure, thus limiting the feature space. This approach takes advantage of 3D information while avoiding the use of a complex 3D human model. Experiments carried out on both indoor and outdoor sequences demonstrate the ability of this approach to adequately segment pedestrians and estimate their poses independently of the direction of motion during the sequence.
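The probabilistic transition matrix can be pictured as a small Markov matrix over local models, as in the sketch below; the number of models and all probabilities are illustrative only.

```python
# Illustrative PTM: ptm[i, j] = P(model j at frame t+1 | model i at t).
import numpy as np

ptm = np.array([
    [0.70, 0.20, 0.05, 0.05],
    [0.10, 0.70, 0.15, 0.05],
    [0.05, 0.10, 0.70, 0.15],
    [0.10, 0.05, 0.15, 0.70],
])

def candidate_models(current, top_k=2):
    # Frame-to-frame shortlist of local models for the fitting step,
    # limiting the feature space as described above.
    return np.argsort(ptm[current])[::-1][:top_k]

print(candidate_models(0))  # e.g. [0 1]
```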


International Conference on Pattern Recognition | 2004

2D silhouette and 3D skeletal models for human detection and tracking

J.M. del Rincon; J.E. Herrero-Jaraba; Grégory Rogez

In this work we propose a statistical model for detecting and tracking the human silhouette and the corresponding 3D skeletal structure in gait sequences. We follow a point distribution model (PDM) approach using principal component analysis (PCA). The problem of non-linear PCA is partially resolved by applying a different PDM depending on the estimated pose (frontal, lateral, or diagonal), as determined by Fisher's linear discriminant. Additionally, the fitting is carried out by selecting the closest allowable shape from the training set by means of a nearest-neighbor classifier. To improve the performance of the model, we develop a human gait analysis that takes temporal dynamics into account when tracking the human body. The incorporation of temporal constraints into the model increases its reliability and robustness.
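The fitting step described above (project into a PCA shape space, then snap to the closest allowable training shape) can be sketched as follows, with hypothetical landmark counts and synthetic shapes.

```python
# Sketch: PCA shape model + nearest allowable training shape.
import numpy as np
from sklearn.decomposition import PCA

shapes = np.random.rand(100, 2 * 30)  # 100 shapes, 30 (x, y) landmarks
pca = PCA(n_components=8).fit(shapes)

def closest_allowable(shape):
    # Project into the PDM subspace, then pick the nearest training
    # example (the nearest-neighbor constraint described above).
    coords = pca.transform(shape[None])
    train_coords = pca.transform(shapes)
    idx = np.argmin(np.linalg.norm(train_coords - coords, axis=1))
    return shapes[idx]

snapped = closest_allowable(np.random.rand(2 * 30))
```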


International Conference on Computer Vision | 2015

Understanding Everyday Hands in Action from RGB-D Images

Grégory Rogez; James Steven Supancic; Deva Ramanan

We analyze functional manipulations of handheld objects, formalizing the problem as one of fine-grained grasp classification. To do so, we make use of a recently developed fine-grained taxonomy of human-object grasps. We introduce a large dataset of 12,000 RGB-D images covering 71 everyday grasps in natural interactions. Our dataset differs from past work (typically addressed from a robotics perspective) in terms of its scale, diversity, and combination of RGB and depth data. From a computer-vision perspective, our dataset allows for exploration of contact and force prediction (crucial concepts in functional grasp analysis) from perceptual cues. We present extensive experimental results with state-of-the-art baselines, illustrating the role of segmentation, object context, and 3D understanding in functional grasp analysis. We demonstrate a nearly 2X improvement over prior work and a naive deep baseline, while pointing out important directions for improvement.


International Journal of Computer Vision | 2018

Depth-Based Hand Pose Estimation: Methods, Data, and Challenges

James Steven Supancic; Grégory Rogez; Yi Yang; Jamie Shotton; Deva Ramanan

Hand pose estimation has matured rapidly in recent years. The introduction of commodity depth sensors and a multitude of practical applications have spurred new advances. We provide an extensive analysis of the state-of-the-art, focusing on hand pose estimation from a single depth frame. To do so, we have implemented a considerable number of systems, and have released software and evaluation code. We summarize important conclusions here: (1) Coarse pose estimation appears viable for scenes with isolated hands. However, high-precision pose estimation (required for immersive virtual reality) and cluttered scenes (where hands may be interacting with nearby objects and surfaces) remain a challenge. To spur further progress we introduce a challenging new dataset with diverse, cluttered scenes. (2) Many methods evaluate themselves with disparate criteria, making comparisons difficult. We define a consistent evaluation criterion, rigorously motivated by human experiments. (3) We introduce a simple nearest-neighbor baseline that outperforms most existing systems. This implies that most systems do not generalize beyond their training sets. This also reinforces the under-appreciated point that training data is as important as the model itself. We conclude with directions for future progress.
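One common form such a consistent criterion takes in this literature is the fraction of frames whose maximum joint error stays below a threshold; the sketch below assumes millimetre units and particular array shapes, which may differ from the paper's exact definition.

```python
# Sketch: fraction of frames with all joints within a threshold.
import numpy as np

def frames_within_threshold(pred, gt, thresh_mm=20.0):
    # pred, gt: (num_frames, num_joints, 3) joint positions in mm
    max_err = np.linalg.norm(pred - gt, axis=2).max(axis=1)
    return float((max_err < thresh_mm).mean())
```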


British Machine Vision Conference | 2006

Viewpoint independent human motion analysis in man-made environments

Grégory Rogez; José Jesús Guerrero; Jesús I. Martínez

This work addresses the problem of human motion analysis in video sequences of a scene observed by a single fixed camera with a strong perspective effect. The goal of this work is to make a 2D model (composed of a shape and a stick figure) viewpoint-insensitive and to preprocess the input image to remove the perspective effect. Our methodology uses the 3D principal directions of man-made environments, together with the direction of motion, to transform both the 2D model and the input images to a common frontal view (parallel or orthogonal to the direction of motion) before the fitting process. The inverse transformation is then applied to the resulting human features, yielding a segmented silhouette and a pose estimate in the original input image. Preliminary results are very promising, since the proposed algorithm locates the head and feet more precisely than previous approaches.
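The "warp to a common frontal view, then invert" step can be sketched with a planar homography; the corner correspondences below are invented, whereas the paper derives the rectifying transform from the scene's 3D principal directions and the direction of motion.

```python
# Sketch: rectify to a frontal view, process, then map results back.
import numpy as np
import cv2

src = np.float32([[50, 60], [300, 40], [320, 230], [30, 210]])  # scene quad
dst = np.float32([[0, 0], [256, 0], [256, 256], [0, 256]])      # frontal view
H = cv2.getPerspectiveTransform(src, dst)

image = np.zeros((240, 320, 3), dtype=np.uint8)  # placeholder input frame
frontal = cv2.warpPerspective(image, H, (256, 256))
# ... fit the 2D model in the frontal view, then use the inverse
# transform to map the silhouette and pose back to the original image:
H_inv = np.linalg.inv(H)
```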

Collaboration


Top co-authors of Grégory Rogez.

Deva Ramanan

Carnegie Mellon University


Jonathan Rihan

Oxford Brookes University
