Andrea Fossati
ETH Zurich
Publications
Featured research published by Andrea Fossati.
International Journal of Computer Vision | 2013
Gabriele Fanelli; Matthias Dantone; Juergen Gall; Andrea Fossati; Luc Van Gool
We present a random forest-based framework for real-time head pose estimation from depth images and extend it to localize a set of facial features in 3D. Our algorithm takes a voting approach, where each patch extracted from the depth image can directly cast a vote for the head pose or for each of the facial features. Our system proves capable of handling large rotations, partial occlusions, and the noisy depth data acquired using commercial sensors. Moreover, the algorithm works on each frame independently and achieves real-time performance without resorting to parallel computations on a GPU. We report extensive experiments on publicly available, challenging datasets and introduce a new annotated head pose database recorded using a Microsoft Kinect.
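The voting scheme above can be illustrated with a minimal sketch. Here the forest itself is not trained; the votes each depth patch would cast for the head centre are simulated as noisy offsets around a hypothetical ground truth, and the aggregation step is a crude stand-in for the paper's clustering of votes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground truth: the head centre in camera coordinates (metres).
true_head = np.array([0.10, -0.05, 0.90])

# In the real system every depth patch runs through the trained forest and
# casts a 3-D vote for the head centre; here we simulate 200 noisy votes.
votes = true_head + rng.normal(0.0, 0.01, (200, 3))

# Aggregation: discard votes far from the densest region (a crude stand-in
# for the paper's vote-clustering step), then average the inliers.
median = np.median(votes, axis=0)
inliers = votes[np.linalg.norm(votes - median, axis=1) < 0.03]
estimate = inliers.mean(axis=0)
```

The inlier filter is what makes the scheme robust: stray votes from occluded or noisy patches are simply excluded from the final average.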
British Machine Vision Conference | 2011
Marco Cristani; Loris Bazzani; Giulia Paggetti; Andrea Fossati; Diego Tosato; Alessio Del Bue; Gloria Menegaz; Vittorio Murino
We present a novel approach for detecting social interactions in a crowded scene by employing solely visual cues. The detection of social interactions in unconstrained scenarios is a valuable and important task, especially for surveillance purposes. Our proposal is inspired by the social signaling literature, and in particular it considers the sociological notion of F-formation. An F-formation is a set of possible configurations in space that people may assume while participating in a social interaction. Our system takes as input the positions of the people in a scene and their (head) orientations; then, employing a voting strategy based on the Hough transform, it recognizes F-formations and the individuals associated with them. Experiments on both simulated and real data support our approach.
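A toy version of the voting idea: each person casts one vote for the centre of a shared "o-space" some fixed distance along their head orientation, and people whose votes cluster together are grouped. The stride and radius constants are invented for the demo, and the greedy clustering is a simplification of the paper's Hough accumulator.

```python
import numpy as np

STRIDE = 1.0   # assumed distance from a person to the o-space centre (metres)
RADIUS = 0.5   # vote-clustering radius (assumed)

def f_formations(positions, orientations):
    """Group people whose o-space votes land close together.

    Each person votes STRIDE metres along their head orientation; greedy
    clustering of the votes stands in for the Hough accumulator."""
    votes = positions + STRIDE * np.stack(
        [np.cos(orientations), np.sin(orientations)], axis=1)
    groups, used = [], set()
    for i in range(len(votes)):
        if i in used:
            continue
        members = [j for j in range(len(votes))
                   if np.linalg.norm(votes[j] - votes[i]) < RADIUS]
        if len(members) > 1:           # an F-formation needs at least two people
            groups.append(members)
            used.update(members)
    return groups

# Three people around a circle facing its centre, plus a lone walker.
pos = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
ori = np.array([np.pi, 0.0, -np.pi / 2, 0.0])
```

Calling `f_formations(pos, ori)` groups the three facing people and leaves the walker out, since their vote lands far from the shared centre.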
Computer Vision and Pattern Recognition | 2011
Juergen Gall; Andrea Fossati; Luc Van Gool
Unsupervised categorization of objects is a fundamental problem in computer vision. While appearance-based methods have become popular recently, other important cues like functionality are largely neglected. Motivated by psychological studies giving evidence that human demonstration has a facilitative effect on categorization in infancy, we propose an approach for object categorization from depth video streams. To this end, we have developed a method for capturing human motion in real time. The captured data is then used to temporally segment the depth streams into actions. The segmented actions are then categorized in an unsupervised manner, through a novel descriptor for motion capture data that is robust to subject variations. Furthermore, we automatically localize the object that is manipulated within a video segment, and categorize it using the corresponding action. For evaluation, we have recorded a dataset that comprises depth data with registered video sequences for 6 subjects, 13 action classes, and 174 object manipulations.
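The unsupervised grouping of segmented actions can be sketched with a greedy leader-clustering pass over action descriptors. Both the 2-D toy descriptors and the distance threshold are invented here; the paper's motion-capture descriptor and clustering method are more involved.

```python
import numpy as np

def cluster_actions(descriptors, threshold=0.5):
    """Greedy leader clustering: assign each descriptor to the nearest
    existing cluster centre, or open a new cluster if none is close
    enough (an illustrative stand-in for unsupervised categorization)."""
    centres, labels = [], []
    for d in descriptors:
        dists = [np.linalg.norm(d - c) for c in centres]
        if centres and min(dists) < threshold:
            labels.append(int(np.argmin(dists)))
        else:
            centres.append(d)
            labels.append(len(centres) - 1)
    return labels

# Two synthetic action classes in a toy 2-D descriptor space.
desc = np.array([[0.0, 0.0], [0.1, 0.0], [2.0, 2.0], [2.1, 2.0]])
```

No labels are used anywhere: category identities emerge purely from descriptor proximity, which mirrors the unsupervised setting of the paper.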
Archive | 2013
Andrea Fossati; Juergen Gall; Helmut Grabner; Xiaofeng Ren; Kurt Konolige
We analyze Kinect as a 3D measuring device, experimentally investigate depth measurement resolution and error properties, and make a quantitative comparison of Kinect accuracy with stereo reconstruction from SLR cameras and a 3D TOF camera. We propose a Kinect geometrical model and its calibration procedure providing an accurate calibration of Kinect 3D measurement and Kinect cameras. We compare our Kinect calibration procedure with its alternatives available on the Internet, and integrate it into an SfM pipeline where 3D measurements from a moving Kinect are transformed into a common coordinate system, by computing relative poses from matches in its color camera.
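A common ingredient of Kinect depth calibration is the observation that inverse depth is approximately affine in the raw disparity value, 1/z = c1·d + c0. The sketch below fits those two coefficients by least squares from simulated (disparity, depth) pairs; the coefficient values are made up for illustration and do not come from the chapter.

```python
import numpy as np

# Assumed Kinect-style model: inverse depth is affine in raw disparity,
# 1/z = c1*d + c0 (coefficients invented for the demo).
c1_true, c0_true = -0.00285, 3.33

d = np.linspace(400, 1000, 50)            # raw disparity units
z = 1.0 / (c1_true * d + c0_true)         # simulated ground-truth depth (m)

# Calibration: least-squares fit of (c1, c0) from (disparity, depth) pairs,
# solving the linear system [d 1] @ [c1 c0]^T = 1/z.
A = np.stack([d, np.ones_like(d)], axis=1)
c1_est, c0_est = np.linalg.lstsq(A, 1.0 / z, rcond=None)[0]
```

Because the model is linear in its coefficients, a handful of measurements against a known target suffices to calibrate it; real calibration must additionally handle noise and per-pixel distortion.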
Machine Vision and Applications | 2011
Andrea Fossati; Patrick Schönmann; Pascal Fua
Detecting car taillights at night is a task which can nowadays be accomplished very fast on cheap hardware. We rely on such detections, coupling them in a rule-based fashion, to build a vision-based system that detects and tracks vehicles. This allows the generation of an interface that informs the driver of the relative distance and velocity of other vehicles in real time and triggers a warning when a potentially dangerous situation arises. We demonstrate the system on sequences shot with a camera mounted behind a car's windshield.
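The rule-based coupling and the distance estimate can be sketched as follows: detections on a similar image row are paired as the two taillights of one vehicle, and the pixel span of the pair gives range through the pinhole relation Z = f·W/w. The focal length, taillight span, and pairing thresholds are all assumed values, not the paper's.

```python
FOCAL_PX = 800.0        # assumed focal length in pixels
TAILLIGHT_SPAN_M = 1.5  # assumed real-world distance between taillights

def pair_and_range(detections):
    """Pair taillight detections lying on a similar image row and estimate
    each vehicle's distance from the pixel span of the pair (pinhole
    model: Z = f * W / w). The rule set is a deliberate simplification."""
    pairs = []
    for i in range(len(detections)):
        for j in range(i + 1, len(detections)):
            (xi, yi), (xj, yj) = detections[i], detections[j]
            if abs(yi - yj) < 5 and abs(xi - xj) > 20:  # same row, well apart
                span_px = abs(xi - xj)
                pairs.append((i, j, FOCAL_PX * TAILLIGHT_SPAN_M / span_px))
    return pairs

# Two taillights of one car plus a spurious bright spot higher in the image.
lights = [(300, 240), (420, 241), (50, 100)]
```

The spurious detection is never paired because no other light shares its row, which is exactly the kind of false positive the rule-based coupling is meant to reject.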
International Conference on Robotics and Automation | 2014
Matteo Munaro; Alberto Basso; Andrea Fossati; Luc Van Gool; Emanuele Menegatti
In this work, we describe a novel method for creating 3D models of persons freely moving in front of a consumer depth sensor, and we show how they can be used for long-term person re-identification. To overcome the problem of the different poses a person can assume, we exploit the information provided by skeletal tracking algorithms to warp every point cloud frame to a standard pose in real time. The warped point clouds are then merged together to compose the model. Re-identification is performed by matching body shapes in terms of whole point clouds warped to a standard pose with the described method. We compare this technique with a classification method based on a descriptor of skeleton features and with a mixed approach that exploits both skeleton and shape features. We report experiments on two datasets we acquired for RGB-D re-identification, which use different skeletal tracking algorithms and which are made publicly available to foster research in this emerging area.
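A translation-only toy version of the pose-normalising warp: each cloud point is moved by the offset of its nearest skeleton joint between the current pose and the standard pose. The real method applies per-bone rigid transforms; this sketch and its joint layout are simplifications.

```python
import numpy as np

def warp_to_standard(points, joints_now, joints_std):
    """Move every point by the offset of its nearest skeleton joint,
    a translation-only simplification of the per-bone rigid warping
    used to normalise a point cloud to a standard pose."""
    out = np.empty_like(points)
    for k, p in enumerate(points):
        j = np.argmin(np.linalg.norm(joints_now - p, axis=1))
        out[k] = p + (joints_std[j] - joints_now[j])
    return out
```

After warping, clouds of the same person captured in different poses become directly comparable, which is what allows whole-cloud shape matching for re-identification.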
Person Re-Identification | 2014
Matteo Munaro; Andrea Fossati; Alberto Basso; Emanuele Menegatti; Luc Van Gool
In this chapter, we propose a comparison between two techniques for one-shot person re-identification from soft biometric cues. One is based upon a descriptor composed of features provided by a skeleton estimation algorithm; the other compares body shapes in terms of whole point clouds. This second approach relies on a novel technique we propose to warp the subject's point cloud to a standard pose, which allows us to disregard the problem of the different poses a person can assume. This technique is also used for composing 3D models which are then used at testing time for matching unseen point clouds. We test the proposed approaches on an existing RGB-D re-identification dataset and on the newly built BIWI RGBD-ID dataset. This dataset provides sequences of RGB, depth, and skeleton data for 50 people in two different scenarios, and it has been made publicly available to foster advancement in this emerging area.
Computer Vision and Pattern Recognition | 2009
Andrea Fossati; Mathieu Salzmann; Pascal Fua
The articulated body models used to represent human motion typically have many degrees of freedom, usually expressed as joint angles that are highly correlated. The true range of motion can therefore be represented by latent variables that span a low-dimensional space. This has often been used to make motion tracking easier. However, learning the latent space in a problem-independent way makes it nontrivial to initialize the tracking process by picking appropriate initial values for the latent variables, and thus for the pose. In this paper, we show that by directly using observable quantities as our latent variables, we eliminate this problem and achieve full automation given only modest amounts of training data. More specifically, we exploit the fact that the trajectory of a person's feet or hands strongly constrains body pose in motions such as skating, skiing, or golfing. These trajectories are easy to compute and to parameterize using a few variables. We treat these as our latent variables and learn a mapping between them and sequences of body poses. In this manner, by simply tracking the feet or the hands, we can reliably guess initial poses over whole sequences and, then, refine them.
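The core of the approach above is a learned mapping from an observable, low-dimensional trajectory descriptor to full body poses. A minimal linear sketch, assuming a synthetic 3-number descriptor per sequence and an invented ground-truth map (the paper learns a richer mapping from real motion data):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic training set: a low-dimensional trajectory descriptor (3 numbers
# per sequence) mapped to a 10-dimensional pose vector of joint angles.
# The true linear map W_true is invented for the demo.
W_true = rng.normal(size=(3, 10))
traj = rng.normal(size=(50, 3))        # observable latent variables
poses = traj @ W_true                  # corresponding body poses

# Learn the latent-to-pose mapping by least squares, then predict the pose
# for a new, tracked trajectory -- this is the automatic initialization.
W_est = np.linalg.lstsq(traj, poses, rcond=None)[0]
new_traj = rng.normal(size=(1, 3))
predicted_pose = new_traj @ W_est
```

Because the latent variables are directly observable (feet or hand tracks), initialization reduces to evaluating the learned map, with no search in an abstract latent space.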
Computer Vision and Pattern Recognition | 2008
Andrea Fossati; Elise Arnaud; Radu Horaud; Pascal Fua
A generalized expectation maximization (GEM) algorithm is used to retrieve the pose of a person from a monocular video sequence shot with a moving camera. After embedding the set of possible poses in a low-dimensional space using principal component analysis, the configuration that gives the best match to the input image is retained as the estimate for the current frame. This match is computed by iterating GEM to assign edge pixels to the correct body part and to find the body pose that maximizes the likelihood of the assignments.
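The alternation between assigning pixels to body parts and re-estimating the pose can be reduced to a tiny EM loop. In this sketch the "body parts" are just two isotropic 2-D blobs with a fixed, invented bandwidth; the paper works with full articulated poses in a PCA space.

```python
import numpy as np

def gem_fit(points, centres, iters=20, sigma=0.5):
    """Tiny GEM analogue: softly assign edge points to body parts (E-step)
    and re-estimate the part locations (M-step). Here each 'part' is an
    isotropic 2-D blob with fixed bandwidth sigma."""
    centres = np.asarray(centres, dtype=float)
    for _ in range(iters):
        # E-step: responsibilities of each part for each point.
        d2 = ((points[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
        resp = np.exp(-d2 / (2 * sigma ** 2))
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: move each part to the responsibility-weighted mean.
        centres = (resp.T @ points) / resp.sum(axis=0)[:, None]
    return centres
```

Each iteration increases the likelihood of the assignments, which is the guarantee GEM provides even when the M-step only improves, rather than maximizes, the objective.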
International Conference on 3D Imaging, Modeling, Processing, Visualization & Transmission | 2012
Andrea Fossati; Helmut Grabner; Luc Van Gool
Reliable 3D object tracking can provide strong cues for scene understanding. In this paper, we exploit inconsistencies between measured 3D trajectories and their predictions under a physical model. In a set of proof-of-concept experiments we show how to retrieve the camera rotation and translation, and how to detect surfaces that are hard to discern visually, simply by tracking a rigid object. Furthermore, we introduce the class distinction between active and passive objects. Prototype examples demonstrate the usability of the visual input for this type of classification. In all the presented experiments, additional information and a deeper understanding of the scene are obtained, beyond what would be possible by analyzing the image measurements alone.
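A one-dimensional toy version of the inconsistency idea: a tracked object is predicted to be in free fall, and the first frame where the measured height departs from the ballistic prediction reveals an invisible supporting surface. Drop height, surface height, and frame rate are all invented for the demo.

```python
import numpy as np

G = 9.81     # gravitational acceleration (m/s^2)
dt = 0.02    # assumed frame interval (50 fps)

# Simulated 1-D height track: free fall from 2 m until the object lands on
# an invisible surface at 0.5 m and stays there.
t = np.arange(60) * dt
z = np.maximum(2.0 - 0.5 * G * t ** 2, 0.5)

# Physical prediction: pure free fall. Frames where the measurement
# deviates from the model expose the unseen supporting surface.
z_pred = 2.0 - 0.5 * G * t ** 2
impact = int(np.argmax(np.abs(z - z_pred) > 0.01))   # first inconsistent frame
surface_height = z[impact]
```

The surface is never observed directly; its height is read off from where the measured trajectory stops agreeing with the physics, which is the kind of inference the paper's experiments demonstrate.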