Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where F. Xavier Roca is active.

Publication


Featured research published by F. Xavier Roca.


Pattern Recognition | 2015

A coarse-to-fine approach for fast deformable object detection

Marco Pedersoli; Andrea Vedaldi; Jordi Gonzàlez; F. Xavier Roca

We present a method that can dramatically accelerate object detection with part-based models. The method is based on the observation that the cost of detection is likely dominated by the cost of matching each part to the image, and not by the cost of computing the optimal configuration of the parts, as commonly assumed. To minimize the number of part-to-image comparisons we propose a multiple-resolution hierarchical part-based model and a corresponding coarse-to-fine inference procedure that recursively eliminates unpromising part placements from the search space. The method yields a ten-fold speedup over the standard dynamic programming approach and, combined with the cascade-of-parts approach, a hundred-fold speedup in some cases. We evaluate our method extensively on the PASCAL VOC and INRIA datasets, demonstrating a very large increase in detection speed with little degradation of accuracy.

Highlights:
- A new method for fast deformable object detection.
- The cost of detection is dominated by the cost of matching the model parts.
- A multiresolution part-based model with a fast coarse-to-fine inference.
- Lateral connections among parts help maintain a coherent object representation.
- Our fast inference can be combined with cascades to multiply the speed-up.
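The core coarse-to-fine idea is to score a cheap coarse model everywhere, keep only the most promising root placements, and evaluate the expensive fine-resolution parts only there. A minimal sketch, assuming a two-level score pyramid and a simple top-fraction pruning rule (both illustrative simplifications, not the paper's exact inference):

```python
import numpy as np

def coarse_to_fine_detect(score_pyramid, keep_ratio=0.1):
    """Illustrative sketch: prune unpromising placements at the coarse
    level, then score only the surviving locations at the fine level.

    score_pyramid: [coarse, fine] score maps; fine is 2x the coarse grid.
    """
    coarse, fine = score_pyramid
    # Keep only the top fraction of coarse-level root placements.
    flat = coarse.ravel()
    k = max(1, int(len(flat) * keep_ratio))
    thresh = np.partition(flat, -k)[-k]
    candidates = np.argwhere(coarse >= thresh)
    # Refine: evaluate fine-level scores only around surviving candidates.
    detections = []
    for (y, x) in candidates:
        fy, fx = 2 * y, 2 * x            # map coarse cell to fine grid
        window = fine[fy:fy + 2, fx:fx + 2]
        if window.size:
            detections.append((coarse[y, x] + window.max(), (y, x)))
    return sorted(detections, reverse=True)
```

Only the candidate windows are ever touched at fine resolution, which is where the speedup over exhaustive dynamic programming comes from.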


Image and Vision Computing | 2009

Understanding dynamic scenes based on human sequence evaluation

Jordi Gonzàlez; Daniel Rowe; Javier Varona; F. Xavier Roca

In this paper, a Cognitive Vision System (CVS) is presented, which explains the human behaviour of monitored scenes using natural-language texts. This cognitive analysis of human movements recorded in image sequences is here referred to as Human Sequence Evaluation (HSE) which defines a set of transformation modules involved in the automatic generation of semantic descriptions from pixel values. In essence, the trajectories of human agents are obtained to generate textual interpretations of their motion, and also to infer the conceptual relationships of each agent w.r.t. its environment. For this purpose, a human behaviour model based on Situation Graph Trees (SGTs) is considered, which permits both bottom-up (hypothesis generation) and top-down (hypothesis refinement) analysis of dynamic scenes. The resulting system prototype interprets different kinds of behaviour and reports textual descriptions in multiple languages.


IEEE Transactions on Intelligent Transportation Systems | 2014

Toward Real-Time Pedestrian Detection Based on a Deformable Template Model

Marco Pedersoli; Jordi Gonzàlez; Xu Hu; F. Xavier Roca

Most advanced driving assistance systems already include pedestrian detection. Unfortunately, there is still a tradeoff between precision and real-time performance: reliable detection demands excellent precision and recall, detecting as many pedestrians as possible while avoiding too many false alarms, and at the same time very fast computation is needed for quick reactions to dangerous situations. Recently, novel approaches based on deformable templates have been proposed; these show reasonable detection performance but are computationally too expensive for real time. In this paper, we present a system for pedestrian detection based on a hierarchical multiresolution part-based model. The proposed system achieves state-of-the-art detection accuracy thanks to the local deformations of the parts, while exhibiting a speedup of more than one order of magnitude due to a fast coarse-to-fine inference technique. Moreover, our system explicitly infers the level of resolution available, so that small examples can be detected at a very reduced computational cost. We conclude by showing that a graphics-processing-unit-optimized implementation of the proposed system is suitable for real-time pedestrian detection in terms of both accuracy and speed.


International Conference on Computer Vision | 2011

A selective spatio-temporal interest point detector for human action recognition in complex scenes

Bhaskar Chakraborty; Michael Boelstoft Holte; Thomas B. Moeslund; Jordi Gonzàlez; F. Xavier Roca

Recent progress in the field of human action recognition points towards the use of Spatio-Temporal Interest Points (STIPs) for local descriptor-based recognition strategies. In this paper we present a new approach for STIP detection by applying surround suppression combined with local and temporal constraints. Our method is significantly different from existing STIP detectors and improves the performance by detecting more repeatable, stable and distinctive STIPs for human actors, while suppressing unwanted background STIPs. For action representation we use a bag-of-visual-words (BoV) model of local N-jet features to build a vocabulary of visual words. To this end, we introduce a novel vocabulary building strategy by combining spatial pyramid and vocabulary compression techniques, resulting in improved performance and efficiency. Action-class-specific Support Vector Machine (SVM) classifiers are trained for categorization of human actions. A comprehensive set of experiments on existing benchmark datasets, and on more challenging datasets of complex scenes, validates our approach and shows state-of-the-art performance.
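Surround suppression itself has a simple form: a response is kept only to the extent that it exceeds the average response in its neighbourhood, so isolated interest points survive while densely textured background is damped. A minimal sketch on a 2D response map (the suppression rule and parameter names are illustrative assumptions):

```python
import numpy as np

def surround_suppress(response, alpha=1.0, radius=1):
    """Illustrative sketch: subtract the local surround average from each
    interest-point response. Isolated strong responses survive; uniformly
    high (textured background) responses are suppressed toward zero."""
    h, w = response.shape
    out = np.zeros_like(response, dtype=float)
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            patch = response[y0:y1, x0:x1]
            # Average of the surround, excluding the centre pixel itself.
            surround = (patch.sum() - response[y, x]) / max(patch.size - 1, 1)
            out[y, x] = max(0.0, response[y, x] - alpha * surround)
    return out
```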


Pattern Recognition | 2009

Action-specific motion prior for efficient Bayesian 3D human body tracking

Ignasi Rius; Jordi Gonzàlez; Javier Varona; F. Xavier Roca

In this paper, we aim to reconstruct the 3D motion parameters of a human body model from the known 2D positions of a reduced set of joints in the image plane. Towards this end, an action-specific motion model is trained from a database of real motion-captured performances, and used within a particle filtering framework as a priori knowledge on human motion. First, our dynamic model guides the particles according to similar situations previously learnt. Then, the state space is constrained so only feasible human postures are accepted as valid solutions at each time step. As a result, we are able to track the 3D configuration of the full human body from several cycles of walking motion sequences using only the 2D positions of a very reduced set of joints from lateral or frontal viewpoints.
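One predict-weight-resample cycle of such a particle filter can be sketched generically; the action-specific motion prior and the observation likelihood are passed in as stand-in functions, since the learned models themselves are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(particles, weights, motion_prior, observe, obs):
    """Sketch of one predict/weight/resample cycle. `motion_prior`
    propagates a particle using the learned action model (a stand-in
    here); `observe` scores a particle against the 2D joint observation."""
    # Predict: propagate each particle through the action-specific prior.
    particles = np.array([motion_prior(p) for p in particles])
    # Weight: likelihood of each predicted posture given the observation.
    weights = np.array([observe(p, obs) for p in particles])
    weights = weights / weights.sum()
    # Resample proportionally to the weights.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))
```

Constraining the state space to feasible postures, as the paper does, would amount to assigning zero likelihood inside `observe` to invalid configurations.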


Machine Vision and Applications | 2009

Real-time gaze tracking with appearance-based models

Javier Orozco; F. Xavier Roca; Jordi Gonzàlez

Psychological evidence has emphasized the importance of eye-gaze analysis in human-computer interaction and emotion interpretation. To this end, current image analysis algorithms detect eyelid and iris motion using colour information and edge detectors. However, eye movement is fast and hence difficult to track precisely and robustly. Instead, our method describes eyelid and iris movements as continuous variables using appearance-based tracking, combining the strengths of adaptive appearance models, optimization methods and backtracking techniques. Textures are thus learned on-line from near-frontal images, and illumination changes, occlusions and fast movements are handled. The method achieves real-time performance by coupling two appearance-based trackers with a backtracking algorithm, one for eyelid estimation and another for iris estimation. These contributions represent a significant advance towards a reliable description of gaze motion for HCI and expression analysis, where the strengths of complementary methodologies are combined to avoid relying on high-quality images, colour information, texture training, particular camera settings and other time-consuming processes.


Articulated Motion and Deformable Objects | 2002

aSpaces: Action Spaces for Recognition and Synthesis of Human Actions

Jordi Gonzàlez; Xavier Varona; F. Xavier Roca; Juan José Villanueva

Human behavior analysis is an open problem in the computer vision community. The aim of this paper is to model human actions. We present a taxonomy in order to discuss a knowledge-based classification of human behavior. A novel human action model, called the aSpace, is presented, based on a Point Distribution Model (PDM). This representation is compact, accurate and specific. The human body is represented as a stick figure, and several sequences of human actions are used to compute the aSpace. To test our action representation, two applications are provided: recognition and synthesis of actions.
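The PDM underlying an aSpace is essentially PCA over flattened stick-figure postures: the mean posture plus the eigenvectors retaining most of the training variance define the action space, and projecting onto that basis gives compact coordinates for recognition, while the inverse mapping supports synthesis. A hedged sketch (the variance threshold and data layout are assumptions for exposition):

```python
import numpy as np

def build_aspace(postures, variance_kept=0.95):
    """Sketch of a Point Distribution Model over postures, via PCA.
    Each row of `postures` is a flattened joint configuration; the
    aSpace is the mean plus the leading eigenvectors of the covariance."""
    mean = postures.mean(axis=0)
    centered = postures - mean
    cov = np.cov(centered, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1]          # sort eigenpairs descending
    vals, vecs = vals[order], vecs[:, order]
    # Keep enough components to retain the requested variance.
    k = int(np.searchsorted(np.cumsum(vals) / vals.sum(), variance_kept)) + 1
    return mean, vecs[:, :k]

def project(posture, mean, basis):
    """Action-space coordinates of a posture; mean + basis @ c maps back."""
    return basis.T @ (posture - mean)
```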


International Conference on Image Processing | 2009

Trinocular stereo matching with composite disparity space image

Mikhail Mozerov; Jordi Gonzàlez; F. Xavier Roca; Juan José Villanueva

In this paper we propose a method that substantially improves occlusion handling in stereo matching by using trinocular stereo. The main idea is based on the assumption that a region occluded in one matched stereo pair (the middle-left images) is in general not occluded in the opposite matched pair (the middle-right images). The two disparity space images (DSIs) are therefore merged into one composite DSI. The proposed integration differs from the known approach that uses a cumulative cost. The experimental results are evaluated on the Middlebury data set and show the high performance of the proposed algorithm, especially in occluded regions. Our method solves the problem on the basis of real matching costs, so the global optimization problem is solved just once and the resulting solution does not have to be corrected in the occluded regions. In contrast, traditional two-image methods have to complicate their algorithms considerably with additional ad hoc or heuristic techniques to reach competitive results in occluded regions.
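The merging step can be illustrated in a few lines: where the middle-left pair has no valid match (infinite cost), the cell takes its cost from the middle-right pair, and vice versa, so a real matching cost is available almost everywhere. This is a toy sketch of the intuition only (a per-cell minimum), not the paper's exact integration rule:

```python
import numpy as np

def composite_dsi(dsi_ml, dsi_mr):
    """Sketch: merge the middle-left and middle-right disparity space
    images by keeping, per (disparity, y, x) cell, the lower matching
    cost. Cells occluded in one pair (infinite cost) then still carry
    a real cost from the other pair."""
    return np.minimum(dsi_ml, dsi_mr)

def best_disparity(dsi):
    """Winner-take-all over the disparity axis (axis 0)."""
    return np.argmin(dsi, axis=0)
```

In the actual system the composite DSI feeds a single global optimization pass; winner-take-all is used here only to keep the sketch self-contained.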


Image and Vision Computing | 2012

Compact and adaptive spatial pyramids for scene recognition

Noha M. Elfiky; Jordi Gonzàlez; F. Xavier Roca

Most successful approaches to scene recognition tend to efficiently combine global image features with spatial local appearance and shape cues. Less attention, on the other hand, has been devoted to studying spatial texture features within scenes. Our method is based on the insight that scenes can be seen as a composition of micro-texture patterns, and this paper analyzes the role of texture, along with its spatial layout, for scene recognition. One main drawback of the resulting spatial representation, however, is its huge dimensionality. We therefore propose a compact Spatial Pyramid (SP) representation. The basis of our compact representation, namely the Compact Adaptive Spatial Pyramid (CASP), is a two-stage compression strategy based on Agglomerative Information Bottleneck (AIB) theory: (i) compressing the least informative SP features, and (ii) automatically learning the most appropriate shape for each category. Our method exceeds the state-of-the-art results on several challenging scene recognition data sets.

Highlights:
- A major drawback of the Spatial Pyramid (SP) is its high dimensionality; we present a novel framework to obtain a compact SP.
- We present compression strategies based on an extension of the agglomerative information bottleneck algorithm.
- We present a novel spatial texture descriptor (PC-TPLBP) for scene recognition.
- We show the importance of combining PC-TPLBP (regional) with pixel-based features (local) for improving performance.
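The AIB compression step greedily merges the two visual words whose fusion loses the least mutual information with the class label, until the vocabulary reaches the desired size. A small sketch under simplified assumptions (dense pairwise search and a Jensen-Shannon-style merge cost; the paper's two-stage strategy is richer than this):

```python
import numpy as np

def aib_compress(p_c_given_w, p_w, n_keep):
    """Sketch of agglomerative information-bottleneck compression.
    Rows of `p_c_given_w` are per-word class distributions, `p_w` is
    the word prior; words are merged until `n_keep` remain."""
    p_c_given_w = p_c_given_w.copy(); p_w = p_w.copy()

    def merge_cost(i, j):
        pw = p_w[i] + p_w[j]
        merged = (p_w[i] * p_c_given_w[i] + p_w[j] * p_c_given_w[j]) / pw
        def H(p):
            p = p[p > 0]
            return -(p * np.log2(p)).sum()
        # Mutual information lost by merging words i and j.
        return pw * H(merged) - p_w[i] * H(p_c_given_w[i]) - p_w[j] * H(p_c_given_w[j])

    while len(p_w) > n_keep:
        pairs = [(merge_cost(i, j), i, j)
                 for i in range(len(p_w)) for j in range(i + 1, len(p_w))]
        _, i, j = min(pairs)                      # cheapest merge first
        pw = p_w[i] + p_w[j]
        row = (p_w[i] * p_c_given_w[i] + p_w[j] * p_c_given_w[j]) / pw
        keep = [k for k in range(len(p_w)) if k not in (i, j)]
        p_c_given_w = np.vstack([p_c_given_w[keep], row])
        p_w = np.append(p_w[keep], pw)
    return p_c_given_w, p_w
```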


International Conference on Pattern Recognition | 2010

Reactive Object Tracking with a Single PTZ Camera

Murad Al Haj; Andrew D. Bagdanov; Jordi Gonzàlez; F. Xavier Roca

In this paper we describe a novel approach to reactive tracking of moving targets with a pan-tilt-zoom camera. The approach uses an extended Kalman filter to jointly track the object position in the real world, its velocity in 3D and the camera intrinsics, in addition to the rate of change of these parameters. The filter outputs are used as inputs to PID controllers which continuously adjust the camera motion in order to reactively track the object at a constant image velocity while simultaneously maintaining a desirable target scale in the image plane. We provide experimental results on simulated and real tracking sequences to show how our tracker is able to accurately estimate both 3D object position and camera intrinsics with very high precision over a wide range of focal lengths.
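The control side reduces to standard PID loops on the tracking error (position offset for pan/tilt, target scale for zoom), fed by the Kalman filter's estimates. A minimal PID sketch with illustrative gains, servoing a single pan axis toward a target:

```python
class PID:
    """Minimal PID controller, of the kind used to servo each
    pan/tilt/zoom axis. Gains here are illustrative, not the paper's."""
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def step(self, error, dt):
        self.integral += error * dt
        deriv = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * deriv
```

Running one controller per axis, with the error taken against a constant desired image velocity and target scale, reproduces the reactive behaviour described above.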

Collaboration


Dive into F. Xavier Roca's collaborations.

Top Co-Authors

Jordi Gonzàlez, Autonomous University of Barcelona
Juan José Villanueva, Autonomous University of Barcelona
Pau Baiget, Autonomous University of Barcelona
Carles Fernández, Autonomous University of Barcelona
Javier Varona, University of the Balearic Islands
Josep M. Gonfaus, Autonomous University of Barcelona
Ignasi Rius, Autonomous University of Barcelona
Mikhail Mozerov, Autonomous University of Barcelona
Ariel Amato, Autonomous University of Barcelona
Guillem Cucurull, Autonomous University of Barcelona