Xavier Binefa
Pompeu Fabra University
Publications
Featured research published by Xavier Binefa.
Computer Vision and Pattern Recognition | 2010
Michel F. Valstar; Brais Martinez; Xavier Binefa; Maja Pantic
Finding fiducial facial points in any frame of a video showing rich naturalistic facial behaviour is an unsolved problem. Yet this is a crucial step both for geometric-feature-based facial expression analysis and for methods that use appearance-based features extracted at fiducial facial point locations. In this paper we present a method based on a combination of Support Vector Regression and Markov Random Fields that drastically reduces the time needed to search for a point's location and increases the accuracy and robustness of the algorithm. Using Markov Random Fields allows us to constrain the search space by exploiting the constellations that facial points can form. The regressors, on the other hand, learn a mapping between the appearance of the area surrounding a point and the position of that point, which makes detection of the points very fast and can make the algorithm robust to variations of appearance due to facial expression and moderate changes in head pose. The proposed point detection algorithm was tested on 1855 images, and the results showed that it outperforms current state-of-the-art point detectors.
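The core appearance-to-offset regression can be sketched as follows. This is a minimal, dependency-free stand-in: ridge-regularised least squares in place of the paper's Support Vector Regression, with invented synthetic patches, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "patches": 200 training patches of 8x8 pixels, flattened.
# The true point offset is assumed to be a linear function of the patch
# appearance plus a little noise.
n, d = 200, 64
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.01 * rng.normal(size=n)   # offset of the point from the patch centre

# Fit the regressor (ridge-regularised least squares as a stand-in for SVR).
lam = 1e-3
w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Prediction: map the appearance of a new patch directly to a point offset,
# which is what makes detection fast (no exhaustive search).
x_new = rng.normal(size=d)
pred = x_new @ w
```

In the paper this local prediction is additionally constrained by a Markov Random Field over point constellations, which this sketch omits.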
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2013
Brais Martinez; Michel F. Valstar; Xavier Binefa; Maja Pantic
We propose a new algorithm to detect facial points in frontal and near-frontal face images. It combines a regression-based approach with a probabilistic graphical model-based face shape model that restricts the search to anthropomorphically consistent regions. While most regression-based approaches perform a sequential approximation of the target location, our algorithm detects the target location by aggregating the estimates obtained from stochastically selected local appearance information into a single robust prediction. The underlying assumption is that by aggregating the different estimates, their errors will cancel out as long as the regressor inputs are uncorrelated. Once this new perspective is adopted, the problem is reformulated as how to optimally select the test locations over which the regressors are evaluated. We propose to extend the regression-based model to provide a quality measure of each prediction, and use the shape model to restrict and correct the sampling region. Our approach combines the low computational cost typical of regression-based approaches with the robustness of exhaustive-search approaches. The proposed algorithm was tested on over 7,500 images from five databases. Results showed significant improvement over the current state of the art.
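The error-cancellation argument behind aggregating stochastic estimates can be illustrated numerically. The paper's regressors are replaced here by simulated noisy estimates; the target location and noise level are invented.

```python
import numpy as np

rng = np.random.default_rng(1)
target = np.array([120.0, 84.0])           # hypothetical true 2D point location

# Each of m stochastically selected test locations yields an independent,
# noisy estimate of the target; if the errors are uncorrelated,
# aggregation cancels them out.
m, sigma = 200, 5.0
estimates = target + rng.normal(scale=sigma, size=(m, 2))

single_error = np.linalg.norm(estimates[0] - target)

# Robust aggregate prediction: the coordinate-wise median.
robust = np.median(estimates, axis=0)
agg_error = np.linalg.norm(robust - target)
```

The aggregate error shrinks roughly as 1/sqrt(m) relative to a single estimate, which is the intuition behind the paper's sampling-then-aggregating design.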
Lecture Notes in Computer Science | 1999
Juan María Sánchez; Xavier Binefa; Jordi Vitrià; Petia Radeva
Advertisers need to recognize TV commercials in order to check that TV stations fulfill their contracts. In this paper we present an approach to this problem based on compactly representing each shot by a PCA of the color histogram of a representative frame. We also present a new algorithm for scene break detection based on the analysis of local color variations of specific image regions in consecutive frames.
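The PCA-of-histogram signature idea might look like this in outline. The histograms here are random stand-ins, and the dimensions and bin counts are invented, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: 50 representative-frame colour histograms with 64 bins
# (e.g. a 4x4x4 RGB quantisation), each normalised to sum to 1.
H = rng.random((50, 64))
H /= H.sum(axis=1, keepdims=True)

# PCA: centre the histograms, then take the top-k right singular vectors.
mean = H.mean(axis=0)
U, S, Vt = np.linalg.svd(H - mean, full_matrices=False)
k = 8
components = Vt[:k]                      # principal axes of histogram variation

# Compact signature of one frame: its k PCA coefficients.
sig = (H[0] - mean) @ components.T       # 64-dim histogram -> 8-dim code

# Recognition reduces to nearest-signature search in the compact space.
codes = (H - mean) @ components.T
dists = np.linalg.norm(codes - sig, axis=1)
```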
Computer Vision and Pattern Recognition | 2014
Luis Ferraz; Xavier Binefa; Francesc Moreno-Noguer
We propose a real-time solution to the Perspective-n-Point (PnP) problem that is accurate and robust to outliers. The main advantages of our solution are twofold: first, it integrates the outlier rejection within the pose estimation pipeline with negligible computational overhead, and second, it scales to arbitrarily large numbers of correspondences. Given a set of 3D-to-2D matches, we formulate the pose estimation problem as a low-rank homogeneous system whose solution lies in its 1D null space. Outlier correspondences are those rows of the linear system that perturb the null space, and they are progressively detected by projecting them on an iteratively estimated solution of the null space. Since our outlier removal process is based on an algebraic criterion which does not require computing the full pose and reprojecting all 3D points back onto the image plane at each step, we achieve speed gains of more than 100× compared to RANSAC strategies. An extensive experimental evaluation shows that our solution yields accurate results in situations with up to 50% of outliers, and can process more than 1000 correspondences in less than 5 ms.
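The null-space estimation with algebraic outlier rejection can be sketched on synthetic data. This is a toy homogeneous system, not the PnP formulation itself; the sizes, noise levels and threshold are invented.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n_in, n_out = 6, 80, 20

# Hypothetical ground-truth solution spanning the 1D null space.
x = rng.normal(size=d)
x /= np.linalg.norm(x)

# Inlier rows are (nearly) orthogonal to x; outlier rows are arbitrary.
A_in = rng.normal(size=(n_in, d))
A_in -= np.outer(A_in @ x, x)            # project out the x component
A_in += 1e-3 * rng.normal(size=A_in.shape)
A_out = rng.normal(size=(n_out, d))
A = np.vstack([A_in, A_out])

keep = np.ones(len(A), dtype=bool)
for _ in range(5):
    # Current null-space estimate: least-singular right vector of kept rows.
    v = np.linalg.svd(A[keep])[2][-1]
    # Algebraic residual of every row; no reprojection of 3D points needed.
    r = np.abs(A @ v)
    # Drop the rows that perturb the null space.
    keep = r < 5 * np.median(r[keep])

alignment = abs(v @ x)                   # 1.0 means perfect recovery
```

The point of the sketch is that each iteration costs one small SVD plus a matrix-vector product, which is why this kind of criterion is so much cheaper than full-pose RANSAC hypotheses.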
Multimedia Tools and Applications | 2002
Juan María Sánchez; Xavier Binefa; Jordi Vitrià
Digital video applications exploit the intrinsic structure of video sequences. In order to obtain and represent this structure for video annotation and indexing tasks, the main initial step is automatic shot partitioning. This paper analyzes the problem of automatic TV commercials recognition, and a new algorithm for scene break detection is then introduced. The structure of each commercial is represented by the set of its key-frames, which are automatically extracted from the video stream. The particular characteristics of commercials cause commonly used shot boundary detection techniques to perform worse than they do on other video content domains. These techniques are based on individual image features or visual cues, which suffer significant performance drops when applied to complex video content domains like commercials. We present a new scene break detection algorithm based on the combined analysis of edge and color features. Local motion estimation is applied to each edge in a frame, and the continuity of the color around them is then checked in the following frame. By separately considering both sides of each edge, we rely on the continuous presence of the objects and/or the background of the scene during each shot. Experimental results show that this approach outperforms single-feature algorithms in terms of precision and recall.
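A much-simplified version of combining edge and colour cues for cut detection is sketched below. The frames are synthetic, and the paper's per-edge, two-sided colour continuity check is reduced here to frame-level edge and histogram distances.

```python
import numpy as np

rng = np.random.default_rng(4)

def frame_features(img):
    """Edge-magnitude map and a coarse grey-level histogram of one frame."""
    gy, gx = np.gradient(img.astype(float))
    edges = np.hypot(gx, gy)
    hist, _ = np.histogram(img, bins=16, range=(0, 256))
    return edges, hist / hist.sum()

def cut_score(f0, f1):
    """Combined edge/colour dissimilarity between consecutive frames."""
    e0, h0 = frame_features(f0)
    e1, h1 = frame_features(f1)
    edge_d = np.abs(e0 - e1).mean()
    hist_d = 0.5 * np.abs(h0 - h1).sum()   # total-variation histogram distance
    return edge_d, hist_d

# Two frames of the same shot (small pixel noise) versus a scene break
# (disjoint grey-level ranges stand in for a change of scene content).
shot_a = rng.integers(0, 120, size=(32, 32))
shot_a2 = np.clip(shot_a + rng.integers(-5, 6, size=(32, 32)), 0, 255)
shot_b = rng.integers(136, 256, size=(32, 32))

same = cut_score(shot_a, shot_a2)
cut = cut_score(shot_a, shot_b)
```

A real detector would threshold both cues jointly; the paper's contribution is precisely that the colour check is done locally, on each side of each motion-compensated edge, rather than globally as here.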
Computer Vision and Pattern Recognition | 2000
Ricardo Toledo; Xavier Orriols; Xavier Binefa; Petia Radeva; Jordi Vitrià; Juan José Villanueva
In this paper we introduce a statistical snake that learns and tracks image features by means of statistical learning techniques. Using probabilistic principal component analysis, a feature description is obtained from a training set of object profiles. In our approach, a sound statistical model is introduced to define a likelihood estimate of the grey-level local image profiles together with their local orientation. This likelihood estimate allows us to define a probabilistic potential field for the snake, in which the elastic curve deforms to maximise the overall probability of detecting learned image features. To improve the convergence of snake deformation, we enhance the likelihood map with a physics-based model simulating a dipole-dipole interaction. A new extended local coherent interaction, defined in terms of the extended structure tensor of the image, is introduced to give priority to parallel coherence vectors.
International Conference on Pattern Recognition | 2000
Ricardo Toledo; Xavier Orriols; Petia Radeva; Xavier Binefa; Jordi Vitrià; C. Canero; Juan José Villanueva
We introduce a new deformable model, called the eigensnake, for segmentation of elongated structures in a probabilistic framework. Instead of the snake being attracted by specific image features extracted independently of the snake, our eigensnake learns an optimal object description and searches for such image features in the target image. This is achieved by applying principal component analysis to the image responses of a bank of Gaussian derivative filters. Attraction by eigensnakes is therefore defined in terms of classification of image features. The potential energy for the snake is defined in terms of likelihood in the feature space and incorporated into a new energy-minimising scheme; hence, the snake deforms to minimise the Mahalanobis distance in the feature space. A real application of segmenting and tracking coronary vessels in angiography is considered, and the results are very encouraging.
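The Mahalanobis-distance potential can be sketched as follows. Random stand-in feature vectors replace the Gaussian-derivative filter responses, and all dimensions are invented.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical training set: 300 feature vectors of dimension 10 sampled
# along the target structure (e.g. vessel profiles), with anisotropic spread.
F = rng.normal(size=(300, 10)) * np.array([3.0, 2.0] + [0.5] * 8)

mu = F.mean(axis=0)
cov = np.cov(F, rowvar=False)
cov_inv = np.linalg.inv(cov + 1e-6 * np.eye(10))  # regularised inverse

def mahalanobis(f):
    """Snake potential: squared Mahalanobis distance to the learned class."""
    d = f - mu
    return float(d @ cov_inv @ d)

on_target = mahalanobis(mu)                      # prototype feature
off_target = mahalanobis(mu + 10 * np.ones(10))  # feature far from the class
```

Minimising this distance along the curve is what drives the eigensnake toward image locations whose filter responses look like the learned class.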
Computer Vision and Pattern Recognition | 2010
Brais Martinez; Xavier Binefa; Maja Pantic
This paper studies the problem of detecting facial components (specifically eyes, nostrils and mouth) in thermal imagery. One of the immediate goals is to enable the automatic registration of facial thermal images. The detection of eyes and nostrils is performed using Haar features and the GentleBoost algorithm, which are shown to provide superior detection rates. The detection of the mouth is based on the detections of the eyes and the nostrils and is performed using measures of entropy and self-similarity. The results show that reliable facial component detection is feasible with this methodology, achieving a correct detection rate of 0.8 for both eyes and nostrils. Correct eyes and nostrils detections enable a correct detection of the mouth in 65% of closed-mouth test images and in 73% of open-mouth test images.
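The entropy cue used for mouth localisation can be illustrated with a simple patch-entropy measure. The patches here are synthetic, and the window size and bin count are invented, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(6)

def patch_entropy(patch, bins=32):
    """Shannon entropy of the grey-level distribution in a patch."""
    hist, _ = np.histogram(patch, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Hypothetical thermal patches: a textured mouth region shows many
# grey levels, while a flat cheek region is nearly uniform.
mouth = rng.integers(0, 256, size=(16, 16))
cheek = np.full((16, 16), 128) + rng.integers(-2, 3, size=(16, 16))

e_mouth = patch_entropy(mouth)
e_cheek = patch_entropy(cheek)
```

Scanning such a measure over the region between the detected nostrils would produce a response map peaking around the mouth.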
International Conference on Computer Vision | 2001
Xavier Orriols; Xavier Binefa
In this paper, we address the visual video summarization problem in a Bayesian framework in order to detect and describe the underlying temporal transformation symmetries in a video sequence. Given a set of time-correlated frames, we attempt to extract a reduced number of image-like data structures that are semantically meaningful and able to represent the evolution of the sequence. To this end, we present a generative model that jointly involves the representation and the evolution of appearance. Applying Linear Dynamical System theory to this problem, we discuss how the temporal information is encoded, yielding a way of grouping the iconic representations of the video sequence in terms of invariance. The problem is formulated probabilistically, which affords a measure of perceptual similarity that takes both the learned appearance and the time evolution models into account.
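Fitting linear dynamics to low-dimensional appearance codes can be sketched as follows. A toy rotation system stands in for the learned appearance coefficients; all values are invented and this is not the paper's estimation procedure.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical low-dimensional appearance codes of T frames, generated by a
# linear dynamical system x_{t+1} = A x_t (a slow 2D rotation here).
theta = 0.1
A_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
T = 100
X = np.zeros((T, 2))
X[0] = [1.0, 0.0]
for t in range(T - 1):
    X[t + 1] = A_true @ X[t] + 0.001 * rng.normal(size=2)

# Recover the dynamics by least squares over consecutive code pairs:
# solve X[:-1] @ W = X[1:], so that A_hat = W.T.
W, *_ = np.linalg.lstsq(X[:-1], X[1:], rcond=None)
A_hat = W.T
```

The eigenstructure of the recovered transition matrix is what encodes the temporal transformation (here, a rotation by theta) acting on the appearance space.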
International Conference on Pattern Recognition | 2000
Josep Garcia; Juan María Sánchez; Xavier Orriols; Xavier Binefa
We introduce chromatic aberration as a source of visual information that can be useful for autofocus and depth estimation. A color video camera equipped with a lens with chromatic aberration was used to take images of both step and occlusion edges at several distances from the camera. The defocus measures obtained in the three RGB color channels of each image differ. We suggest how this information can be exploited to design an autofocus sensor, and how depth information can be derived from it.
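The per-channel defocus comparison can be sketched as follows. A synthetic edge image and a crude box blur stand in for real chromatic defocus, and the Laplacian-variance sharpness measure is an assumption, not necessarily the paper's measure.

```python
import numpy as np

def defocus_measure(channel):
    """Sharpness proxy: variance of a discrete Laplacian of the channel."""
    c = channel.astype(float)
    lap = (-4 * c[1:-1, 1:-1] + c[:-2, 1:-1] + c[2:, 1:-1]
           + c[1:-1, :-2] + c[1:-1, 2:])
    return float(lap.var())

def blur(img, k):
    """Crude box blur applied k times (stand-in for channel defocus)."""
    out = img.astype(float)
    for _ in range(k):
        out = (out + np.roll(out, 1, 0) + np.roll(out, -1, 0)
               + np.roll(out, 1, 1) + np.roll(out, -1, 1)) / 5.0
    return out

# A sharp step edge; with axial chromatic aberration, R, G and B focus at
# different depths, so the channels show different amounts of blur.
edge = np.zeros((32, 32))
edge[:, 16:] = 255.0
r, g, b = blur(edge, 3), blur(edge, 1), blur(edge, 0)

# Comparing the three measures indicates which channel is best focused,
# and hence on which side of the focal plane the edge lies.
measures = [defocus_measure(c) for c in (r, g, b)]
```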