Xavier Baró | Researchain

Publication

Featured research published by Xavier Baró.

IEEE Transactions on Intelligent Transportation Systems | 2009

Traffic Sign Recognition Using Evolutionary Adaboost Detection and Forest-ECOC Classification

Xavier Baró; Sergio Escalera; Jordi Vitrià; Oriol Pujol; Petia Radeva

The high variability of sign appearance in uncontrolled environments has made the detection and classification of road signs a challenging problem in computer vision. In this paper, we introduce a novel approach for the detection and classification of traffic signs. Detection is based on a cascade of boosted detectors, trained with a novel evolutionary version of Adaboost that allows the use of large feature spaces. Classification is defined as a multiclass categorization problem: a battery of classifiers is trained to split classes within an Error-Correcting Output Code (ECOC) framework. We propose an ECOC design based on a forest of optimal tree structures embedded in the ECOC matrix. The resulting system achieves high performance and better accuracy than state-of-the-art strategies, and is potentially more robust to noise, affine deformation, partial occlusion, and reduced illumination.
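The detection stage above is trained with AdaBoost. As background only, here is a minimal sketch of textbook discrete AdaBoost with exhaustive decision-stump search over a small feature matrix. It is not the paper's evolutionary variant: the abstract says only that the evolutionary version allows the use of large feature spaces, presumably by replacing an exhaustive weak-learner search like the one below. The detector cascade is not shown, and all function names and toy data are hypothetical.

```python
import math
from typing import List, Tuple

Stump = Tuple[int, float, int]  # (feature index, threshold, polarity)


def stump_predict(stump: Stump, x: List[float]) -> int:
    """Return +polarity if feature j is at or above the threshold, else -polarity."""
    j, thr, pol = stump
    return pol if x[j] >= thr else -pol


def train_adaboost(X: List[List[float]], y: List[int], rounds: int):
    """Discrete AdaBoost with exhaustive stump search; labels must be +1/-1."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []  # list of (alpha, stump)
    for _ in range(rounds):
        best, best_err = None, float("inf")
        # Exhaustive weak-learner search: every feature, every observed value, both polarities.
        for j in range(len(X[0])):
            for thr in sorted({x[j] for x in X}):
                for pol in (1, -1):
                    err = sum(wi for wi, xi, yi in zip(w, X, y)
                              if stump_predict((j, thr, pol), xi) != yi)
                    if err < best_err:
                        best, best_err = (j, thr, pol), err
        eps = min(max(best_err, 1e-10), 1 - 1e-10)  # clamp to avoid log(0)
        alpha = 0.5 * math.log((1 - eps) / eps)
        ensemble.append((alpha, best))
        # Re-weight: misclassified samples gain weight, then renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(best, xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x: List[float]) -> int:
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(s, x) for alpha, s in ensemble)
    return 1 if score >= 0 else -1


# Toy usage: feature 0 separates the two classes.
X = [[0.1, 0.9], [0.2, 0.8], [0.8, 0.3], [0.9, 0.1]]
y = [-1, -1, 1, 1]
model = train_adaboost(X, y, rounds=3)
print([predict(model, x) for x in X])  # → [-1, -1, 1, 1]
```

A real detector would evaluate many more features (e.g. Haar-like responses) per window; that large search space is what the paper's evolutionary variant targets.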

European Conference on Computer Vision | 2014

ChaLearn Looking at People Challenge 2014: Dataset and Results

Sergio Escalera; Xavier Baró; Jordi Gonzàlez; Miguel Ángel Bautista; Meysam Madadi; Miguel Reyes; Víctor Ponce-López; Hugo Jair Escalante; Jamie Shotton; Isabelle Guyon

This paper summarizes the ChaLearn Looking at People 2014 challenge data and the results obtained by the participants. The competition was split into three independent tracks: human pose recovery from RGB data, action and interaction recognition from RGB data sequences, and multi-modal gesture recognition from RGB-Depth sequences. In all tracks, the goal was user-independent recognition in sequences of continuous images, with the overlapping Jaccard index as the evaluation measure. In this edition of the ChaLearn challenge, two large novel datasets were made publicly available, and the Microsoft Codalab platform was used to manage the competition. Outstanding results were achieved in all three tracks, with accuracy results of 0.20, 0.50, and 0.85 for pose recovery, action/interaction recognition, and multi-modal gesture recognition, respectively.
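To make the overlapping Jaccard index concrete, here is a minimal sketch that compares ground-truth and predicted frames for one category in one sequence: J = |A ∩ B| / |A ∪ B|. The interval format, the helper names, and the convention that two empty sets score 1.0 are assumptions for illustration; how the challenge averaged scores across categories and sequences is not shown.

```python
from typing import Iterable, Set, Tuple


def frames_from_intervals(intervals: Iterable[Tuple[int, int]]) -> Set[int]:
    """Expand inclusive (start, end) frame intervals into a set of frame indices."""
    frames: Set[int] = set()
    for start, end in intervals:
        frames.update(range(start, end + 1))
    return frames


def jaccard(gt: Set[int], pred: Set[int]) -> float:
    """Overlap |A ∩ B| / |A ∪ B| between ground-truth and predicted frames."""
    union = gt | pred
    if not union:
        return 1.0  # assumption: both empty counts as a perfect match
    return len(gt & pred) / len(union)


gt = frames_from_intervals([(10, 19)])    # 10 labelled frames
pred = frames_from_intervals([(15, 24)])  # 10 predicted frames, 5 of them overlapping
print(jaccard(gt, pred))  # 5 shared frames out of 15 in the union
```

Because the score is computed on frames rather than on whole detections, a prediction that is only partially aligned with the ground truth still earns partial credit.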

International Conference on Computer Vision | 2015

ChaLearn Looking at People 2015: Apparent Age and Cultural Event Recognition Datasets and Results

Sergio Escalera; Junior Fabian; Pablo Pardo; Xavier Baró; Jordi Gonzàlez; Hugo Jair Escalante; Dusan Misevic; Ulrich K. Steiner; Isabelle Guyon

Following the previous series of Looking at People (LAP) competitions [14, 13, 11, 12, 2], in 2015 ChaLearn ran two new competitions in the field of Looking at People: (1) age estimation and (2) cultural event recognition, both in still images. We developed a crowd-sourcing application to collect and label data about the apparent age of people (as opposed to their real age). For cultural event recognition, one hundred categories had to be recognized. These tasks involved scene understanding and human body analysis. This paper summarizes both challenges and their data, as well as the results achieved by the participants of the competition. Details of the ChaLearn LAP competitions can be found at http://gesture.chalearn.org/.

Pattern Recognition Letters | 2014

Probability-based Dynamic Time Warping and Bag-of-Visual-and-Depth-Words for Human Gesture Recognition in RGB-D

Antonio Hernández-Vela; Miguel Ángel Bautista; Xavier Perez-Sala; Víctor Ponce-López; Sergio Escalera; Xavier Baró; Oriol Pujol; Cecilio Angulo

We present a probability-based DTW for gesture segmentation. We present the BoVDW framework for gesture classification. We propose a new VFHCRH descriptor for depth images.

We present a methodology to address the problem of human gesture segmentation and recognition in video and depth image sequences. A Bag-of-Visual-and-Depth-Words (BoVDW) model is introduced as an extension of the Bag-of-Visual-Words (BoVW) model. State-of-the-art RGB and depth features, including a newly proposed depth descriptor, are analysed and combined through late fusion. The method is integrated into a Human Gesture Recognition pipeline, together with a novel probability-based Dynamic Time Warping (PDTW) algorithm that performs prior segmentation of idle gestures. The proposed DTW variant uses samples of the same gesture category to build a probabilistic model of that gesture class based on a Gaussian Mixture Model. Results of the whole Human Gesture Recognition pipeline on a public dataset show better performance than both the standard BoVW model and the DTW approach.
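The bag-of-words idea with late fusion can be sketched as follows, under stated assumptions: each modality (here called "rgb" and "depth") has its own codebook, each sample's descriptors are quantized to their nearest codeword and counted into a normalised histogram, each modality scores every class independently, and the per-modality scores are fused by a weighted sum. The codebooks, the class prototype histograms, the histogram-intersection score, and the fusion weights are all hypothetical stand-ins; the abstract does not specify the paper's classifier or fusion rule.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def nearest_word(desc: Vector, codebook: List[Vector]) -> int:
    """Index of the codeword closest to desc (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(desc, codebook[k])))


def bow_histogram(descs: List[Vector], codebook: List[Vector]) -> List[float]:
    """L1-normalised bag-of-words histogram of one sample's descriptors."""
    hist = [0.0] * len(codebook)
    for d in descs:
        hist[nearest_word(d, codebook)] += 1.0
    total = sum(hist) or 1.0  # guard against a sample with no descriptors
    return [h / total for h in hist]


def intersection(h1: List[float], h2: List[float]) -> float:
    """Histogram intersection similarity (1.0 for identical normalised histograms)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def late_fusion_classify(sample: Dict[str, List[Vector]],
                         codebooks: Dict[str, List[Vector]],
                         prototypes: Dict[str, Dict[str, List[float]]],
                         weights: Dict[str, float]) -> str:
    """Score each class per modality, then fuse the scores with a weighted sum."""
    fused: Dict[str, float] = {}
    for modality, descs in sample.items():
        hist = bow_histogram(descs, codebooks[modality])
        for label, proto in prototypes[modality].items():
            fused[label] = fused.get(label, 0.0) + weights[modality] * intersection(hist, proto)
    return max(fused, key=fused.get)


# Toy usage with 1-D descriptors and two-word codebooks per modality.
codebooks = {"rgb": [[0.0], [1.0]], "depth": [[0.0], [1.0]]}
prototypes = {
    "rgb": {"wave": [1.0, 0.0], "point": [0.0, 1.0]},
    "depth": {"wave": [1.0, 0.0], "point": [0.0, 1.0]},
}
sample = {"rgb": [[0.1], [0.2], [0.9]], "depth": [[0.05], [0.1]]}
print(late_fusion_classify(sample, codebooks, prototypes, {"rgb": 0.5, "depth": 0.5}))  # → wave
```

Late fusion keeps the modalities independent until the final decision, so a weak modality can be down-weighted without retraining the other.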

International Conference on Multimodal Interfaces | 2013

ChaLearn multi-modal gesture recognition 2013: grand challenge and workshop summary

Sergio Escalera; Jordi Gonzàlez; Xavier Baró; Miguel Reyes; Isabelle Guyon; Vassilis Athitsos; Hugo Jair Escalante; Leonid Sigal; Antonis A. Argyros; Cristian Sminchisescu; Richard Bowden; Stan Sclaroff

We organized a Grand Challenge and Workshop on Multi-Modal Gesture Recognition (MMGR). The MMGR Grand Challenge focused on the recognition of continuous natural gestures from multi-modal data (including RGB, depth, user mask, skeletal model, and audio). We made available a large labeled video database of 13,858 gestures from a lexicon of 20 Italian gesture categories recorded with a Kinect™ camera. More than 54 teams participated in the challenge, and the winner of the competition achieved a final error rate of 12%. The winners of the competition published their work in the workshop of the Challenge. The MMGR Workshop was held at the ICMI 2013 conference in Sydney. A total of 9 papers on multi-modal gesture recognition were accepted for presentation, covering multi-modal descriptors, multi-class learning strategies for segmentation and classification in temporal data, and relevant applications in the field, including multi-modal Social Signal Processing and multi-modal Human Computer Interfaces. Five invited speakers participated in the workshop: Profs. Leonid Sigal from Disney Research, Antonis Argyros from FORTH, Institute of Computer Science, Cristian Sminchisescu from Lund University, Richard Bowden from University of Surrey, and Stan Sclaroff from Boston University. They summarized their research in the field and discussed past, current, and future challenges in Multi-Modal Gesture Recognition.

Computer Vision and Pattern Recognition | 2016

ChaLearn Looking at People and Faces of the World: Face Analysis Workshop and Challenge 2016

Sergio Escalera; Mercedes Torres Torres; Brais Martinez; Xavier Baró; Hugo Jair Escalante; Isabelle Guyon; Georgios Tzimiropoulos; Ciprian A. Corneanu; Marc Oliu; Mohammad Ali Bagheri; Michel F. Valstar

We present the 2016 ChaLearn Looking at People and Faces of the World Challenge and Workshop, which ran three competitions on the common theme of face analysis from still images. The first one, Looking at People, addressed age estimation, while the second and third competitions, Faces of the World, addressed accessory classification and smile and gender classification, respectively. We present two crowd-sourcing methodologies used to collect manual annotations. A custom-built application was used to collect and label data about the apparent age of people (as opposed to their real age). For the Faces of the World data, the citizen-science Zooniverse platform was used. This paper summarizes the three challenges and the data used, as well as the results achieved by the participants of the competitions. Details of the ChaLearn LAP FotW competitions can be found at http://gesture.chalearn.org.

International Conference on Pattern Recognition | 2016

ChaLearn Joint Contest on Multimedia Challenges Beyond Visual Analysis: An overview

Hugo Jair Escalante; Víctor Ponce-López; Jun Wan; Michael Riegler; Baiyu Chen; Albert Clapés; Sergio Escalera; Isabelle Guyon; Xavier Baró; Pål Halvorsen; Henning Müller; Martha Larson

This paper provides an overview of the Joint Contest on Multimedia Challenges Beyond Visual Analysis. We organized an academic competition that focused on four problems whose solution requires effective processing of multimodal information. Two tracks were devoted to gesture spotting and recognition from RGB-D video, two fundamental problems for human-computer interaction. Another track was devoted to a second round of the first impressions challenge, whose goal was to develop methods to recognize personality traits from short video clips. For this second round, we adopted a novel collaborative-competitive (i.e., coopetition) setting. The fourth track was dedicated to the problem of video recommendation for improving user experience. The challenge was open for about 45 days and received outstanding participation: almost 200 participants registered for the contest, and 20 teams sent predictions in the final stage. The main goals of the challenge were fulfilled: the state of the art was advanced considerably in the four tracks, with novel solutions to the proposed problems (mostly relying on deep learning). However, further research is still required. The data of the four tracks will be made available to allow researchers to keep making progress on these problems.

Revised Selected and Invited Papers of the International Workshop on Advances in Depth Image Analysis and Applications - Volume 7854 | 2012

Probability-Based Dynamic Time Warping for Gesture Recognition on RGB-D Data

Miguel Ángel Bautista; Antonio Hernández-Vela; Victor Ponce; Xavier Perez-Sala; Xavier Baró; Oriol Pujol; Cecilio Angulo; Sergio Escalera

Dynamic Time Warping (DTW) is commonly used in gesture recognition tasks to tackle the variability in the temporal length of gestures. In the DTW framework, a set of gesture patterns is compared one by one to a potentially infinite test sequence, and a query gesture category is recognized if a warping cost below a certain threshold is found within the test sequence. Nevertheless, taking either a single sample per gesture category or a set of isolated samples may not capture the variability of that category. In this paper, a probability-based DTW for gesture recognition is proposed. Different samples of the same gesture pattern obtained from RGB-Depth data are used to build a Gaussian-based probabilistic model of the gesture. Finally, the DTW cost is adapted to the new model. The proposed approach is tested in a challenging scenario, showing better performance of the probability-based DTW compared with state-of-the-art approaches for gesture recognition on RGB-D data.
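The spotting scheme described above (compare a gesture pattern against a potentially infinite test sequence and report a detection whenever the warping cost drops below a threshold) can be sketched as subsequence DTW processed one frame at a time, so only the previous column of the cost matrix is kept. As an assumption standing in for the paper's Gaussian-based model, the local cost at template position i is the squared Mahalanobis distance under an independent (diagonal) Gaussian per position; the names `gaussian_cost` and `spot` are hypothetical, and estimating the means and variances from several aligned training samples is not shown.

```python
import math
from typing import Callable, Iterable, Iterator, List, Sequence

Frame = Sequence[float]


def gaussian_cost(means: List[Frame], variances: List[Frame]) -> Callable[[int, Frame], float]:
    """Local cost for template position i: squared Mahalanobis distance of x
    under a diagonal Gaussian (hypothetical stand-in for the paper's model)."""
    def cost(i: int, x: Frame) -> float:
        return sum((a - m) ** 2 / v for a, m, v in zip(x, means[i], variances[i]))
    return cost


def spot(stream: Iterable[Frame], length: int,
         cost: Callable[[int, Frame], float], threshold: float) -> Iterator[int]:
    """Subsequence DTW over a (possibly unbounded) stream, one column per frame.
    Yields each frame index where a full alignment of the `length`-position
    template ends with accumulated cost below `threshold`."""
    prev = [math.inf] * length  # accumulated costs D[i][j-1] for the previous frame
    for j, x in enumerate(stream):
        cur = [0.0] * length
        cur[0] = cost(0, x)  # free start: the template may begin at any frame
        for i in range(1, length):
            # Predecessors: same position (prev[i]), diagonal (prev[i-1]), previous position (cur[i-1]).
            cur[i] = cost(i, x) + min(prev[i], prev[i - 1], cur[i - 1])
        if cur[-1] < threshold:
            yield j
        prev = cur


cost = gaussian_cost(means=[[0.0], [1.0], [2.0]], variances=[[1.0]] * 3)
stream = [[5.0], [5.0], [0.0], [1.0], [2.0], [5.0], [5.0]]
print(list(spot(iter(stream), 3, cost, threshold=0.5)))  # → [4]
```

Because `spot` is a generator that keeps a single column of costs, memory stays constant however long the input stream runs; replacing `gaussian_cost` with a plain Euclidean distance to one template recovers standard single-sample DTW.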

Pattern Recognition Letters | 2012

Minimal design of error-correcting output codes

Miguel Ángel Bautista; Sergio Escalera; Xavier Baró; Petia Radeva; Jordi Vitrià; Oriol Pujol

The classification of a large number of object categories is a challenging task in the pattern recognition field. In the literature, it is often addressed using an ensemble of classifiers. In this scope, the Error-Correcting Output Codes (ECOC) framework has proven to be a powerful tool for combining classifiers. However, most state-of-the-art ECOC approaches use a linear or exponential number of classifiers, making the discrimination of a large number of classes infeasible. In this paper, we explore and propose a minimal design of ECOC in terms of the number of classifiers. Evolutionary computation is used to tune the parameters of the classifiers and to search for the best minimal ECOC code configuration. Results on several public UCI datasets and different multi-class computer vision problems show that the proposed methodology obtains comparable or even better results than state-of-the-art ECOC methodologies with far fewer dichotomizers.
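For intuition on what "minimal" means: N classes need distinct binary codewords, so at least ⌈log2 N⌉ dichotomizers are required, versus N for one-versus-all. The sketch below builds one such minimal code matrix by giving each class the binary representation of its index and decodes dichotomizer outputs by Hamming distance. This fixed assignment is only an illustrative configuration; the paper uses evolutionary computation to search for the best one, and the function names are hypothetical.

```python
from typing import List


def minimal_code_matrix(n_classes: int) -> List[List[int]]:
    """Give each class a distinct binary codeword of length ceil(log2 N).
    Row c is the codeword of class c; column k defines the k-th dichotomizer
    (classes with bit 1 versus classes with bit 0)."""
    n_bits = max(1, (n_classes - 1).bit_length())  # exact ceil(log2 N) for N >= 2
    return [[(c >> k) & 1 for k in range(n_bits)] for c in range(n_classes)]


def hamming_decode(outputs: List[int], matrix: List[List[int]]) -> int:
    """Return the class whose codeword differs from the outputs in the fewest bits."""
    return min(range(len(matrix)),
               key=lambda c: sum(o != b for o, b in zip(outputs, matrix[c])))


M = minimal_code_matrix(5)
print(len(M[0]))                     # 3 dichotomizers for 5 classes
print(hamming_decode([1, 1, 0], M))  # class 3, whose codeword is [1, 1, 0]
```

With this construction every column separates class 0 from at least one other class, so no dichotomizer is degenerate. The trade-off is that neighbouring codewords can differ in a single bit, leaving little room for error correction, which is one reason the choice of configuration matters.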

Computer Vision and Pattern Recognition | 2015

ChaLearn Looking at People 2015 challenges: Action spotting and cultural event recognition

Xavier Baró; Jordi Gonzàlez; Junior Fabian; Miguel Ángel Bautista; Marc Oliu; Hugo Jair Escalante; Isabelle Guyon; Sergio Escalera

Following the previous series of Looking at People (LAP) challenges [6, 5, 4], ChaLearn ran two competitions to be presented at CVPR 2015: action/interaction spotting and cultural event recognition in RGB data. We ran a second round of human activity recognition on RGB data sequences. For cultural event recognition, tens of categories had to be recognized, which involves scene understanding and human analysis. This paper summarizes the two challenges and the results obtained. Details of the ChaLearn LAP competitions can be found at http://gesture.chalearn.org/.