Josep M. Gonfaus
Autonomous University of Barcelona
Publications
Featured research published by Josep M. Gonfaus.
Computer Vision and Pattern Recognition | 2010
Josep M. Gonfaus; Xavier Boix; Joost van de Weijer; Andrew D. Bagdanov; Joan Serrat; Jordi Gonzàlez
Hierarchical conditional random fields have been successfully applied to object segmentation. One reason is their ability to incorporate contextual information at different scales. However, these models do not allow multiple labels to be assigned to a single node. At higher scales in the image, this yields an oversimplified model, since multiple classes can be reasonably expected to appear within one region. This simplified model especially limits the impact that observations at larger scales may have on the CRF model. Neglecting the information at larger scales is undesirable since class-label estimates based on these scales are more reliable than at smaller, noisier scales. To address this problem, we propose a new potential, called the harmony potential, which can encode any possible combination of class labels. We propose an effective sampling strategy that renders tractable the underlying optimization problem. Results show that our approach obtains state-of-the-art results on two challenging datasets: PASCAL VOC 2009 and MSRC-21.
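The abstract's key idea admits a compact illustration. Assuming, as a simplification, that a higher-scale node carries a subset of class labels and that any child label outside that subset incurs a flat penalty (an illustrative form, not the paper's exact potential), the consistency term can be sketched as:

```latex
% Illustrative simplification of a harmony-style consistency potential:
% node r at a higher scale carries a label subset L_r \subseteq \mathcal{C};
% a lower-scale node i pays a penalty \gamma only when its label x_i
% falls outside that combination.
\[
  \psi(x_i, L_r) =
  \begin{cases}
    0      & \text{if } x_i \in L_r, \\
    \gamma & \text{otherwise.}
  \end{cases}
\]
```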
International Journal of Computer Vision | 2012
Xavier Boix; Josep M. Gonfaus; Joost van de Weijer; Andrew D. Bagdanov; Joan Serrat; Jordi Gonzàlez
The Hierarchical Conditional Random Field (HCRF) model has been successfully applied to a number of image labeling problems, including image segmentation. However, existing HCRF models of image segmentation do not allow multiple classes to be assigned to a single region, which limits their ability to incorporate contextual information across multiple scales. At higher scales in the image, this representation yields an oversimplified model since multiple classes can be reasonably expected to appear within large regions. This simplified model particularly limits the impact of information at higher scales. Since class-label information at these scales is usually more reliable than at lower, noisier scales, neglecting this information is undesirable. To address these issues, we propose a new consistency potential for image labeling problems, which we call the harmony potential. It can encode any possible combination of labels, penalizing only unlikely combinations of classes. We also propose an effective sampling strategy over this expanded label set that renders tractable the underlying optimization problem. Our approach obtains state-of-the-art results on two challenging, standard benchmark datasets for semantic image segmentation: PASCAL VOC 2010 and MSRC-21.
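Since the expanded label set contains 2^C possible combinations, exhaustive optimization is intractable, which is what the sampling strategy addresses. Below is a hypothetical sketch of one such restriction; the function name, the per-class scores, and the greedy nested-subset scheme are all assumptions, not the paper's algorithm:

```python
import numpy as np

def sample_label_subsets(class_scores, k=8):
    """Hypothetical sketch: rather than enumerating all 2^C label
    combinations for a higher-scale node, keep only k nested subsets
    built by adding classes in decreasing order of unary score."""
    order = np.argsort(class_scores)[::-1]  # most likely classes first
    return [frozenset(order[: i + 1]) for i in range(min(k, len(order)))]
```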
IEEE Transactions on Systems, Man, and Cybernetics | 2017
Pau Rodríguez; Guillem Cucurull; Jordi Gonzàlez; Josep M. Gonfaus; Kamal Nasrollahi; Thomas B. Moeslund; F. Xavier Roca
Pain is an unpleasant feeling that has been shown to be an important factor in the recovery of patients. Since assessing it is costly in human resources and difficult to do objectively, there is a need for automatic systems to measure it. In this paper, contrary to current state-of-the-art techniques in pain assessment, which are based on facial features only, we suggest that performance can be enhanced by feeding the raw frames to deep learning models, outperforming the latest state-of-the-art results while also directly facing the problem of imbalanced data. As a baseline, our approach first uses convolutional neural networks (CNNs) to learn facial features from VGG_Faces, which are then linked to a long short-term memory (LSTM) network to exploit the temporal relation between video frames. We further compare the performance of the popular schema based on the canonically normalized appearance against taking the whole image into account. As a result, we outperform the current state-of-the-art area-under-the-curve (AUC) performance on the UNBC-McMaster Shoulder Pain Expression Archive Database. In addition, to evaluate the generalization properties of our proposed methodology on facial motion recognition, we also report competitive results on the Cohn-Kanade+ (CK+) facial expression database.
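A minimal sketch of the described pipeline, with a toy convolutional backbone standing in for the VGG_Faces features and illustrative layer sizes; every name and dimension here is an assumption, not the paper's architecture:

```python
import torch
import torch.nn as nn

class PainEstimator(nn.Module):
    """Minimal sketch of a CNN+LSTM pipeline in the spirit of the
    abstract: a convolutional backbone (standing in for VGG_Faces)
    produces per-frame features, and an LSTM models the temporal
    relation between video frames. All sizes are illustrative."""
    def __init__(self, feat_dim=512, hidden=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)   # per-sequence pain score

    def forward(self, frames):             # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.backbone(frames.flatten(0, 1)).view(b, t, -1)
        out, _ = self.lstm(feats)
        return self.head(out[:, -1])       # score from the last time step
```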
Pattern Recognition | 2017
Pau Rodríguez; Guillem Cucurull; Josep M. Gonfaus; F. Xavier Roca; Jordi Gonzàlez
A novel feedforward attention mechanism for CNNs is proposed. The mechanism increases CNNs' robustness to image deformations and clutter. The proposed mechanism increases CNNs' performance for age and gender recognition.
Face analysis in images in the wild still poses a challenge for automatic age and gender recognition tasks, mainly due to the high variability in resolution, deformation, and occlusion. Although performance has greatly improved thanks to Convolutional Neural Networks (CNNs), it is still far from optimal when compared to other image recognition tasks, mainly because of the high sensitivity of CNNs to facial variations. In this paper, inspired by biology and the recent success of attention mechanisms on visual question answering and fine-grained recognition, we propose a novel feedforward attention mechanism that is able to discover the most informative and reliable parts of a given face for improving age and gender classification. In particular, given a downsampled facial image, the proposed model is trained based on a novel end-to-end learning framework to extract the most discriminative patches from the original high-resolution image. Experimental validation on the standard Adience, Images of Groups, and MORPH II benchmarks shows that including attention mechanisms enhances the performance of CNNs in terms of robustness and accuracy.
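A hedged sketch of the patch-extraction step as the abstract describes it: attention scores computed over a downsampled face select locations whose patches are then cropped from the high-resolution original. The function name, tensor shapes, and grid-to-pixel mapping are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def top_attended_patches(attn_logits, highres, patch=64, k=4):
    """Illustrative sketch: attn_logits is a (B, 1, Hg, Wg) score grid
    computed from a downsampled face; crop the k highest-attention
    patches from the (B, 3, H, W) high-resolution image."""
    b, _, hg, wg = attn_logits.shape
    h, w = highres.shape[2:]
    weights = F.softmax(attn_logits.flatten(1), dim=1)   # (B, Hg*Wg)
    idx = weights.topk(k, dim=1).indices                 # (B, k)
    sy, sx = h // hg, w // wg                            # grid-to-pixel scale
    patches = []
    for i in range(b):
        for j in range(k):
            y = min(int(idx[i, j]) // wg * sy, h - patch)
            x = min(int(idx[i, j]) % wg * sx, w - patch)
            patches.append(highres[i, :, y:y + patch, x:x + patch])
    return patches  # list of (3, patch, patch) crops
```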
Computer Vision and Image Understanding | 2015
Josep M. Gonfaus; Marco Pedersoli; Jordi Gonzàlez; Andrea Vedaldi; F. Xavier Roca
We present an object detection framework based on advanced part learning. A generic learning method for the relationships between part appearances is proposed. We treat part locations as well as their appearance as latent variables. We modify the weakly-supervised learning to handle a more complex structure. Our model yields state-of-the-art results for several object categories.
Deformable object models capture variations in an object's appearance that can be represented as image deformations. Other effects such as out-of-plane rotations, three-dimensional articulations, and self-occlusions are often captured by considering a mixture of deformable models, one per object aspect. A more scalable approach is to represent the variations at the level of the object parts instead, applying the concept of a mixture locally. Combining a few part variations can in fact cheaply generate a large number of global appearances. A limited version of this idea was proposed by Yang and Ramanan [1] for human pose detection. In this paper we apply it to the task of generic object category detection and extend it in several ways. First, we propose a model for the relationship between part appearances that is more general than the tree of Yang and Ramanan [1] and more suitable for generic categories. Second, we treat part locations as well as their appearance as latent variables, so that training does not need part annotations but only the object bounding boxes. Third, we modify the weakly-supervised learning of Felzenszwalb et al. [2] and Girshick et al. [3] to handle a significantly more complex latent structure. Our model is evaluated on standard object detection benchmarks and is found to improve over existing approaches, yielding state-of-the-art results for several object categories.
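As a rough illustration of the latent-placement idea (not the paper's formulation; the quadratic displacement penalty follows the general deformable-part recipe of [2]), each part maximizes its appearance score minus a deformation cost over its possible locations:

```python
import numpy as np

def detection_score(root_score, part_score_maps, anchors, defo=0.1):
    """Rough sketch of a part-based detection score: each part chooses
    its best (latent) location, trading appearance score against a
    quadratic displacement penalty from its anchor; a mixture of part
    appearances would add a further max over components per part."""
    total = root_score
    for score_map, (ay, ax) in zip(part_score_maps, anchors):
        ys, xs = np.indices(score_map.shape)
        displacement = (ys - ay) ** 2 + (xs - ax) ** 2
        total += float((score_map - defo * displacement).max())
    return total
```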
Multimodal Interaction in Image and Video Applications | 2013
Marc Castelló; Jordi Gonzàlez; Ariel Amato; Pau Baiget; Carles Fernández; Josep M. Gonfaus; Ramón Alberto Mollineda; Marco Pedersoli; Nicolás Pérez de la Blanca; F. Xavier Roca
In this paper we present an example of a video surveillance application that exploits Multimodal Interactive (MI) technologies. The main objective of the so-called VID-Hum prototype was to develop a cognitive artificial system for both the detection and description of a particular set of human behaviours arising from real-world events. The main procedure of the prototype described in this chapter entails: (i) adaptation, since the system adapts itself to the most common behaviours (qualitative data) inferred from tracking (quantitative data), thus being able to recognize abnormal behaviours; (ii) feedback, since an advanced interface based on Natural Language understanding allows end-users to communicate with the prototype by means of conceptual sentences; and (iii) multimodality, since a virtual avatar has been designed to describe what is happening in the scene, based on the textual interpretations generated by the prototype. Thus, the MI methodology has provided an adequate framework for all these cooperating processes.
International Conference on Learning Representations | 2017
Pau Rodríguez; Jordi Gonzàlez; Guillem Cucurull; Josep M. Gonfaus; F. Xavier Roca
arXiv: Computer Vision and Pattern Recognition | 2018
Pau Rodríguez; Josep M. Gonfaus; Guillem Cucurull; F. Xavier Roca; Jordi Gonzàlez
arXiv: Computers and Society | 2018
Guillem Cucurull; Pau Rodríguez; V. Oguz Yazici; Josep M. Gonfaus; F. Xavier Roca; Jordi Gonzàlez
Archive | 2018
Pau Rodríguez; Guillem Cucurull; Jordi Pérez González; Josep M. Gonfaus