Josep M. Gonfaus
Autonomous University of Barcelona
Publications
Featured research published by Josep M. Gonfaus.
Computer Vision and Pattern Recognition | 2010
Josep M. Gonfaus; Xavier Boix; Joost van de Weijer; Andrew D. Bagdanov; Joan Serrat; Jordi Gonzàlez
Hierarchical conditional random fields have been successfully applied to object segmentation. One reason is their ability to incorporate contextual information at different scales. However, these models do not allow multiple labels to be assigned to a single node. At higher scales in the image, this yields an oversimplified model, since multiple classes can be reasonably expected to appear within one region. This simplified model especially limits the impact that observations at larger scales may have on the CRF model. Neglecting the information at larger scales is undesirable since class-label estimates based on these scales are more reliable than at smaller, noisier scales. To address this problem, we propose a new potential, called the harmony potential, which can encode any possible combination of class labels. We propose an effective sampling strategy that renders tractable the underlying optimization problem. Results show that our approach obtains state-of-the-art results on two challenging datasets: PASCAL VOC 2009 and MSRC-21.
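The abstract's key idea admits a compact illustration. Assuming, as a simplification, that a higher-scale node carries a subset of class labels and that any child label outside that subset incurs a flat penalty (an illustrative form, not the paper's exact potential), the consistency term can be sketched as:

```latex
% Illustrative simplification of a harmony-style consistency potential:
% node r at a higher scale carries a label subset L_r \subseteq \mathcal{C};
% a lower-scale node i pays a penalty \gamma only when its label x_i
% falls outside that combination.
\[
  \psi(x_i, L_r) =
  \begin{cases}
    0      & \text{if } x_i \in L_r, \\
    \gamma & \text{otherwise.}
  \end{cases}
\]
```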
International Journal of Computer Vision | 2012
Xavier Boix; Josep M. Gonfaus; Joost van de Weijer; Andrew D. Bagdanov; Joan Serrat; Jordi Gonzàlez
The Hierarchical Conditional Random Field (HCRF) model has been successfully applied to a number of image labeling problems, including image segmentation. However, existing HCRF models of image segmentation do not allow multiple classes to be assigned to a single region, which limits their ability to incorporate contextual information across multiple scales. At higher scales in the image, this representation yields an oversimplified model since multiple classes can be reasonably expected to appear within large regions. This simplified model particularly limits the impact of information at higher scales. Since class-label information at these scales is usually more reliable than at lower, noisier scales, neglecting this information is undesirable. To address these issues, we propose a new consistency potential for image labeling problems, which we call the harmony potential. It can encode any possible combination of labels, penalizing only unlikely combinations of classes. We also propose an effective sampling strategy over this expanded label set that renders tractable the underlying optimization problem. Our approach obtains state-of-the-art results on two challenging, standard benchmark datasets for semantic image segmentation: PASCAL VOC 2010 and MSRC-21.
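Since the expanded label set contains 2^C possible combinations, exhaustive optimization is intractable, which is what the sampling strategy addresses. Below is a hypothetical sketch of one such restriction; the function name, the per-class scores, and the greedy nested-subset scheme are all assumptions, not the paper's algorithm:

```python
import numpy as np

def sample_label_subsets(class_scores, k=8):
    """Hypothetical sketch: rather than enumerating all 2^C label
    combinations for a higher-scale node, keep only k nested subsets
    built by adding classes in decreasing order of unary score."""
    order = np.argsort(class_scores)[::-1]  # most likely classes first
    return [frozenset(order[: i + 1]) for i in range(min(k, len(order)))]
```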
IEEE Transactions on Systems, Man, and Cybernetics | 2017
Pau Rodríguez; Guillem Cucurull; Jordi Gonzàlez; Josep M. Gonfaus; Kamal Nasrollahi; Thomas B. Moeslund; F. Xavier Roca
Pain is an unpleasant feeling that has been shown to be an important factor in the recovery of patients. Since assessing it is costly in human resources and difficult to do objectively, there is a need for automatic systems to measure it. In this paper, contrary to current state-of-the-art techniques in pain assessment, which are based on facial features only, we suggest that performance can be enhanced by feeding the raw frames to deep learning models, outperforming the latest state-of-the-art results while also directly facing the problem of imbalanced data. As a baseline, our approach first uses convolutional neural networks (CNNs) to learn facial features from VGG_Faces, which are then linked to a long short-term memory (LSTM) network to exploit the temporal relation between video frames. We further compare the performance of the popular schema based on the canonically normalized appearance against taking the whole image into account. As a result, we outperform the current state-of-the-art area-under-the-curve (AUC) performance on the UNBC-McMaster Shoulder Pain Expression Archive Database. In addition, to evaluate the generalization properties of our proposed methodology on facial motion recognition, we also report competitive results on the Cohn-Kanade+ (CK+) facial expression database.
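A minimal sketch of the described pipeline, with a toy convolutional backbone standing in for the VGG_Faces features and illustrative layer sizes; every name and dimension here is an assumption, not the paper's architecture:

```python
import torch
import torch.nn as nn

class PainEstimator(nn.Module):
    """Minimal sketch of a CNN+LSTM pipeline in the spirit of the
    abstract: a convolutional backbone (standing in for VGG_Faces)
    produces per-frame features, and an LSTM models the temporal
    relation between video frames. All sizes are illustrative."""
    def __init__(self, feat_dim=512, hidden=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)   # per-sequence pain score

    def forward(self, frames):             # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.backbone(frames.flatten(0, 1)).view(b, t, -1)
        out, _ = self.lstm(feats)
        return self.head(out[:, -1])       # score from the last time step
```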
Pattern Recognition | 2017
Pau Rodríguez; Guillem Cucurull; Josep M. Gonfaus; F. Xavier Roca; Jordi Gonzàlez
A novel feedforward attention mechanism for CNNs is proposed. The mechanism increases CNNs' robustness to image deformations and clutter. The proposed mechanism increases CNNs' performance for age and gender recognition.
Face analysis in images in the wild still poses a challenge for automatic age and gender recognition tasks, mainly due to the high variability in resolution, deformation, and occlusion. Although performance has greatly improved thanks to Convolutional Neural Networks (CNNs), it is still far from optimal when compared to other image recognition tasks, mainly because of the high sensitivity of CNNs to facial variations. In this paper, inspired by biology and the recent success of attention mechanisms on visual question answering and fine-grained recognition, we propose a novel feedforward attention mechanism that is able to discover the most informative and reliable parts of a given face for improving age and gender classification. In particular, given a downsampled facial image, the proposed model is trained based on a novel end-to-end learning framework to extract the most discriminative patches from the original high-resolution image. Experimental validation on the standard Adience, Images of Groups, and MORPH II benchmarks shows that including attention mechanisms enhances the performance of CNNs in terms of robustness and accuracy.
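A hedged sketch of the patch-extraction step as the abstract describes it: attention scores computed over a downsampled face select locations whose patches are then cropped from the high-resolution original. The function name, tensor shapes, and grid-to-pixel mapping are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def top_attended_patches(attn_logits, highres, patch=64, k=4):
    """Illustrative sketch: attn_logits is a (B, 1, Hg, Wg) score grid
    computed from a downsampled face; crop the k highest-attention
    patches from the (B, 3, H, W) high-resolution image."""
    b, _, hg, wg = attn_logits.shape
    h, w = highres.shape[2:]
    weights = F.softmax(attn_logits.flatten(1), dim=1)   # (B, Hg*Wg)
    idx = weights.topk(k, dim=1).indices                 # (B, k)
    sy, sx = h // hg, w // wg                            # grid-to-pixel scale
    patches = []
    for i in range(b):
        for j in range(k):
            y = min(int(idx[i, j]) // wg * sy, h - patch)
            x = min(int(idx[i, j]) % wg * sx, w - patch)
            patches.append(highres[i, :, y:y + patch, x:x + patch])
    return patches  # list of (3, patch, patch) crops
```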
Computer Vision and Image Understanding | 2015
Josep M. Gonfaus; Marco Pedersoli; Jordi Gonzàlez; Andrea Vedaldi; F. Xavier Roca
We present an object detection framework based on advanced part learning. A generic learning method for the relationships between part appearances is proposed. We treat part locations as well as their appearance as latent variables. We modify the weakly-supervised learning to handle a more complex structure. Our model yields state-of-the-art results for several object categories.
Deformable object models capture variations in an object's appearance that can be represented as image deformations. Other effects such as out-of-plane rotations, three-dimensional articulations, and self-occlusions are often captured by considering a mixture of deformable models, one per object aspect. A more scalable approach is to represent the variations at the level of the object parts instead, applying the concept of a mixture locally. Combining a few part variations can in fact cheaply generate a large number of global appearances. A limited version of this idea was proposed by Yang and Ramanan [1] for human pose detection. In this paper we apply it to the task of generic object category detection and extend it in several ways. First, we propose a model for the relationship between part appearances that is more general than the tree of Yang and Ramanan [1] and more suitable for generic categories. Second, we treat part locations as well as their appearance as latent variables, so that training does not need part annotations but only the object bounding boxes. Third, we modify the weakly-supervised learning of Felzenszwalb et al. [2] and Girshick et al. [3] to handle a significantly more complex latent structure. Our model is evaluated on standard object detection benchmarks and is found to improve over existing approaches, yielding state-of-the-art results for several object categories.
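As a rough illustration of the latent-placement idea (not the paper's formulation; the quadratic displacement penalty follows the general deformable-part recipe of [2]), each part maximizes its appearance score minus a deformation cost over its possible locations:

```python
import numpy as np

def detection_score(root_score, part_score_maps, anchors, defo=0.1):
    """Rough sketch of a part-based detection score: each part chooses
    its best (latent) location, trading appearance score against a
    quadratic displacement penalty from its anchor; a mixture of part
    appearances would add a further max over components per part."""
    total = root_score
    for score_map, (ay, ax) in zip(part_score_maps, anchors):
        ys, xs = np.indices(score_map.shape)
        displacement = (ys - ay) ** 2 + (xs - ax) ** 2
        total += float((score_map - defo * displacement).max())
    return total
```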
Multimodal Interaction in Image and Video Applications | 2013
Marc Castelló; Jordi Gonzàlez; Ariel Amato; Pau Baiget; Carles Fernández; Josep M. Gonfaus; Ramón Alberto Mollineda; Marco Pedersoli; Nicolás Pérez de la Blanca; F. Xavier Roca
In this paper we present an example of a video surveillance application that exploits Multimodal Interactive (MI) technologies. The main objective of the so-called VID-Hum prototype was to develop a cognitive artificial system for both the detection and description of a particular set of human behaviours arising from real-world events. The main procedure of the prototype described in this chapter entails: (i) adaptation, since the system adapts itself to the most common behaviours (qualitative data) inferred from tracking (quantitative data), thus being able to recognize abnormal behaviours; (ii) feedback, since an advanced interface based on Natural Language understanding allows end-users to communicate with the prototype by means of conceptual sentences; and (iii) multimodality, since a virtual avatar has been designed to describe what is happening in the scene, based on the textual interpretations generated by the prototype. Thus, the MI methodology has provided an adequate framework for all these cooperating processes.
International Conference on Learning Representations | 2017
Pau Rodríguez; Jordi Gonzàlez; Guillem Cucurull; Josep M. Gonfaus; F. Xavier Roca
arXiv: Computer Vision and Pattern Recognition | 2018
Pau Rodríguez; Josep M. Gonfaus; Guillem Cucurull; F. Xavier Roca; Jordi Gonzàlez
arXiv: Computers and Society | 2018
Guillem Cucurull; Pau Rodríguez; V. Oguz Yazici; Josep M. Gonfaus; F. Xavier Roca; Jordi Gonzàlez
Archive | 2018
Pau Rodríguez; Guillem Cucurull; Jordi Pérez González; Josep M. Gonfaus