Brais Martinez
University of Nottingham
Publications
Featured research published by Brais Martinez.
Computer Vision and Pattern Recognition | 2010
Michel F. Valstar; Brais Martinez; Xavier Binefa; Maja Pantic
Finding fiducial facial points in any frame of a video showing rich naturalistic facial behaviour is an unsolved problem. Yet this is a crucial step for geometric-feature-based facial expression analysis, and for methods that use appearance-based features extracted at fiducial facial point locations. In this paper we present a method based on a combination of Support Vector Regression and Markov Random Fields to drastically reduce the time needed to search for a point's location and to increase the accuracy and robustness of the algorithm. Using Markov Random Fields allows us to constrain the search space by exploiting the constellations that facial points can form. The regressors, on the other hand, learn a mapping between the appearance of the area surrounding a point and the position of that point, which makes detection of the points very fast and makes the algorithm robust to variations of appearance due to facial expression and moderate changes in head pose. The proposed point detection algorithm was tested on 1855 images; the results show that it outperforms current state-of-the-art point detectors.
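As a rough illustration of the regression component (our own sketch, not the authors' code), the snippet below trains one Support Vector Regressor per coordinate to map local appearance features to the offset between a patch and the true point location; the features and offsets here are entirely synthetic.

```python
# Minimal sketch: one SVR per coordinate learns appearance -> point offset.
# All data below is synthetic; real inputs would be patch descriptors.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
n_samples, n_features = 200, 64            # e.g. flattened 8x8 patch descriptors
X = rng.normal(size=(n_samples, n_features))
# Hypothetical ground truth: offsets (dx, dy) from patch centre to the point.
M = np.array([[1.5, -0.5], [0.3, 2.0]])
offsets = X[:, :2] @ M + rng.normal(scale=0.1, size=(n_samples, 2))

# One regressor per coordinate, as in regression-based point detection.
reg_dx = SVR(kernel="rbf").fit(X, offsets[:, 0])
reg_dy = SVR(kernel="rbf").fit(X, offsets[:, 1])

patch = rng.normal(size=(1, n_features))   # appearance at a test location
print("predicted offset:", reg_dx.predict(patch)[0], reg_dy.predict(patch)[0])
```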
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2013
Brais Martinez; Michel F. Valstar; Xavier Binefa; Maja Pantic
We propose a new algorithm to detect facial points in frontal and near-frontal face images. It combines a regression-based approach with a probabilistic graphical model-based face shape model that restricts the search to anthropomorphically consistent regions. While most regression-based approaches perform a sequential approximation of the target location, our algorithm detects the target location by aggregating the estimates obtained from stochastically selected local appearance information into a single robust prediction. The underlying assumption is that by aggregating the different estimates, their errors will cancel out as long as the regressor inputs are uncorrelated. Once this new perspective is adopted, the problem is reformulated as how to optimally select the test locations over which the regressors are evaluated. We propose to extend the regression-based model to provide a quality measure of each prediction, and use the shape model to restrict and correct the sampling region. Our approach combines the low computational cost typical of regression-based approaches with the robustness of exhaustive-search approaches. The proposed algorithm was tested on over 7,500 images from five databases. Results showed significant improvement over the current state of the art.
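The aggregation idea can be sketched in a few lines (an illustration under our own assumptions, not the paper's implementation): if the estimates obtained from different test locations have roughly uncorrelated errors, a robust statistic such as the median largely cancels them out, even in the presence of a few gross failures.

```python
# Sketch of robust aggregation of per-location estimates (synthetic data).
import numpy as np

rng = np.random.default_rng(1)
true_target = np.array([120.0, 85.0])      # hypothetical landmark position

# Simulate predictions from 50 stochastically sampled test locations; their
# errors are roughly uncorrelated, so they should largely cancel out.
estimates = true_target + rng.normal(scale=4.0, size=(50, 2))
estimates[:3] += 40.0                      # a few gross failures (outliers)

mean_pred = estimates.mean(axis=0)
median_pred = np.median(estimates, axis=0) # robust to the outliers
print("mean:", mean_pred, "median:", median_pred)
```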
IEEE Transactions on Systems, Man, and Cybernetics | 2014
Bihan Jiang; Michel F. Valstar; Brais Martinez; Maja Pantic
Both the configuration and the dynamics of facial expressions are crucial for the interpretation of human facial behavior. Yet to date, the vast majority of reported efforts in the field either do not take the dynamics of facial expressions into account, or focus only on prototypic facial expressions of six basic emotions. Facial dynamics can be explicitly analyzed by detecting the constituent temporal segments of Facial Action Coding System (FACS) Action Units (AUs): onset, apex, and offset. In this paper, we present a novel approach to explicit analysis of the temporal dynamics of facial actions using the dynamic appearance descriptor Local Phase Quantization from Three Orthogonal Planes (LPQ-TOP). Temporal segments are detected by combining a discriminative classifier, which detects the temporal segments on a frame-by-frame basis, with Markov Models that enforce temporal consistency over the whole episode. The system is evaluated in detail in database-dependent experiments on the MMI facial expression database, the UNBC-McMaster pain database, the SAL database, and the GEMEP-FERA dataset, and in cross-database experiments using the Cohn-Kanade and SEMAINE databases. The comparison with other state-of-the-art methods shows that the proposed LPQ-TOP method outperforms the other approaches for the problem of AU temporal segment detection, and that overall AU activation detection benefits from dynamic appearance information.
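The combination of a frame-by-frame classifier with a Markov model can be illustrated with a standard Viterbi decoder (a minimal sketch under assumed transition probabilities, not the paper's exact model): per-frame classifier scores are smoothed into a temporally consistent sequence of neutral/onset/apex/offset labels.

```python
# Sketch: Viterbi decoding of per-frame scores under a temporal-segment
# Markov model (neutral -> onset -> apex -> offset). All numbers are made up.
import numpy as np

states = ["neutral", "onset", "apex", "offset"]
# Transition matrix favouring self-transitions (temporal smoothness).
A = np.array([[0.90, 0.10, 0.00, 0.00],
              [0.00, 0.85, 0.15, 0.00],
              [0.00, 0.00, 0.85, 0.15],
              [0.10, 0.00, 0.00, 0.90]])
rng = np.random.default_rng(2)
T = 20
emissions = rng.dirichlet(np.ones(4), size=T)   # stand-in classifier scores

logA = np.log(A + 1e-12)                        # Viterbi in log space
delta = np.log(emissions[0])
back = np.zeros((T, 4), dtype=int)
for t in range(1, T):
    scores = delta[:, None] + logA              # scores[i, j]: i -> j
    back[t] = scores.argmax(axis=0)
    delta = scores.max(axis=0) + np.log(emissions[t])

path = [int(delta.argmax())]                    # backtrack the best path
for t in range(T - 1, 0, -1):
    path.append(back[t][path[-1]])
print([states[s] for s in reversed(path)])
```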
Image and Vision Computing | 2013
Stavros Petridis; Brais Martinez; Maja Pantic
Laughter is clearly an audiovisual event, consisting of the laughter vocalization and of facial activity, mainly around the mouth and sometimes in the upper face. A major obstacle in studying the audiovisual aspects of laughter is the lack of suitable data. For this reason, the majority of past research on laughter classification/detection has focused on audio-only approaches. A few audiovisual studies exist which use audiovisual data from existing corpora of recorded meetings. The main problem with such data is that they usually contain large head movements which make audiovisual analysis very difficult. In this work, we present a new publicly available audiovisual database, the MAHNOB Laughter database, suitable for studying laughter. It contains 22 subjects who were recorded while watching stimulus material, using two microphones, a video camera and a thermal camera. The primary goal was to elicit laughter, but in addition, posed smiles, posed laughter, and speech were recorded as well. In total, 180 sessions are available, with a total duration of 3 h and 49 min. There are 563 laughter episodes, 849 speech utterances, 51 posed laughs, 67 speech-laugh episodes and 167 other vocalizations annotated in the database. We also report baseline experiments for audio, visual and audiovisual approaches to laughter-vs-speech discrimination, as well as further experiments on discrimination between voiced laughter, unvoiced laughter and speech. These results suggest that the combination of audio and visual information is beneficial in the presence of acoustic noise and helps discriminate between voiced laughter episodes and speech utterances. Finally, we report preliminary experiments on laughter-vs-speech discrimination based on thermal images.
Computer Vision and Pattern Recognition | 2016
Sergio Escalera; Mercedes Torres Torres; Brais Martinez; Xavier Baró; Hugo Jair Escalante; Isabelle Guyon; Georgios Tzimiropoulos; Ciprian A. Corneanu; Marc Oliu; Mohammad Ali Bagheri; Michel F. Valstar
We present the 2016 ChaLearn Looking at People and Faces of the World Challenge and Workshop, which ran three competitions on the common theme of face analysis from still images. The first one, Looking at People, addressed age estimation, while the second and third competitions, Faces of the World, addressed accessory classification and smile and gender classification, respectively. We present two crowd-sourcing methodologies used to collect manual annotations. A custom-built application was used to collect and label data about the apparent age of people (as opposed to the real age). For the Faces of the World data, the citizen-science Zooniverse platform was used. This paper summarizes the three challenges and the data used, as well as the results achieved by the participants of the competitions. Details of the ChaLearn LAP FotW competitions can be found at http://gesture.chalearn.org.
Image and Vision Computing | 2015
Javier Orozco; Brais Martinez; Maja Pantic
We present a multi-view face detector based on Cascade Deformable Part Models (CDPM). Over the last decade, there have been several attempts to extend the well-established Viola-Jones face detection algorithm to the problem of multi-view face detection. Recently, a tree-structured model for multi-view face detection was proposed. That method is primarily designed for facial landmark detection, with face detection provided as a by-product. However, the effort to model inner facial structures using a detailed facial landmark labelling resulted in a complex and suboptimal system for face detection. Instead, we adopt CDPMs, where the models are learned from partially labelled images using Latent Support Vector Machines (LSVM). The LSVM training is enhanced with data-mining and bootstrapping procedures that enrich the models during training, and a post-optimization procedure is derived to further improve performance. This semi-supervised methodology allows us to build models from weakly labelled data while incrementally learning latent positive and negative samples. Our results show that the proposed model can deal with highly expressive and partially occluded faces while outperforming state-of-the-art face detectors by a large margin on challenging benchmarks such as the Face Detection Data Set and Benchmark (FDDB) and the Annotated Facial Landmarks in the Wild (AFLW) databases. In addition, we validate the accuracy of our models under large head pose variation and facial occlusion on the Head Pose Image Database (HPID) and the Caltech Occluded Faces in the Wild (COFW) dataset, respectively. We also outline the suitability of our models to support facial landmark detection algorithms.
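The bootstrapping idea can be sketched as a hard-negative mining loop (an assumption-laden illustration, not the paper's LSVM training): a linear detector is trained, negatives it still scores as face-like are mined from a large pool, and the detector is retrained on the enriched set.

```python
# Sketch of bootstrapping via hard-negative mining (synthetic features).
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(3)
pos = rng.normal(loc=1.0, size=(100, 32))          # stand-in face features
neg_pool = rng.normal(loc=-0.2, size=(2000, 32))   # large background pool

neg = neg_pool[:100]                               # initial negative set
for round_ in range(3):
    X = np.vstack([pos, neg])
    y = np.r_[np.ones(len(pos)), np.zeros(len(neg))]
    clf = LinearSVC(C=1.0, max_iter=5000).fit(X, y)
    scores = clf.decision_function(neg_pool)
    hard = neg_pool[scores > -0.5]                 # negatives scored face-like
    neg = np.unique(np.vstack([neg, hard[:200]]), axis=0)
    print(f"round {round_}: {len(neg)} negatives")
```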
Computer Vision and Pattern Recognition | 2010
Brais Martinez; Xavier Binefa; Maja Pantic
This paper studies the problem of detecting facial components (specifically the eyes, nostrils and mouth) in thermal imagery. One of the immediate goals is to enable the automatic registration of facial thermal images. The detection of the eyes and nostrils is performed using Haar features and the GentleBoost algorithm, which are shown to provide superior detection rates. The detection of the mouth is based on the detected locations of the eyes and nostrils and uses measures of entropy and self-similarity. The results show that reliable facial component detection is feasible using this methodology, achieving a correct detection rate of 0.8 for both the eyes and the nostrils. Correct detection of the eyes and nostrils in turn enables correct detection of the mouth in 65% of closed-mouth test images and in 73% of open-mouth test images.
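For illustration, a bare-bones GentleBoost loop with depth-1 regression stumps looks as follows (a sketch on synthetic features standing in for Haar responses, not the paper's detector):

```python
# GentleBoost sketch: each round fits a regression stump to labels in
# {-1, +1} by weighted least squares, then reweights the samples.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(4)
X = rng.normal(size=(400, 16))          # stand-in Haar-like feature responses
y = np.where(X[:, 0] + 0.5 * X[:, 3] > 0, 1.0, -1.0)

w = np.full(len(y), 1.0 / len(y))       # sample weights
F = np.zeros(len(y))                    # additive model output
for m in range(20):
    stump = DecisionTreeRegressor(max_depth=1)
    stump.fit(X, y, sample_weight=w)    # weighted least-squares fit
    fm = stump.predict(X)
    F += fm
    w *= np.exp(-y * fm)                # GentleBoost weight update
    w /= w.sum()

print("training accuracy:", (np.sign(F) == y).mean())
```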
International Conference on Pattern Recognition | 2014
Bihan Jiang; Brais Martinez; Michel F. Valstar; Maja Pantic
In this paper we propose a new method for the detection of action units (AUs) that relies on a novel region-based face representation and a mid-level decision layer that combines region-specific information. Unlike other approaches, we represent the face neither as a regular grid based on the face location alone (holistic representation), nor as a set of small patches centred at fiducial facial point locations (local representation). Instead, we propose to use domain knowledge regarding AU-specific facial muscle contractions to define a set of face regions covering the whole face. Therefore, as opposed to local appearance models, our face representation makes use of the full facial appearance, while the use of facial point locations to define the regions means that we obtain better-registered descriptors compared to holistic representations. Finally, we propose an AU-specific weighted-sum model as a decision-level fusion layer in charge of combining region-specific probabilistic information. This configuration allows each classifier to learn the typical appearance changes for a specific face part and reduces the dimensionality of the problem, thus proving to be more robust. Our approach is evaluated on the DISFA and GEMEP-FERA datasets using two histogram-based appearance features, Local Binary Patterns and Local Phase Quantisation. We show superior performance of both the domain-specific region definition and the decision-level fusion with respect to the standard approaches to automatic facial action unit detection.
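The decision-level fusion step can be sketched as follows (a hypothetical illustration with synthetic per-region probabilities; the paper's actual weight learning may differ): AU-specific weights are fitted so that a weighted sum of region probabilities predicts AU activation.

```python
# Sketch: AU-specific weighted-sum fusion of per-region probabilities.
import numpy as np

rng = np.random.default_rng(5)
n_frames, n_regions = 300, 9
region_probs = rng.uniform(size=(n_frames, n_regions))  # per-region outputs
# Hypothetical labels: this AU mainly shows up in regions 2 and 3.
labels = (region_probs[:, [2, 3]].mean(axis=1) > 0.6).astype(float)

# Fit AU-specific fusion weights by least squares (one simple choice).
W, *_ = np.linalg.lstsq(region_probs, labels, rcond=None)
fused = region_probs @ W                                # weighted sum
pred = (fused > 0.5).astype(float)
print("fusion accuracy:", (pred == labels).mean())
```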
Archive | 2016
Brais Martinez; Michel F. Valstar
In this chapter we consider the problem of automatic facial expression analysis. Our take on this is that the field has reached a point where it needs to move away from considering experiments and applications under in-the-lab conditions, and move towards so-called in-the-wild scenarios. We assume throughout this chapter that the aim is to develop technology that can be deployed in practical applications under unconstrained conditions. While some first efforts in this direction have been reported very recently, it is still unclear what the right path to achieving accurate, informative, robust, and real-time facial expression analysis will be. To illuminate the journey ahead, we first provide in Sect. 1 an overview of the existing theories and specific problem formulations considered within the computer vision community. Then we describe in Sect. 2 the standard algorithmic pipeline which is common to most facial expression analysis algorithms. We include suggestions as to which of the current algorithms and approaches are most suited to the scenario considered. In Sect. 3 we describe our view of the remaining challenges, and the current opportunities within the field. This chapter is thus not intended as a review of different approaches, but rather a selection of what we believe are the most suitable state-of-the-art algorithms, and a selection of exemplars chosen to characterise a specific approach. We review in Sect. 4 some of the exciting opportunities for the application of automatic facial expression analysis to everyday practical problems and current commercial applications being exploited. Section 5 ends the chapter by summarising the major conclusions drawn.
European Conference on Computer Vision | 2016
Enrique Sánchez-Lozano; Brais Martinez; Georgios Tzimiropoulos; Michel F. Valstar
This paper introduces a novel real-time algorithm for facial landmark tracking. Compared to detection, tracking has both additional challenges and opportunities. Arguably the most important aspect in this domain is updating a tracker’s models as tracking progresses, also known as incremental (face) tracking. While this should result in more accurate localisation, how to do this online and in real time without causing a tracker to drift is still an important open research question. We address this question in the cascaded regression framework, the state-of-the-art approach for facial landmark localisation. Because incremental learning for cascaded regression is costly, we propose a much more efficient yet equally accurate alternative using continuous regression. More specifically, we first propose cascaded continuous regression (CCR) and show its accuracy is equivalent to the Supervised Descent Method. We then derive the incremental learning updates for CCR (iCCR) and show that it is an order of magnitude faster than standard incremental learning for cascaded regression, bringing the time required for the update from seconds down to a fraction of a second, thus enabling real-time tracking. Finally, we evaluate iCCR and show the importance of incremental learning in achieving state-of-the-art performance. Code for our iCCR is available from http://www.cs.nott.ac.uk/~psxes1.
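The cascaded regression framework underlying CCR/iCCR can be sketched compactly (an illustrative toy version with a stand-in feature function, not the authors' code): each cascade level learns a linear map from features computed at the current shape estimate to a shape update.

```python
# Toy cascaded regression: each level maps features at the current shape
# estimate to an update towards the true shape. Everything is synthetic.
import numpy as np

rng = np.random.default_rng(6)

def features(shape):
    # Stand-in for image features sampled around the landmarks.
    return np.tanh(shape)

true_shape = rng.normal(size=10)        # 5 landmarks, (x, y) flattened
shapes = true_shape + rng.normal(scale=1.0, size=(500, 10))  # perturbed inits

levels = []
for level in range(3):
    Phi = np.array([features(s) for s in shapes])
    targets = true_shape - shapes       # desired shape updates
    R, *_ = np.linalg.lstsq(Phi, targets, rcond=None)
    levels.append(R)
    shapes = shapes + Phi @ R           # apply update, continue the cascade

test = true_shape + rng.normal(scale=1.0, size=10)
for R in levels:                        # run the learned cascade
    test = test + features(test) @ R
print("residual error:", np.linalg.norm(test - true_shape))
```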