Publications


Featured research published by Aleix M. Martinez.


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2001

PCA versus LDA

Aleix M. Martinez; Avinash C. Kak

In the context of the appearance-based paradigm for object recognition, it is generally believed that algorithms based on LDA (linear discriminant analysis) are superior to those based on PCA (principal components analysis). In this communication, we show that this is not always the case. We present our case first by using intuitively plausible arguments and, then, by showing actual results on a face database. Our overall conclusion is that when the training data set is small, PCA can outperform LDA and, also, that PCA is less sensitive to different training data sets.
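
A minimal sketch of the experiment the abstract describes, assuming scikit-learn is available; the digits dataset and a 1-NN classifier stand in for the paper's face database and matching procedure. The point is only to show how the PCA/LDA comparison varies with training-set size.

```python
# Sketch: PCA + 1-NN versus LDA + 1-NN as the training set shrinks.
# scikit-learn's digits dataset stands in for the paper's face images.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)
for n_train in (50, 200, 800):            # vary the training-set size
    Xtr, Xte, ytr, yte = train_test_split(
        X, y, train_size=n_train, stratify=y, random_state=0)
    pca = make_pipeline(PCA(n_components=9), KNeighborsClassifier(1))
    lda = make_pipeline(LinearDiscriminantAnalysis(n_components=9),
                        KNeighborsClassifier(1))
    pca.fit(Xtr, ytr)
    lda.fit(Xtr, ytr)
    print(f"n={n_train}  PCA: {pca.score(Xte, yte):.3f}"
          f"  LDA: {lda.score(Xte, yte):.3f}")
```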


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2006

Subclass discriminant analysis

Manli Zhu; Aleix M. Martinez

Over the years, many discriminant analysis (DA) algorithms have been proposed for the study of high-dimensional data in a large variety of problems. Each of these algorithms is tuned to a specific type of data distribution (that which best models the problem at hand). Unfortunately, in most problems the form of each class pdf is a priori unknown, and the selection of the DA algorithm that best fits our data is done via trial-and-error. Ideally, one would like to have a single formulation which can be used for most distribution types. This can be achieved by approximating the underlying distribution of each class with a mixture of Gaussians. In this approach, the major problem to be addressed is that of determining the optimal number of Gaussians per class, i.e., the number of subclasses. In this paper, two criteria able to find the most convenient division of each class into a set of subclasses are derived. Extensive experimental results are shown using five databases. Comparisons are given against linear discriminant analysis (LDA), direct LDA (DLDA), heteroscedastic LDA (HLDA), nonparametric DA (NDA), and kernel-based LDA (K-LDA). We show that our method is always the best or comparable to the best.
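
A rough sketch of the subclass idea, assuming scikit-learn: each class is split into subclasses with a Gaussian mixture, and LDA is then run on the subclass labels. The fixed number of subclasses and the mixture-based split are illustrative stand-ins for the paper's two derived criteria.

```python
# Split each class into Gaussian-mixture subclasses, then run LDA on the
# subclass labels. The fixed n_sub replaces the paper's two criteria for
# choosing the number of subclasses per class.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.mixture import GaussianMixture

def subclass_labels(X, y, n_sub=2, seed=0):
    labels = np.empty(len(y), dtype=int)
    next_id = 0
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        gm = GaussianMixture(n_components=n_sub, random_state=seed).fit(X[idx])
        labels[idx] = next_id + gm.predict(X[idx])
        next_id += n_sub
    return labels

# usage sketch, with any labeled data X, y:
# sda = LinearDiscriminantAnalysis().fit(X, subclass_labels(X, y))
# Z = sda.transform(X)   # discriminant features from the subclass problem
```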


Proceedings of the National Academy of Sciences of the United States of America | 2014

Compound facial expressions of emotion

Shichuan Du; Yong Tao; Aleix M. Martinez

Significance: Though people regularly recognize many distinct emotions, for the most part, research studies have been limited to six basic categories—happiness, surprise, sadness, anger, fear, and disgust; the reason for this is grounded in the assumption that only these six categories are differentially represented by our cognitive and social systems. The results reported herein suggest otherwise: a larger number of categories is used by humans. Understanding the different categories of facial expressions of emotion regularly used by us is essential to gain insights into human cognition and affect as well as for the design of computational models and perceptual interfaces.

Past research on facial expressions of emotion has focused on the study of six basic categories—happiness, surprise, anger, sadness, fear, and disgust. However, many more facial expressions of emotion exist and are used regularly by humans. This paper describes an important group of expressions, which we call compound emotion categories. Compound emotions are those that can be constructed by combining basic component categories to create new ones. For instance, happily surprised and angrily surprised are two distinct compound emotion categories. The present work defines 21 distinct emotion categories. Sample images of their facial expressions were collected from 230 human subjects. A Facial Action Coding System analysis shows the production of these 21 categories is different but consistent with the subordinate categories they represent (e.g., a happily surprised expression combines muscle movements observed in happiness and surprise). We show that these differences are sufficient to distinguish between the 21 defined categories. We then use a computational model of face perception to demonstrate that most of these categories are also visually discriminable from one another.
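
The FACS analysis suggests a simple illustration: a compound category's action units (AUs) combine those of its basic components. The AU sets below are common FACS associations used only for illustration; the paper's prototypical AUs may differ.

```python
# Sketch of the compound-category idea: the action units (AUs) of a
# compound expression combine those of its basic components. These AU
# sets are common FACS associations, assumed here for illustration.
BASIC_AUS = {
    "happiness": {6, 12},
    "surprise":  {1, 2, 5, 26},
    "sadness":   {1, 4, 15},
    "anger":     {4, 5, 7, 23},
    "fear":      {1, 2, 4, 5, 7, 20, 26},
    "disgust":   {9, 15, 16},
}

def compound_aus(*basics):
    aus = set()
    for b in basics:
        aus |= BASIC_AUS[b]          # union of the component AU sets
    return aus

print(sorted(compound_aus("happiness", "surprise")))   # happily surprised
```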


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2005

Where are linear feature extraction methods applicable?

Aleix M. Martinez; Manli Zhu

A fundamental problem in computer vision and pattern recognition is to determine where and, most importantly, why a given technique is applicable. This is necessary not only because it helps us decide which techniques to apply at any given time, but also because knowing why current algorithms fail facilitates the design of new algorithms robust to such problems. In this paper, we report on a theoretical study that demonstrates where and why generalized eigen-based linear equations do not work. In particular, we show that when the smallest angle between the ith eigenvector given by the metric to be maximized and the first i eigenvectors given by the metric to be minimized is close to zero, the results are not guaranteed to be correct. Several properties of such models are also presented. For illustration, we concentrate on the classical applications of classification and feature extraction. We also show how we can use our findings to design more robust algorithms. We conclude with a discussion on the broader impacts of our results.
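
A small NumPy sketch of the applicability test the abstract states; the scatter-matrix names Sb (metric to be maximized) and Sw (metric to be minimized) are illustrative.

```python
# Applicability check from the abstract: for each i, the angle between
# the i-th eigenvector of the metric to maximize (Sb, e.g. between-class
# scatter) and the span of the first i eigenvectors of the metric to
# minimize (Sw, e.g. within-class scatter). Angles near zero flag cases
# where the generalized eigen-based solution is not guaranteed correct.
import numpy as np

def applicability_angles(Sb, Sw):
    _, Vb = np.linalg.eigh(Sb)
    _, Vw = np.linalg.eigh(Sw)
    Vb, Vw = Vb[:, ::-1], Vw[:, ::-1]        # largest eigenvalues first
    angles = []
    for i in range(1, Sb.shape[0] + 1):
        v = Vb[:, i - 1]                     # i-th maximized eigenvector
        P = Vw[:, :i]                        # first i minimized eigenvectors
        cos = np.linalg.norm(P.T @ v)        # cosine of angle to span(P)
        angles.append(np.degrees(np.arccos(np.clip(cos, 0.0, 1.0))))
    return angles
```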


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2008

Bayes Optimality in Linear Discriminant Analysis

Onur C. Hamsici; Aleix M. Martinez

We present an algorithm that provides the one-dimensional subspace where the Bayes error is minimized for the C-class problem with homoscedastic Gaussian distributions. Our main result shows that the set of possible one-dimensional spaces v, for which the order of the projected class means is identical, defines a convex region with associated convex Bayes error function g(v). This allows for the minimization of the error function using standard convex optimization algorithms. Our algorithm is then extended to the minimization of the Bayes error in the more general case of heteroscedastic distributions. This is done by means of an appropriate kernel mapping function. This result is further extended to obtain the d-dimensional solution for any given d by iteratively applying our algorithm to the null space of the (d - 1)-dimensional solution. We also show how this result can be used to improve upon the outcomes provided by existing algorithms and derive a low-computational-cost linear approximation. Extensive experimental validations are provided to demonstrate the use of these algorithms in classification, data analysis, and visualization.
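
For two classes the projected Bayes error has a simple closed form, which makes the one-dimensional minimization easy to illustrate. The coarse direction search below is a stand-in for the paper's convex optimization, and the means and covariance are made up for the example.

```python
# Sketch: for two homoscedastic Gaussians with equal priors, the Bayes
# error after projecting onto a unit direction w is Phi(-d/2), where
# d = |w.(mu1 - mu2)| / sqrt(w' Sigma w). A coarse search over 2D
# directions illustrates the 1D minimization.
import numpy as np
from scipy.stats import norm

def bayes_error_1d(w, mu1, mu2, Sigma):
    d = abs(w @ (mu1 - mu2)) / np.sqrt(w @ Sigma @ w)
    return norm.cdf(-d / 2)

mu1, mu2 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
Sigma = np.array([[1.0, 0.6], [0.6, 2.0]])
thetas = np.linspace(0.0, np.pi, 1801)
dirs = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)
errs = [bayes_error_1d(w, mu1, mu2, Sigma) for w in dirs]
print("min Bayes error:", min(errs), "at w =", dirs[int(np.argmin(errs))])
# two-class sanity check: the optimum is the Fisher direction
print("Sigma^-1 (mu2 - mu1):", np.linalg.solve(Sigma, mu2 - mu1))
```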


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2010

Features versus Context: An Approach for Precise and Detailed Detection and Delineation of Faces and Facial Features

Liya Ding; Aleix M. Martinez

The appearance-based approach to face detection has seen great advances in the last several years. In this approach, we learn the image statistics describing the texture pattern (appearance) of the object class we want to detect, e.g., the face. However, this approach has had limited success in providing an accurate and detailed description of the internal facial features, i.e., eyes, brows, nose, and mouth. In general, this is due to the limited information carried by the learned statistical model. While the face template is relatively rich in texture, facial features (e.g., eyes, nose, and mouth) do not carry enough discriminative information to tell them apart from all possible background images. We resolve this problem by adding the context information of each facial feature in the design of the statistical model. In the proposed approach, the context information defines the image statistics most correlated with the surroundings of each facial component. This means that when we search for a face or facial feature, we look for those locations which most resemble the feature yet are most dissimilar to its context. This dissimilarity with the context features forces the detector to gravitate toward an accurate estimate of the position of the facial feature. Learning to discriminate between feature and context templates is difficult, however, because the context and the texture of the facial features vary widely under changing expression, pose, and illumination, and may even resemble one another. We address this problem with the use of subclass divisions. We derive two algorithms to automatically divide the training samples of each facial feature into a set of subclasses, each representing a distinct construction of the same facial component (e.g., closed versus open eyes) or its context (e.g., different hairstyles). The first algorithm is based on a discriminant analysis formulation. The second algorithm is an extension of the AdaBoost approach. We provide extensive experimental results using still images and video sequences for a total of 3,930 images. We show that the results are almost as good as those obtained with manual detection.
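
A toy version of the feature-versus-context scoring, assuming OpenCV is available; normalized cross-correlation with fixed templates stands in for the paper's learned statistical models and subclass divisions.

```python
# Toy feature-vs-context score map: high where the image resembles the
# feature template but not its context template. Assumes OpenCV; image
# and templates are single-channel uint8 or float32 arrays.
import cv2
import numpy as np

def feature_context_map(image, feat_tmpl, ctx_tmpl, alpha=1.0):
    f = cv2.matchTemplate(image, feat_tmpl, cv2.TM_CCOEFF_NORMED)
    c = cv2.matchTemplate(image, ctx_tmpl, cv2.TM_CCOEFF_NORMED)
    h, w = min(f.shape[0], c.shape[0]), min(f.shape[1], c.shape[1])
    return f[:h, :w] - alpha * c[:h, :w]   # resemble feature, differ from context

# peak of the map ~ detected facial-feature location:
# score = feature_context_map(img, eye_tmpl, eye_context_tmpl)
# yx = np.unravel_index(np.argmax(score), score.shape)
```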


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2011

Computing Smooth Time Trajectories for Camera and Deformable Shape in Structure from Motion with Occlusion

Paulo F. U. Gotardo; Aleix M. Martinez

We address the classical computer vision problems of rigid and nonrigid structure from motion (SFM) with occlusion. We assume that the columns of the input observation matrix W describe smooth 2D point trajectories over time. We then derive a family of efficient methods that estimate the column space of W using compact parameterizations in the Discrete Cosine Transform (DCT) domain. Our methods tolerate high percentages of missing data and incorporate new models for the smooth time trajectories of 2D points, affine and weak-perspective cameras, and 3D deformable shape. We solve a rigid SFM problem by estimating the smooth time trajectory of a single camera moving around the structure of interest. By considering a weak-perspective camera model from the outset, we directly compute Euclidean 3D shape reconstructions without requiring postprocessing steps such as Euclidean upgrade and bundle adjustment. Our results on real SFM data sets with high percentages of missing data compare favorably to those in the literature. In nonrigid SFM, we propose a novel 3D shape trajectory approach that solves for the deformable structure as the smooth time trajectory of a single point in a linear shape space. A key result shows that, compared to state-of-the-art algorithms, our nonrigid SFM method can better model complex articulated deformation with higher-frequency DCT components while still maintaining the low-rank factorization constraint. Finally, we also offer an approach for nonrigid SFM when W is presented with missing data.
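
The core smoothness prior is easy to sketch: each trajectory (column of W) is fit by least squares in a truncated DCT basis using only its observed entries. This is the parameterization the abstract describes, not the full SFM factorization; function names and the choice of K are illustrative.

```python
# Each column of W (one coordinate of a point trajectory over F frames)
# is fit by least squares in a truncated DCT basis, using observed
# entries only; missing entries are then reconstructed from the smooth
# fit. K controls smoothness (needs >= K observations per column).
import numpy as np

def dct_basis(F, K):
    n = np.arange(F)[:, None]
    k = np.arange(K)[None, :]
    B = np.cos(np.pi * (2 * n + 1) * k / (2 * F))   # DCT-II basis, F x K
    return B / np.linalg.norm(B, axis=0)            # orthonormal columns

def fill_smooth(W, mask, K=10):
    F, P = W.shape
    B = dct_basis(F, K)
    W_hat = np.empty((F, P))
    for j in range(P):                              # fit each trajectory
        obs = mask[:, j]                            # boolean: entry observed?
        coef, *_ = np.linalg.lstsq(B[obs], W[obs, j], rcond=None)
        W_hat[:, j] = B @ coef                      # smooth reconstruction
    return W_hat
```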


Journal of Machine Learning Research | 2012

A model of the perception of facial expressions of emotion by humans: research overview and perspectives

Aleix M. Martinez; Shichuan Du

In cognitive science and neuroscience, there have been two leading models describing how humans perceive and classify facial expressions of emotion: the continuous and the categorical model. The continuous model defines each facial expression of emotion as a feature vector in a face space. This model explains, for example, how expressions of emotion can be seen at different intensities. In contrast, the categorical model consists of C classifiers, each tuned to a specific emotion category. This model explains, among other findings, why the images in a morphing sequence between a happy and a surprise face are perceived as either happy or surprise but not something in between. While the continuous model has a more difficult time justifying this latter finding, the categorical model is not as good when it comes to explaining how expressions are recognized at different intensities or modes. Most importantly, both models have problems explaining how one can recognize combinations of emotion categories such as happily surprised versus angrily surprised versus surprise. To resolve these issues, in the past several years, we have worked on a revised model that justifies the results reported in the cognitive science and neuroscience literature. This model consists of C distinct continuous spaces. Multiple (compound) emotion categories can be recognized by linearly combining these C face spaces. The dimensions of these spaces are shown to be mostly configural. According to this model, the major task for the classification of facial expressions of emotion is precise, detailed detection of facial landmarks rather than recognition. We provide an overview of the literature justifying the model, show how the resulting model can be employed to build algorithms for the recognition of facial expressions of emotion, and propose research directions for machine learning and computer vision researchers to keep pushing the state of the art in these areas. We also discuss how the model can aid in studies of human perception, social interactions, and disorders.
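
A loose structural sketch of the model as described: C continuous face spaces, one per basic emotion, with compound categories scored by linearly combining the per-space responses. The Gaussian response and the prototype features are assumptions for illustration, not the paper's fitted model.

```python
# Structural sketch: one continuous space per basic emotion, compound
# categories scored as linear combinations of the per-space responses.
# Prototype vectors (e.g. landmark-based configural features) and the
# Gaussian response are illustrative assumptions.
import numpy as np

class CFaceSpaces:
    def __init__(self, prototypes):            # {"happiness": vec, ...}
        self.prototypes = prototypes

    def responses(self, x):
        # response of each basic-emotion space to feature vector x
        return {e: float(np.exp(-np.linalg.norm(x - p) ** 2))
                for e, p in self.prototypes.items()}

    def compound_score(self, x, weights):
        # e.g. weights = {"happiness": 0.5, "surprise": 0.5}
        r = self.responses(x)
        return sum(wgt * r[e] for e, wgt in weights.items())
```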


Computer Vision and Pattern Recognition | 2009

Support Vector Machines in face recognition with occlusions

Hongjun Jia; Aleix M. Martinez

Support vector machines (SVM) are one of the most useful techniques in classification problems. One clear example is face recognition. However, SVM cannot be applied when the feature vectors defining our samples have missing entries. This is clearly the case in face recognition when occlusions are present in the training and/or testing sets. When k features are missing in a sample vector of class 1, these define an affine subspace of k dimensions. The goal of the SVM is to maximize the margin between the vectors of class 1 and class 2 on those dimensions with no missing elements and, at the same time, maximize the margin between the vectors in class 2 and the affine subspace of class 1. This second term of the SVM criterion will minimize the overlap between the classification hyperplane and the subspace of solutions in class 1, because we do not know which values in this subspace a test vector can take. The hyperplane minimizing this overlap is obviously the one parallel to the missing dimensions. However, this condition is too restrictive, because its solution will generally contradict that obtained when maximizing the margin of the visible data. To resolve this problem, we define a criterion which minimizes the probability of overlap. The resulting optimization problem can be solved efficiently and we show how the global minimum of the error term is guaranteed under mild conditions. We provide extensive experimental results, demonstrating the superiority of the proposed approach over the state of the art.
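
The geometric picture is easy to check numerically: a hyperplane avoids the affine subspace spanned by the missing coordinate axes exactly when its normal has no component along those axes. The sketch below only inspects a trained w after the fact; the paper instead builds the overlap term into the SVM training itself.

```python
# Check the geometry on a trained linear SVM: w is its normal vector,
# missing_idx the occluded feature indices of a test vector x.
import numpy as np

def subspace_overlap(w, missing_idx):
    # fraction of the normal's energy along the missing axes;
    # ~0 means the hyperplane is parallel to the affine subspace
    return np.linalg.norm(w[missing_idx]) / np.linalg.norm(w)

def visible_margin(w, b, x, missing_idx):
    # distance to the hyperplane using only the visible coordinates;
    # a reliable margin only when subspace_overlap(w, missing_idx) ~ 0
    vis = np.setdiff1d(np.arange(len(w)), missing_idx)
    return abs(w[vis] @ x[vis] + b) / np.linalg.norm(w[vis])
```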


Image and Vision Computing | 2005

Robust motion estimation under varying illumination

Yeon-Ho Kim; Aleix M. Martinez; Avinash C. Kak

The optical-flow approach has emerged as a major technique for estimating scene and object motion in image sequences. However, the results obtained by most optical flow techniques are strongly affected by motion discontinuities and by large illumination changes. While there do exist many separate techniques for robust estimation of optical flow in the presence of motion discontinuities and for dealing with the problems caused by illumination variations, only a few integrated approaches have been proposed. However, most of these previously proposed integrated approaches use simple models of illumination variation; a common assumption being that illumination changes by either just a multiplicative factor or just an additive factor from frame to frame, but not both. Some other previously proposed integrated approaches are limited to specialized tasks such as image registration or change recovery. To remedy this shortcoming, this paper presents a new robust approach to general motion estimation in an integrated framework. Our approach deals simultaneously with motion discontinuities and large illumination variations. Our model of illumination variation is general, in the sense that it admits both multiplicative and additive effects.
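
The illumination model the abstract describes admits a compact linearized sketch: brightness constancy generalized to I2(x+u, y+v) ≈ (1+dm)·I1(x, y) + c, solved here as a single global least-squares problem in the Lucas-Kanade style. The paper's robust handling of motion discontinuities is omitted.

```python
# Linearized generalized brightness constancy:
#   Ix*u + Iy*v - dm*I1 - c = -(I2 - I1)
# solved by least squares over the whole frame for one translation (u, v)
# plus multiplicative (dm) and additive (c) illumination changes.
import numpy as np

def flow_with_illumination(I1, I2):
    I1 = I1.astype(float)
    I2 = I2.astype(float)
    Iy, Ix = np.gradient(I1)                 # image gradients (rows=y, cols=x)
    It = I2 - I1                             # temporal difference
    A = np.stack([Ix.ravel(), Iy.ravel(),
                  -I1.ravel(), -np.ones(I1.size)], axis=1)
    params, *_ = np.linalg.lstsq(A, -It.ravel(), rcond=None)
    u, v, dm, c = params
    return u, v, dm, c   # translation plus illumination parameters
```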

Collaboration


Dive into Aleix M. Martinez's collaboration.

Top Co-Authors

Liya Ding, Ohio State University

Manli Zhu, Ohio State University
