Nicolas Pinto
Massachusetts Institute of Technology
Publications
Featured research published by Nicolas Pinto.
PLOS Computational Biology | 2008
Nicolas Pinto; David Cox; James J. DiCarlo
Progress in understanding the brain mechanisms underlying vision requires the construction of computational models that not only emulate the brain's anatomy and physiology, but ultimately match its performance on visual tasks. In recent years, “natural” images have become popular in the study of vision and have been used to show apparently impressive progress in building such models. Here, we challenge the use of uncontrolled “natural” images in guiding that progress. In particular, we show that a simple V1-like model—a neuroscientist's “null” model, which should perform poorly at real-world visual object recognition tasks—outperforms state-of-the-art object recognition systems (biologically inspired and otherwise) on a standard, ostensibly natural image recognition test. As a counterpoint, we designed a “simpler” recognition test to better span the real-world variation in object pose, position, and scale, and we show that this test correctly exposes the inadequacy of the V1-like model. Taken together, these results demonstrate that tests based on uncontrolled natural images can be seriously misleading, potentially guiding progress in the wrong direction. Instead, we reexamine what it means for images to be natural and argue for a renewed focus on the core problem of object recognition—real-world image variation.
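As a concrete illustration, the following is a minimal sketch of the kind of V1-like “null” model the abstract describes: a bank of Gabor filters, a crude divisive normalization, and spatial downsampling, feeding a linear classifier. The filter counts, sizes, and normalization scheme are illustrative assumptions, not the paper's published parameters.

    import numpy as np
    from scipy.signal import fftconvolve
    from sklearn.svm import LinearSVC

    def gabor(size, theta, freq=0.2, sigma=3.0):
        # oriented Gabor filter: Gaussian envelope times an oriented cosine grating
        half = size // 2
        y, x = np.mgrid[-half:half + 1, -half:half + 1]
        xr = x * np.cos(theta) + y * np.sin(theta)
        return np.exp(-(x**2 + y**2) / (2 * sigma**2)) * np.cos(2 * np.pi * freq * xr)

    def v1_features(img, n_orientations=8, size=11):
        maps = []
        for i in range(n_orientations):
            resp = np.abs(fftconvolve(img, gabor(size, np.pi * i / n_orientations), mode="valid"))
            resp /= resp.std() + 1e-8      # crude divisive normalization
            maps.append(resp[::4, ::4])    # spatial downsampling ("pooling")
        return np.concatenate([m.ravel() for m in maps])

    # Usage sketch: X = np.array([v1_features(im) for im in grayscale_images])
    #               clf = LinearSVC().fit(X, labels)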
Parallel Computing | 2012
Andreas Klöckner; Nicolas Pinto; Yunsup Lee; Bryan Catanzaro; Paul Ivanov; Ahmed R. Fasih
High-performance computing has recently seen a surge of interest in heterogeneous systems, with an emphasis on modern Graphics Processing Units (GPUs). These devices offer tremendous potential for performance and efficiency in important large-scale applications of computational science. However, exploiting this potential can be challenging, as one must adapt to the specialized and rapidly evolving computing environment currently exhibited by GPUs. One way of addressing this challenge is to embrace better techniques and develop tools tailored to their needs. This article presents one simple technique, GPU run-time code generation (RTCG), along with PyCUDA and PyOpenCL, two open-source toolkits that support this technique. In introducing PyCUDA and PyOpenCL, this article proposes the combination of a dynamic, high-level scripting language with the massive performance of a GPU as a compelling two-tiered computing platform, potentially offering significant performance and productivity advantages over conventional single-tier, static systems. The concept of RTCG is simple and easily implemented using existing, robust infrastructure. Nonetheless, it is powerful enough to support (and encourage) the creation of custom application-specific tools by its users. The premise of the paper is illustrated by a wide range of examples where the technique has been applied with considerable success.
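To make the RTCG idea concrete, here is a minimal sketch in the spirit of the article (not its actual code): the CUDA kernel source is assembled as a Python string at run time, with a problem-specific constant baked in, and compiled on the fly with PyCUDA.

    import numpy as np
    import pycuda.autoinit                     # creates a CUDA context
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule

    def make_scale_kernel(factor):
        # kernel text generated at run time; `factor` is baked in as a constant
        src = """
        __global__ void scale(float *out, const float *in, int n)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n) out[i] = %(factor).6ff * in[i];
        }
        """ % {"factor": factor}
        return SourceModule(src).get_function("scale")

    a = np.random.randn(1 << 20).astype(np.float32)
    out = np.empty_like(a)
    scale = make_scale_kernel(2.5)             # specialized kernel, compiled on the fly
    scale(drv.Out(out), drv.In(a), np.int32(a.size),
          block=(256, 1, 1), grid=((a.size + 255) // 256, 1))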
PLOS Computational Biology | 2014
Charles F. Cadieu; Ha Hong; Daniel Yamins; Nicolas Pinto; Diego Ardila; Ethan A. Solomon; Najib J. Majaj; James J. DiCarlo
The primate visual system achieves remarkable visual object recognition performance even in brief presentations, and under changes to object exemplar, geometric transformations, and background variation (a.k.a. core visual object recognition). This remarkable performance is mediated by the representation formed in inferior temporal (IT) cortex. In parallel, recent advances in machine learning have led to ever higher performing models of object recognition using artificial deep neural networks (DNNs). It remains unclear, however, whether the representational performance of DNNs rivals that of the brain. A major difficulty in producing such a comparison accurately has been the lack of a unifying metric that accounts for experimental limitations, such as the amount of noise, the number of neural recording sites, and the number of trials, and computational limitations, such as the complexity of the decoding classifier and the number of classifier training examples. In this work, we perform a direct comparison that corrects for these experimental limitations and computational considerations. As part of our methodology, we propose an extension of “kernel analysis” that measures the generalization accuracy as a function of representational complexity. Our evaluations show that, unlike previous bio-inspired models, the latest DNNs rival the representational performance of IT cortex on this visual object recognition task. Furthermore, we show that models that perform well on measures of representational performance also perform well on measures of representational similarity to IT, and on measures of predicting individual IT multi-unit responses. Whether these DNNs rely on computational mechanisms similar to the primate visual system is yet to be determined, but, unlike all previous bio-inspired models, that possibility cannot be ruled out merely on representational performance grounds.
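A rough sketch of the “kernel analysis” idea (an interpretation, not the paper's exact procedure): project the task labels onto an increasing number of leading kernel principal components and record how well a simple decoder does at each representational complexity. Held-out evaluation and the paper's corrections for noise and trial counts are omitted here.

    import numpy as np

    def kernel_analysis_curve(X, y, complexities=(1, 2, 4, 8, 16, 32)):
        """X: (n_samples, n_features) representation; y: labels in {-1, +1}."""
        Xc = X - X.mean(axis=0)
        K = Xc @ Xc.T                              # linear kernel
        evals, evecs = np.linalg.eigh(K)
        evecs = evecs[:, np.argsort(evals)[::-1]]  # leading components first
        curve = []
        for d in complexities:
            U = evecs[:, :d]
            y_hat = U @ (U.T @ y)                  # least-squares fit within the d-dim subspace
            curve.append((d, np.mean(np.sign(y_hat) == y)))
        return curve                               # accuracy as a function of complexity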
PLOS Computational Biology | 2009
Nicolas Pinto; David Doukhan; James J. DiCarlo; David Cox
While many models of biological object recognition share a common set of “broad-stroke” properties, the performance of any one model depends strongly on the choice of parameters in a particular instantiation of that model—e.g., the number of units per layer, the size of pooling kernels, exponents in normalization operations, etc. Since the number of such parameters (explicit or implicit) is typically large and the computational cost of evaluating one particular parameter set is high, the space of possible model instantiations goes largely unexplored. Thus, when a model fails to approach the abilities of biological visual systems, we are left uncertain whether this failure is because we are missing a fundamental idea or because the correct “parts” have not been tuned correctly, assembled at sufficient scale, or provided with enough training. Here, we present a high-throughput approach to the exploration of such parameter sets, leveraging recent advances in stream processing hardware (high-end NVIDIA graphics cards and the PlayStation 3's IBM Cell processor). In analogy to high-throughput screening approaches in molecular biology and genetics, we explored thousands of potential network architectures and parameter instantiations, screening those that show promising object recognition performance for further analysis. We show that this approach can yield significant, reproducible gains in performance across an array of basic object recognition tasks, consistently outperforming a variety of state-of-the-art purpose-built vision systems from the literature. As the scale of available computational power continues to expand, we argue that this approach has the potential to greatly accelerate progress in both artificial vision and our understanding of the computational underpinnings of biological vision.
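The screening loop itself is conceptually simple; a minimal sketch follows. `sample_architecture` and `evaluate_on_screening_task` are hypothetical stand-ins for the paper's GPU-accelerated model builder and its screening benchmark, and the parameter ranges are invented for illustration.

    import heapq
    import random

    def sample_architecture(rng):
        return {
            "n_layers":  rng.choice([1, 2, 3]),
            "units":     rng.choice([16, 32, 64, 128]),
            "pool_size": rng.choice([3, 5, 7, 9]),
            "norm_exp":  rng.uniform(0.5, 2.0),
        }

    def screen(n_candidates, top_k, evaluate_on_screening_task, seed=0):
        rng = random.Random(seed)
        scored = []
        for i in range(n_candidates):
            params = sample_architecture(rng)
            score = evaluate_on_screening_task(params)  # the expensive, GPU-bound step
            scored.append((score, i, params))
        return heapq.nlargest(top_k, scored)            # promising candidates for further analysis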
Face and Gesture | 2011
David Cox; Nicolas Pinto
Many modern computer vision algorithms are built atop a set of low-level feature operators (such as SIFT [1], [2]; HOG [3], [4]; or LBP [5], [6]) that transform raw pixel values into a representation better suited to subsequent processing and classification. While the choice of feature representation is often not central to the logic of a given algorithm, the quality of the feature representation can have critically important implications for performance. Here, we demonstrate a large-scale feature search approach to generating new, more powerful feature representations in which a multitude of complex, nonlinear, multilayer neuromorphic feature representations are randomly generated and screened to find those best suited for the task at hand. In particular, we show that a brute-force search can generate representations that, in combination with standard machine learning blending techniques, achieve state-of-the-art performance on the Labeled Faces in the Wild (LFW) [7] unconstrained face recognition challenge set. These representations outperform previous state-of-the-art approaches, in spite of requiring less training data and using a conceptually simpler machine learning backend. We argue that such large-scale-search-derived feature sets can play a synergistic role with other computer vision approaches by providing a richer base of features with which to work.
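A minimal sketch of what a “randomly generated multilayer neuromorphic feature representation” could look like (the shapes, ranges, and nonlinearity are illustrative assumptions, not the published search space): each layer convolves with random filters, applies a threshold nonlinearity, and pools locally.

    import numpy as np
    from scipy.signal import fftconvolve

    def apply_layer(stack, filters, pool=2):
        out = []
        for fmap in stack:
            for f in filters:
                r = np.maximum(fftconvolve(fmap, f, mode="valid"), 0)  # threshold nonlinearity
                out.append(r[::pool, ::pool])                          # local pooling
        return out

    def random_representation(img, rng, n_layers=2):
        stack = [img]
        for _ in range(n_layers):
            fsize = int(rng.choice([3, 5]))
            filters = rng.standard_normal((int(rng.integers(2, 5)), fsize, fsize))
            stack = apply_layer(stack, filters)
        return np.concatenate([m.ravel() for m in stack])

    # Usage sketch: rng = np.random.default_rng(0); feats = random_representation(img, rng)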
Workshop on Applications of Computer Vision | 2011
Nicolas Pinto; Youssef Barhomi; David Cox; James J. DiCarlo
Tolerance (“invariance”) to identity-preserving image variation (e.g. variation in position, scale, pose, illumination) is a fundamental problem that any visual object recognition system, biological or engineered, must solve. While standard natural image database benchmarks are useful for guiding progress in computer vision, they can fail to probe the ability of a recognition system to solve the invariance problem [23, 24, 25]. Thus, to understand which computational approaches are making progress on solving the invariance problem, we compared and contrasted a variety of state-of-the-art visual representations using synthetic recognition tasks designed to systematically probe invariance. We successfully re-implemented a variety of state-of-the-art visual representations and confirmed their published performance on a natural image benchmark. Here we report that most of these representations perform poorly on invariant recognition, but that one representation [21] shows significant performance gains over two baseline representations. We also show how this approach can more deeply illuminate the strengths and weaknesses of different visual representations and thus guide progress on invariant object recognition.
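A toy sketch of such a synthetic probe (translation only; `render` and `features` are hypothetical stand-ins for a parametric renderer and the representation under test): accuracy is measured separately at each controlled level of identity-preserving variation.

    import numpy as np
    from sklearn.svm import LinearSVC
    from sklearn.model_selection import cross_val_score

    def invariance_curve(objects, render, features, levels=(0, 4, 8, 16)):
        curve = []
        for max_shift in levels:                       # increasing variation range
            X, y = [], []
            for label, obj in enumerate(objects):
                for _ in range(50):
                    dx, dy = np.random.randint(-max_shift, max_shift + 1, size=2)
                    X.append(features(render(obj, dx, dy)))
                    y.append(label)
            acc = cross_val_score(LinearSVC(), np.array(X), np.array(y), cv=5).mean()
            curve.append((max_shift, acc))
        return curve                                   # accuracy vs. variation level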
British Machine Vision Conference | 2012
Giovani Chiachia; Nicolas Pinto; William Robson Schwartz; Anderson Rocha; Alexandre X. Falcão; David Cox
While significant strides have been made in the recognition of faces under controlled viewing conditions, face recognition “in the wild” remains a challenging unsolved problem [12, 13, 21]. Interestingly, while humans are generally excellent at identifying familiar individuals under such conditions, their performance is significantly worse with unfamiliar individuals [5] and groups [19], leading to the idea that the brain may have enhanced or specialized representations of familiar individuals [6]. Inspired by these observations, we explored the use of a number of subspace analysis techniques, applied to various visual representations, to generate person-specific subspaces of “familiar” individuals for face identification. In particular, we introduce a person-specific application of partial least squares (PS-PLS) to generate per-individual subspaces, and show that operating in these subspaces yields state-of-the-art performance on the challenging PubFig83 familiar face identification benchmark. The results underscore the potential importance of incorporating a notion of familiarity into face recognition systems.
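One plausible reading of person-specific PLS, sketched with scikit-learn (an interpretation, not the authors' implementation): fit a one-vs-rest PLS model per “familiar” identity on top of a visual representation, then identify a probe by its score in each person-specific subspace.

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression

    def fit_person_specific_pls(X, y, identities, n_components=3):
        """X: (n_samples, n_features); y: identity labels."""
        models = {}
        for person in identities:
            target = (y == person).astype(float) * 2 - 1   # one-vs-rest coding
            models[person] = PLSRegression(n_components=n_components).fit(X, target)
        return models

    def identify(models, x):
        # score the probe in every person-specific subspace; highest score wins
        scores = {p: m.predict(x.reshape(1, -1)).ravel()[0] for p, m in models.items()}
        return max(scores, key=scores.get)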
Bioinspired Models of Network, Information, and Computing Systems | 2010
Nicolas Pinto; David Cox
A key challenge in building face recognition systems — biologically-inspired or otherwise — is evaluating performance. While much of face recognition research has traditionally used posed photographs for evaluation, recent efforts have emerged to build more naturalistic, unconstrained test sets by collecting large numbers of face images from the internet (e.g. the “Labeled Faces in the Wild” (LFW) test set [1]). While such efforts represent a large step forward in the direction of realism, the nature of posed photographs from the internet arguably represents an incomplete sampling of the range of variation in view, lighting, etc. found in the real world. Here, we evaluate a family of large-scale biologically-inspired vision algorithms that has previously been shown to perform well on a variety of object and face recognition test sets [2], and show that members of this family perform at a level comparable with current state-of-the-art approaches on the LFW challenge. As a counterpoint to internet-photo based approaches, we use synthetic (rendered) face images where the amount of view variation is controllable and known by design. We show that while there is gross agreement between the LFW benchmark and synthetic benchmarks, the synthetic benchmarks reveal a substantially greater degree of tolerance to view variation than is apparent from the LFW benchmark in models containing deeper hierarchies. Furthermore, such an approach yields important insights into which axes of variation are most challenging. These results suggest that parametric synthetic benchmarks can play an important role in guiding the progress of biologically-inspired vision systems.
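A toy sketch of using such a parametric benchmark to ask which axis of variation is hardest, as the abstract suggests: vary one rendering parameter at a time and compare accuracy per axis. `render_face(identity, **params)`, the axis ranges, and the feature/classifier choices are all hypothetical stand-ins.

    import numpy as np
    from sklearn.svm import LinearSVC
    from sklearn.model_selection import cross_val_score

    AXES = {"yaw": (-45, 45), "pitch": (-30, 30), "light_az": (0, 90)}

    def per_axis_difficulty(identities, render_face, features, n_per_id=40):
        results = {}
        for axis, (lo, hi) in AXES.items():
            X, y = [], []
            for label, ident in enumerate(identities):
                for _ in range(n_per_id):
                    X.append(features(render_face(ident, **{axis: np.random.uniform(lo, hi)})))
                    y.append(label)
            results[axis] = cross_val_score(LinearSVC(), np.array(X), np.array(y), cv=5).mean()
        return results    # lower accuracy -> more challenging axis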
Image and Vision Computing | 2012
Nicolas Pinto; David Cox
Many modern computer vision algorithms are built atop a set of low-level feature operators (such as SIFT [23,24]; HOG [8,3]; or LBP [1,2]) that transform raw pixel values into a representation better suited to subsequent processing and classification. While the choice of feature representation is often not central to the logic of a given algorithm, the quality of the feature representation can have critically important implications for performance. Here, we demonstrate a large-scale feature search approach to generating new, more powerful feature representations in which a multitude of complex, nonlinear, multilayer neuromorphic feature representations are randomly generated and screened to find those best suited for the task at hand. In particular, we show that a brute-force search can generate representations that, in combination with standard machine learning blending techniques, achieve state-of-the-art performance on the Labeled Faces in the Wild (LFW) [19] unconstrained face recognition challenge set. These representations outperform previous state-of-the-art approaches, in spite of requiring less training data and using a conceptually simpler machine learning backend. We argue that such large-scale-search-derived feature sets can play a synergistic role with other computer vision approaches by providing a richer base of features with which to work.
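The “blending” step can be read as a generic stacking recipe (assumed here, not taken from the paper): train one classifier per screened representation, then learn weights over their held-out decision values.

    import numpy as np
    from sklearn.svm import LinearSVC
    from sklearn.linear_model import LogisticRegression

    def blend(train_sets, y_train, holdout_sets, y_holdout):
        """Each element of train_sets/holdout_sets: (n_samples, n_features) for one representation."""
        base, holdout_scores = [], []
        for Xtr, Xho in zip(train_sets, holdout_sets):
            clf = LinearSVC().fit(Xtr, y_train)
            base.append(clf)
            holdout_scores.append(clf.decision_function(Xho))  # binary task: shape (n_samples,)
        Z = np.column_stack(holdout_scores)                    # per-representation scores
        blender = LogisticRegression().fit(Z, y_holdout)       # learned combination weights
        return base, blender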
arXiv: Software Engineering | 2013
Andreas Klöckner; Nicolas Pinto; Bryan Catanzaro; Yunsup Lee; Paul Ivanov; Ahmed R. Fasih
A common lament in the field of scientific computing concerns the ever-widening gap between hypothetical machine capabilities and the effort an individual programmer is able to spend to move a computational application towards exploiting a machine to the full extent of its capability. More sophisticated tools, compilers, and libraries are generally hoped to level this field, enabling users to achieve good results even with modest investment. PyCUDA is a contribution to the tools for graphics processing unit (GPU) computing. PyCUDA has a two-fold aim. First, it aims to simplify the usage of the existing basic concepts of CUDA C. Importantly, it does not attempt to change or reinvent the basic notions of GPU programming, but instead, as a foundation for further tools, simply exposes them as-is through a carefully engineered interface. Key features of this interface include generous use of sensible defaults, automatic error checking, and automatic resource management. Second, building strictly on top of this first, basic layer, PyCUDA provides abstractions that support a number of very common usage patterns involving a GPU equivalent of NumPy arrays. Although PyCUDA gives the user access to a compiled language (CUDA C), it attempts to avoid the development iteration penalty commonly associated with compiled languages, instead retaining the satisfaction and immediacy of scripting. PyCUDA and PyOpenCL, as open-source projects, thrive on user feedback, particularly feedback regarding limitations, bugs, or missing features that users encounter.
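A minimal example of the NumPy-like GPU array layer described above: data moves to the device, elementwise arithmetic runs on the GPU, and results come back as ordinary NumPy arrays.

    import numpy as np
    import pycuda.autoinit              # sets up a CUDA context
    import pycuda.gpuarray as gpuarray

    a = np.random.randn(4, 4).astype(np.float32)
    a_gpu = gpuarray.to_gpu(a)          # host -> device transfer
    b_gpu = 2 * a_gpu + 1               # elementwise ops execute on the GPU
    print(b_gpu.get())                  # device -> host; a plain NumPy array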