Nicholas J. Butko
University of California, San Diego
Publications
Featured research published by Nicholas J. Butko.
computer vision and pattern recognition | 2009
Nicholas J. Butko; Javier R. Movellan
Recent years have seen the development of fast and accurate algorithms for detecting objects in images. However, as the size of the scene grows, so do the running times of these algorithms. If a 128×102 pixel image requires 20 ms to process, searching for objects in a 1280×1024 image will take 2 s. This is unsuitable under real-time operating constraints: by the time a frame has been processed, the object may have moved. An analogous problem occurs when controlling robot cameras that need to scan scenes in search of target objects. In this paper, we consider a method for improving the run-time of general-purpose object-detection algorithms. Our method is based on a model of visual search in humans, which schedules eye fixations to maximize the long-term information accrued about the location of the target of interest. The approach can be used to drive robot cameras that physically scan scenes or to improve the scanning speed for very large, high-resolution images. We consider the latter application in this work by simulating a “digital fovea” and sequentially placing it in various regions of an image in a way that maximizes the expected information gain. We evaluate the approach using the OpenCV version of the Viola-Jones face detector. After accounting for all computational overhead introduced by the fixation controller, the approach doubles the speed of the standard Viola-Jones detector at little cost in accuracy.
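A minimal sketch of the digital-fovea idea, assuming OpenCV's Haar-cascade face detector: the detector only ever runs inside a small window, a grid of beliefs about the target location is updated with an assumed hit/false-alarm model, and the next fixation goes to the most promising cell. The grid size, likelihood values, and greedy most-probable-cell rule are illustrative stand-ins for the information-maximization controller described in the paper.

    # Digital-fovea search loop; parameters are illustrative assumptions.
    import cv2
    import numpy as np

    GRID = 8                       # coarse grid of candidate target locations (assumption)
    P_HIT, P_FA = 0.9, 0.1         # assumed hit / false-alarm rates of the foveal detector

    def foveated_search(gray, cascade, max_fixations=10):
        h, w = gray.shape[:2]
        cell_h, cell_w = h // GRID, w // GRID
        belief = np.full((GRID, GRID), 1.0 / GRID**2)    # uniform prior over target location
        for _ in range(max_fixations):
            gy, gx = np.unravel_index(belief.argmax(), belief.shape)  # greedy stand-in for infomax
            y0, x0 = gy * cell_h, gx * cell_w
            fovea = gray[y0:y0 + cell_h, x0:x0 + cell_w]  # the detector only sees this window
            found = len(cascade.detectMultiScale(fovea)) > 0
            # Bayesian update: the observation is informative mainly about the fixated cell.
            like = np.full_like(belief, P_FA if found else 1.0 - P_FA)
            like[gy, gx] = P_HIT if found else 1.0 - P_HIT
            belief *= like
            belief /= belief.sum()
            if found and belief[gy, gx] > 0.95:
                return (x0, y0, cell_w, cell_h)          # confident localization
        return None

    cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    frame = np.random.randint(0, 256, (1024, 1280), dtype=np.uint8)  # stand-in for a real frame
    print(foveated_search(frame, cascade))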
international conference on robotics and automation | 2008
Nicholas J. Butko; Lingyun Zhang; Garrison W. Cottrell; Javier R. Movellan
Recent years have seen an explosion of research on the computational modeling of human visual attention in task-free conditions, i.e., given an image, predict where humans are likely to look. This area of research could potentially provide general-purpose mechanisms for robots to orient their cameras. One difficulty is that most current models of visual saliency are computationally very expensive and not suited to the real-time implementations needed for robotic applications. Here we propose a fast approximation to a Bayesian model of visual saliency recently proposed in the literature. The approximation can run in real time on current computers at very little computational cost, leaving plenty of CPU cycles for other tasks. We empirically evaluate the saliency model in the domain of controlling saccades of a camera in social robotics situations. The goal was to orient a camera as quickly as possible toward human faces. We found that this simple general-purpose saliency model doubled the success rate of the camera: it captured images of people 70% of the time, compared to a 35% success rate when the camera was controlled using an open-loop scheme. After 3 saccades (camera movements), the robot was 96% likely to have captured at least one person. The results suggest that visual saliency models may provide a useful front end for camera control in robotics applications.
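A rough stand-in for this kind of cheap saliency front end, not the approximation from the paper: center-surround contrast from a difference of box filters, scored by rarity (negative log frequency), with the camera saccading toward the most salient pixel. Filter sizes and the histogram-based rarity score are assumptions.

    # Fast bottom-up saliency sketch driving a camera saccade; parameters are assumptions.
    import cv2
    import numpy as np

    def saliency_map(gray):
        center = cv2.boxFilter(gray.astype(np.float32), -1, (5, 5))
        surround = cv2.boxFilter(gray.astype(np.float32), -1, (21, 21))
        feature = np.abs(center - surround)               # center-surround contrast
        # Rare feature values are salient: score by negative log frequency.
        hist, edges = np.histogram(feature, bins=64, density=True)
        idx = np.clip(np.digitize(feature, edges[1:-1]), 0, 63)
        return -np.log(hist[idx] + 1e-8)

    def next_saccade(gray):
        sal = saliency_map(gray)
        y, x = np.unravel_index(sal.argmax(), sal.shape)  # pan/tilt target in pixel coordinates
        return x, y

    frame = np.random.randint(0, 256, (480, 640), dtype=np.uint8)  # stand-in for a camera frame
    print(next_saccade(frame))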
IEEE Transactions on Autonomous Mental Development | 2010
Nicholas J. Butko; Javier R. Movellan
Recently, infomax methods of optimal control have begun to reshape how we think about active information gathering. We show how such methods can be used to formulate the problem of choosing where to look. We show how an optimal eye movement controller can be learned from subjective experiences of information gathering, and we explore in simulation the properties of the optimal controller. This controller outperforms other eye movement strategies proposed in the literature. The learned eye movement strategies are tailored to the specific visual system of the learner: we show that agents with different kinds of eyes should follow different eye movement strategies. Then we use these insights to build an autonomous computer program that follows this approach and learns to search for faces in images faster than current state-of-the-art techniques. The context of these results is search in static scenes, but the approach extends easily, and gives further efficiency gains, to dynamic tracking tasks. A limitation of infomax methods is that they require probabilistic models of uncertainty of the sensory system, the motor system, and the external world. In the final section of this paper, we propose future avenues of research by which autonomous physical agents may use developmental experience to subjectively characterize the uncertainties they face.
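A toy illustration of the infomax idea on a one-dimensional "retina": each candidate fixation is scored by the expected entropy of the posterior over target location, and the most informative one is chosen. The acuity falloff is an assumed stand-in for a real foveal-peripheral profile, and the paper learns its controller from experience rather than using this one-step greedy rule.

    # One-step infomax fixation choice on a toy 1-D retina; all parameters are assumptions.
    import numpy as np

    N = 20                                         # candidate target locations

    def acuity(fix, loc):
        """Assumed probability of a correct location report, falling off with eccentricity."""
        return 0.9 * np.exp(-0.5 * ((fix - loc) / 3.0) ** 2) + 0.05

    def entropy(p):
        return -np.sum(p * np.log(p + 1e-12))

    def obs_likelihood(fix, true_loc, report):
        acc = acuity(fix, true_loc)
        return acc if report == true_loc else (1.0 - acc) / (N - 1)

    def expected_posterior_entropy(belief, fix):
        h = 0.0
        for report in range(N):
            like = np.array([obs_likelihood(fix, t, report) for t in range(N)])
            joint = like * belief
            p_report = joint.sum()
            if p_report > 0:
                h += p_report * entropy(joint / p_report)
        return h

    def infomax_fixation(belief):
        return min(range(N), key=lambda f: expected_posterior_entropy(belief, f))

    belief = np.full(N, 1.0 / N)
    print("first infomax fixation:", infomax_fixation(belief))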
international conference on development and learning | 2008
Nicholas J. Butko; Javier R. Movellan
Modeling eye movements during search is important for building intelligent robotic vision systems, and for understanding how humans select relevant information and structure behavior in real time. Previous models of visual search (VS) rely on the idea of “saliency maps” which indicate likely locations for targets of interest. In these models the eyes move to locations with maximum saliency. This approach has several drawbacks: (1) It assumes that oculomotor control is a greedy process, i.e., every eye movement is planned as if no further eye movements would be possible after it. (2) It does not account for temporal dynamics and how information is integrated over time. (3) It does not provide a formal basis to understand how optimal search should vary as a function of the operating characteristics of the visual system. To address these limitations, we reformulate the problem of VS as an Information-gathering Partially Observable Markov Decision Process (I-POMDP). We find that the optimal control law depends heavily on the Foveal-Peripheral Operating Characteristic (FPOC) of the visual system.
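A sketch, under assumed parameters, of the ingredient saliency-map models lack: a belief over target location that is integrated across fixations, with the reliability of each local report governed by an FPOC-like acuity curve.

    # Belief update across fixations with an assumed FPOC; grid size and curve are illustrative.
    import numpy as np

    GRID = 15

    def fpoc(eccentricity):
        """Assumed probability of a correct local report as a function of eccentricity."""
        return 0.5 + 0.45 * np.exp(-eccentricity / 3.0)

    def update_belief(belief, fixation, reports):
        """Treat each cell's local report as evidence about that cell (a simplification)."""
        fy, fx = fixation
        ys, xs = np.mgrid[0:GRID, 0:GRID]
        p = fpoc(np.hypot(ys - fy, xs - fx))       # report reliability at each eccentricity
        like = np.where(reports, p, 1.0 - p)
        post = belief * like
        return post / post.sum()

    belief = np.full((GRID, GRID), 1.0 / GRID**2)
    reports = np.zeros((GRID, GRID), dtype=bool)
    reports[4, 9] = True                           # a single (noisy) peripheral detection
    belief = update_belief(belief, fixation=(7, 7), reports=reports)
    print(np.unravel_index(belief.argmax(), belief.shape))   # belief now peaks at the reported cell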
international conference on development and learning | 2009
Tingfan Wu; Nicholas J. Butko; Paul Ruvulo; Marian Stewart Bartlett; Javier R. Movellan
This paper explores the process of self-guided learning of realistic facial expression production by a robotic head with 31 degrees of freedom. Facial motor parameters were learned using feedback from real-time facial expression recognition from video. The experiments show that the mapping of servos to expressions was learned in under one hour of training time. We discuss how our work may help illuminate the computational study of how infants learn to make facial expressions.
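A sketch of this kind of self-guided learning loop, using simple stochastic hill climbing rather than the paper's exact method; set_servos, capture_frame, and smile_probability are hypothetical placeholders for the robot and recognizer interfaces.

    # Self-guided expression learning loop; robot/recognizer interfaces are hypothetical.
    import numpy as np

    N_SERVOS = 31

    def learn_expression(score_fn, iterations=500, step=0.05):
        """Stochastic hill climbing over servo space (not the paper's exact method)."""
        best = np.full(N_SERVOS, 0.5)          # mid-range servo commands in [0, 1]
        best_score = score_fn(best)
        for _ in range(iterations):
            candidate = np.clip(best + step * np.random.randn(N_SERVOS), 0.0, 1.0)
            s = score_fn(candidate)
            if s > best_score:
                best, best_score = candidate, s
        return best, best_score

    def score_smile(servos):
        set_servos(servos)                     # hypothetical robot interface
        frame = capture_frame()                # hypothetical camera grab
        return smile_probability(frame)        # hypothetical recognizer output in [0, 1]

    # servos, score = learn_expression(score_smile)   # run on the real robot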
systems man and cybernetics | 2012
Tingfan Wu; Nicholas J. Butko; Paul Ruvolo; Jacob Whitehill; Marian Stewart Bartlett; Javier R. Movellan
In expression recognition and many other computer vision applications, the recognition performance is greatly improved by adding a layer of nonlinear texture filters between the raw input pixels and the classifier. The function of this layer is typically known as feature extraction. Popular filter types for this layer are Gabor energy filters (GEFs) and local binary patterns (LBPs). Recent work [1] suggests that adding a second layer of nonlinear filters on top of the first layer may be beneficial. However, it is unclear which architecture of layers and which selection of filters works best. In this paper, we present a thorough empirical analysis of the performance of single-layer and dual-layer texture-based approaches for action unit recognition. For the single-layer case, GEFs perform consistently better than LBPs, which may be due to their robustness to jitter and illumination noise as well as to their ability to encode texture at multiple resolutions. For the dual-layer case, we confirm that, while small, the benefit of adding this second layer is reliable and consistent across data sets. Interestingly, for this second layer, LBPs appear to perform better than GEFs.
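For concreteness, a sketch of the two first-layer feature types being compared, with illustrative parameters rather than the paper's filter bank: Gabor energy as the magnitude of a quadrature pair of Gabor responses, and a uniform LBP histogram.

    # Gabor-energy and LBP feature sketches; filter and LBP parameters are assumptions.
    import cv2
    import numpy as np
    from skimage.feature import local_binary_pattern

    def gabor_energy(gray, theta, lambd=8.0, sigma=4.0):
        even = cv2.getGaborKernel((21, 21), sigma, theta, lambd, 0.5, psi=0)
        odd = cv2.getGaborKernel((21, 21), sigma, theta, lambd, 0.5, psi=np.pi / 2)
        re = cv2.filter2D(gray.astype(np.float32), -1, even)
        im = cv2.filter2D(gray.astype(np.float32), -1, odd)
        return np.sqrt(re ** 2 + im ** 2)          # phase-invariant contrast energy

    def lbp_histogram(gray, points=8, radius=1):
        codes = local_binary_pattern(gray, points, radius, method="uniform")
        hist, _ = np.histogram(codes, bins=points + 2, range=(0, points + 2), density=True)
        return hist

    face = np.random.randint(0, 256, (96, 96)).astype(np.uint8)   # stand-in face patch
    gef_features = [gabor_energy(face, t).mean() for t in np.linspace(0, np.pi, 8, endpoint=False)]
    lbp_features = lbp_histogram(face)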
Face and Gesture 2011 | 2011
Tingfan Wu; Nicholas J. Butko; Paul Ruvolo; Jacob Whitehill; Marian Stewart Bartlett; Javier R. Movellan
We explore how CERT [15], a computer expression recognition toolbox trained on a large dataset of spontaneous facial expressions (FFD07), generalizes to a new, previously unseen dataset (FERA). The experiment was unique in that the authors had no access to the test labels, which were guarded as part of the FERA challenge. We show that without any training or special adaptation to the new database, CERT performs better than a baseline method trained exclusively on that database. Best results are achieved by retraining CERT with a combination of old and new data. We also found that the FERA dataset may be too small and idiosyncratic to generalize to other datasets. Training on FERA alone produced good results on FERA but very poor results on FFD07. We reflect on the importance of challenges like this for the future of the field, and discuss suggestions for standardization of future challenges.
Face and Gesture 2011 | 2011
Gwen Littlewort; Jacob Whitehill; Tingfan Wu; Nicholas J. Butko; Paul Ruvolo; Javier R. Movellan; Marian Stewart Bartlett
This paper assesses the performance of measures of facial expression dynamics derived from the Computer Expression Recognition Toolbox (CERT) for classifying emotions in the Facial Expression Recognition and Analysis (FERA) Challenge. The CERT system automatically estimates facial action intensity and head position using learned appearance-based models on single frames of video. CERT outputs were used to derive a representation of the intensity and motion in each video, consisting of the extremes of displacement, velocity and acceleration. Using this representation, emotion detectors were trained on the FERA training examples. Experiments on the released portion of the FERA dataset are presented, as well as results on the blind test. No consideration of subject identity was taken into account in the blind test. The F1 scores were well above the baseline criterion for success.
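A sketch of this video-level representation, assuming the per-frame CERT outputs are available as an (n_frames, n_action_units) array (the array below is synthetic): extremes of displacement, velocity, and acceleration per action unit.

    # Dynamics descriptor from a per-frame intensity time series; input data is synthetic.
    import numpy as np

    def dynamics_features(au_intensities):
        """au_intensities: array of shape (n_frames, n_action_units)."""
        disp = au_intensities - au_intensities[0]      # displacement from the first frame
        vel = np.diff(au_intensities, n=1, axis=0)     # frame-to-frame velocity
        acc = np.diff(au_intensities, n=2, axis=0)     # acceleration
        feats = []
        for series in (disp, vel, acc):
            feats.append(series.max(axis=0))
            feats.append(series.min(axis=0))
        return np.concatenate(feats)                   # 6 values per action unit

    video = np.random.rand(120, 20)                    # fake 120-frame, 20-action-unit example
    print(dynamics_features(video).shape)              # (120,)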
advanced concepts for intelligent vision systems | 2006
Ferid Bajramovic; Frank Mattern; Nicholas J. Butko; Joachim Denzler
The nearest neighbor (NN) classifier is well suited for generic object recognition. However, it requires storing the complete training data, and classification time is linear in the amount of data. There are several approaches to improve runtime and/or memory requirements of nearest neighbor methods: Thinning methods select and store only part of the training data for the classifier. Efficient query structures reduce query times. In this paper, we present an experimental comparison and analysis of such methods using the ETH-80 database. We evaluate the following algorithms. Thinning: condensed nearest neighbor, reduced nearest neighbor, Baram's algorithm, the Baram-RNN hybrid algorithm, Gabriel and GSASH thinning. Query structures: kd-tree and approximate nearest neighbor. For the first four thinning algorithms, we also present an extension to k-NN which allows tuning the trade-off between data reduction and classifier degradation. The experiments show that most of the above methods are well suited for generic object recognition.
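As a concrete example of one thinning method and one query structure from the list above, a sketch of Hart's condensed nearest neighbor rule paired with a kd-tree over the retained prototypes; the toy data and stopping behavior are illustrative, not the ETH-80 setup.

    # Condensed nearest neighbor thinning with a kd-tree; data and setup are illustrative.
    import numpy as np
    from scipy.spatial import cKDTree

    def condensed_nn(X, y):
        """Keep only points that a 1-NN classifier over the kept set misclassifies."""
        keep = [0]
        changed = True
        while changed:
            changed = False
            tree = cKDTree(X[keep])
            for i in range(len(X)):
                if i in keep:
                    continue
                _, j = tree.query(X[i])
                if y[keep[j]] != y[i]:             # misclassified: absorb into the kept set
                    keep.append(i)
                    tree = cKDTree(X[keep])
                    changed = True
        return np.array(keep)

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 2)) + np.repeat([[0, 0], [3, 3]], 150, axis=0)
    y = np.repeat([0, 1], 150)
    kept = condensed_nn(X, y)
    print(f"kept {len(kept)} of {len(X)} training points")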
Neurocomputing | 2007
Nicholas J. Butko; Jochen Triesch
Intrinsic plasticity (IP) refers to a neuron's ability to regulate its firing activity by adapting its intrinsic excitability. Previously, we showed that model neurons combining a model of IP based on information theory with Hebbian synaptic plasticity can adapt their weight vector to discover heavy-tailed directions in the input space. In this paper we show how a network of such units can solve a standard non-linear independent component analysis (ICA) problem. We also present a model for the formation of maps of oriented receptive fields in primary visual cortex and compare our results with those from ICA. Together, our results indicate that intrinsic plasticity that tries to locally maximize information transmission at the level of individual neurons may play an important role in the learning of efficient sensory representations in the cortex.
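A sketch of a single such unit under assumed learning rates and a synthetic heavy-tailed input: an intrinsic-plasticity rule that pushes the sigmoid output toward an exponential distribution with a target mean, combined with normalized Hebbian weight updates. The exact update equations and parameters in the paper may differ.

    # One model unit combining an IP-style gain/bias rule with normalized Hebbian learning.
    # Learning rates, target rate, and input distribution are illustrative assumptions.
    import numpy as np

    MU = 0.2                                   # target mean firing rate
    ETA_IP, ETA_HEBB = 0.001, 0.001

    rng = np.random.default_rng(1)
    dim = 10
    w = rng.normal(size=dim)
    w /= np.linalg.norm(w)
    a, b = 1.0, 0.0                            # gain and bias of the sigmoid

    for _ in range(10000):
        x = rng.laplace(size=dim)              # heavy-tailed synthetic input
        net = w @ x
        y = 1.0 / (1.0 + np.exp(-(a * net + b)))
        # Intrinsic plasticity: drive the output toward an exponential distribution with mean MU.
        db = ETA_IP * (1.0 - (2.0 + 1.0 / MU) * y + (y ** 2) / MU)
        da = ETA_IP / a + net * db
        a, b = max(a + da, 0.01), b + db       # keep the gain positive
        # Hebbian learning with weight normalization.
        w += ETA_HEBB * y * x
        w /= np.linalg.norm(w)

    print("learned gain", a, "bias", b)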