Publication


Featured research published by Steve Branson.


European Conference on Computer Vision | 2010

Visual recognition with humans in the loop

Steve Branson; Catherine Wah; Florian Schroff; Boris Babenko; Peter Welinder; Pietro Perona; Serge J. Belongie

We present an interactive, hybrid human-computer method for object classification. The method applies to classes of objects that are recognizable by people with appropriate expertise (e.g., animal species or airplane model), but not (in general) by people without such expertise. It can be seen as a visual version of the 20 questions game, where questions based on simple visual attributes are posed interactively. The goal is to identify the true class while minimizing the number of questions asked, using the visual content of the image. We introduce a general framework for incorporating almost any off-the-shelf multi-class object recognition algorithm into the visual 20 questions game, and provide methodologies to account for imperfect user responses and unreliable computer vision algorithms. We evaluate our methods on Birds-200, a difficult dataset of 200 tightly-related bird species, and on the Animals With Attributes dataset. Our results demonstrate that incorporating user input drives up recognition accuracy to levels that are good enough for practical applications, while at the same time, computer vision reduces the amount of human interaction required.
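
As a concrete illustration of the question-selection loop described above, here is a minimal sketch, not the paper's code: maintain a posterior over classes, pose the attribute question with the highest expected information gain, and update the posterior with an answer model that can absorb user noise. All names and the array layout are illustrative assumptions.

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a distribution (ignoring zero entries)."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def select_question(posterior, p_yes_given_class, asked):
    """Pick the attribute question with the highest expected information gain.

    posterior:         (n_classes,) current class probabilities
    p_yes_given_class: (n_questions, n_classes) P(answer = yes | class),
                       which can absorb a model of imperfect user responses
    asked:             set of question indices already posed
    """
    best_q, best_gain = None, -np.inf
    h_now = entropy(posterior)
    for q in range(p_yes_given_class.shape[0]):
        if q in asked:
            continue
        p_yes = p_yes_given_class[q] @ posterior           # P(answer = yes)
        gain = h_now
        for ans_prob, lik in ((p_yes, p_yes_given_class[q]),
                              (1 - p_yes, 1 - p_yes_given_class[q])):
            if ans_prob <= 0:
                continue
            post = lik * posterior / ans_prob              # Bayes update
            gain -= ans_prob * entropy(post)               # expected posterior entropy
        if gain > best_gain:
            best_q, best_gain = q, gain
    return best_q

def update_posterior(posterior, p_yes_given_class, q, answered_yes):
    """Bayesian update of the class posterior after a (possibly noisy) answer."""
    lik = p_yes_given_class[q] if answered_yes else 1 - p_yes_given_class[q]
    post = lik * posterior
    return post / post.sum()
```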


International Conference on Computer Vision | 2011

Multiclass recognition and part localization with humans in the loop

Catherine Wah; Steve Branson; Pietro Perona; Serge J. Belongie

We propose a visual recognition system that is designed for fine-grained visual categorization. The system is composed of a machine and a human user. The user, who is unable to carry out the recognition task alone, is interactively asked to provide two heterogeneous forms of information: clicking on object parts and answering binary questions. The machine intelligently selects the most informative question to pose to the user in order to identify the object's class as quickly as possible. By leveraging computer vision and analyzing the user responses, the overall amount of human effort required, measured in seconds, is minimized. We demonstrate promising results on a challenging dataset of uncropped images, achieving a significant average reduction in human effort over previous methods.
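
Since effort here is measured in seconds, the selection step trades entropy reduction against the time a response costs. A hedged sketch of that choice rule, where both quantities are assumed to be supplied by the vision model and by measured user response times:

```python
import numpy as np

def pick_next_query(posterior, queries):
    """Choose the user query with the best entropy reduction per second of effort.

    `queries` is a list of (expected_entropy_after, seconds_of_effort) pairs,
    one per candidate binary question or part-click request; both quantities
    are assumed inputs, not part of the paper's exact formulation.
    """
    p = posterior[posterior > 0]
    h_now = -np.sum(p * np.log2(p))
    rates = [(h_now - h_after) / max(cost, 1e-9) for h_after, cost in queries]
    return int(np.argmax(rates))
```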


International Conference on Computer Vision | 2011

Strong supervision from weak annotation: Interactive training of deformable part models

Steve Branson; Pietro Perona; Serge J. Belongie

We propose a framework for large scale learning and annotation of structured models. The system interleaves interactive labeling (where the current model is used to semi-automate the labeling of a new example) and online learning (where a newly labeled example is used to update the current model parameters). This framework is scalable to large datasets and complex image models and is shown to have excellent theoretical and practical properties in terms of training time, optimality guarantees, and bounds on the amount of annotation effort per image. We apply this framework to part-based detection, and introduce a novel algorithm for interactive labeling of deformable part models. The labeling tool updates and displays in real time the maximum likelihood location of all parts as the user clicks and drags the location of one or more parts. We demonstrate that the system can be used to efficiently and robustly train part and pose detectors on CUB Birds-200, a challenging dataset of birds in unconstrained pose and environment.
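
One way to picture the interleaving of labeling and learning is a structured perceptron-style loop, sketched below under the assumption that the model's predictor, the user-correction step, and the joint feature map are available as callables; these placeholders stand in for the paper's actual components.

```python
def interactive_training_loop(images, predict_parts, correct_parts, feats, w, lr=0.1):
    """Interleave semi-automatic labeling with online structured updates.

    predict_parts(image, w):      model's current max-likelihood part locations
    correct_parts(image, guess):  user drags any wrong parts; returns final labels
    feats(image, parts):          joint feature vector for a part configuration
    All three callables are illustrative placeholders.
    """
    for image in images:
        guess = predict_parts(image, w)          # pre-label with current model
        truth = correct_parts(image, guess)      # user fixes only what is wrong
        # Structured perceptron-style update toward the corrected labeling
        w += lr * (feats(image, truth) - feats(image, guess))
    return w
```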


International Conference on Computer Vision | 2009

Similarity metrics for categorization: From monolithic to category specific

Boris Babenko; Steve Branson; Serge J. Belongie

Similarity metrics that are learned from labeled training data can be advantageous in terms of performance and/or efficiency. These learned metrics can then be used in conjunction with a nearest neighbor classifier, or can be plugged in as kernels to an SVM. For the task of categorization two scenarios have thus far been explored. The first is to train a single “monolithic” similarity metric that is then used for all examples. The other is to train a metric for each category in a 1-vs-all manner. While the former approach seems to be at a disadvantage in terms of performance, the latter is not practical for large numbers of categories. In this paper we explore the space in between these two extremes. We present an algorithm that learns a few similarity metrics, while simultaneously grouping categories together and assigning one of these metrics to each group. We present promising results and show how the learned metrics generalize to novel categories.
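
A rough sketch of one alternation of the grouping idea, assuming K metrics shared across a much larger number of categories; the fitting and loss routines are placeholders, not the paper's learner.

```python
import numpy as np

def assign_and_refit(categories, metrics, fit_metric, loss):
    """One alternation: assign each category to its best metric, then refit.

    categories:            list of per-category training data
    metrics:               list of K current metric parameters, K << #categories
    fit_metric(data_list): refits a metric on the pooled data of its group
    loss(metric, data):    training loss of a metric on one category's data
    All callables are illustrative placeholders.
    """
    groups = {k: [] for k in range(len(metrics))}
    for data in categories:
        k = int(np.argmin([loss(m, data) for m in metrics]))   # best metric
        groups[k].append(data)
    # Refit each metric on the categories assigned to it; keep empty groups as-is
    return [fit_metric(groups[k]) if groups[k] else metrics[k]
            for k in range(len(metrics))]
```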


International Journal of Computer Vision | 2014

The Ignorant Led by the Blind: A Hybrid Human-Machine Vision System for Fine-Grained Categorization

Steve Branson; Grant Van Horn; Catherine Wah; Pietro Perona; Serge J. Belongie

We present a visual recognition system for fine-grained visual categorization. The system is composed of a human and a machine working together and combines the complementary strengths of computer vision algorithms and (non-expert) human users. The human users provide two heterogeneous forms of information: object part clicks and answers to multiple-choice questions. The machine intelligently selects the most informative question to pose to the user in order to identify the object class as quickly as possible. By leveraging computer vision and analyzing the user responses, the overall amount of human effort required, measured in seconds, is minimized. Our formalism shows how to incorporate many different types of computer vision algorithms into a human-in-the-loop framework, including standard multiclass methods, part-based methods, and localized multiclass and attribute methods. We explore our ideas by building a field guide for bird identification. The experimental results demonstrate the strength of combining ignorant humans with poor-sighted machines: the hybrid system achieves quick and accurate bird identification on a dataset containing 200 bird species.
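
To show how vision scores and user answers can be fused in such a framework, here is a minimal Bayesian sketch, assuming the vision model emits per-class scores and each answer comes with per-class likelihoods; this is an illustration, not the paper's exact formalism.

```python
import numpy as np

def class_posterior(vision_scores, answer_liks):
    """Fuse vision scores (as a prior) with the likelihoods of user answers.

    vision_scores: (n_classes,) unnormalized scores from any multiclass model
    answer_liks:   list of (n_classes,) arrays, P(observed answer | class)
    """
    post = np.exp(vision_scores - vision_scores.max())  # softmax-style prior
    for lik in answer_liks:
        post *= lik                                     # one Bayes step per answer
    return post / post.sum()
```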


Computer Vision and Pattern Recognition | 2014

Similarity Comparisons for Interactive Fine-Grained Categorization

Catherine Wah; Grant Van Horn; Steve Branson; Subhransu Maji; Pietro Perona; Serge J. Belongie

Current human-in-the-loop fine-grained visual categorization systems depend on a predefined vocabulary of attributes and parts, usually determined by experts. In this work, we move away from that expert-driven and attribute-centric paradigm and present a novel interactive classification system that incorporates computer vision and perceptual similarity metrics in a unified framework. At test time, users are asked to judge relative similarity between a query image and various sets of images; these general queries do not require expert-defined terminology and are applicable to other domains and basic-level categories, enabling a flexible, efficient, and scalable system for fine-grained categorization with humans in the loop. Our system outperforms existing state-of-the-art systems for relevance feedback-based image retrieval as well as interactive classification, resulting in a reduction of up to 43% in the average number of questions needed to correctly classify an image.
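
A hedged sketch of how one round of similarity feedback might update a class posterior, assuming a softmax choice model over the displayed image sets; the similarity matrix and the choice model are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def update_from_similarity(posterior, sim, shown_sets, chosen):
    """Bayes update after the user picks the display set most similar to the query.

    sim:        (n_classes, n_sets) perceptual similarity of each class's
                appearance to each displayed image set
    shown_sets: indices of the sets shown this round
    chosen:     index (into shown_sets) of the set the user clicked
    """
    logits = sim[:, shown_sets]
    # Softmax choice model: P(user picks set s | true class c)
    choice_probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    post = posterior * choice_probs[:, chosen]
    return post / post.sum()
```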


Computer Vision and Pattern Recognition | 2015

Building a bird recognition app and large scale dataset with citizen scientists: The fine print in fine-grained dataset collection

Grant Van Horn; Steve Branson; Ryan Farrell; Scott Haber; Jessie H. Barry; Panos Ipeirotis; Pietro Perona; Serge J. Belongie

We introduce tools and methodologies to collect high quality, large scale fine-grained computer vision datasets using citizen scientists: crowd annotators who are passionate and knowledgeable about specific domains such as birds or airplanes. We worked with citizen scientists and domain experts to collect NABirds, a new high quality dataset containing 48,562 images of North American birds with 555 categories, part annotations, and bounding boxes. We find that citizen scientists are significantly more accurate than Mechanical Turkers at zero cost. We worked with bird experts to measure the quality of popular datasets like CUB-200-2011 and ImageNet and found class label error rates of at least 4%. Nevertheless, we found that learning algorithms are surprisingly robust to annotation errors and this level of training data corruption can lead to an acceptably small increase in test error if the training set has sufficient size. At the same time, we found that an expert-curated high quality test set like NABirds is necessary to accurately measure the performance of fine-grained computer vision systems. We used NABirds to train a publicly available bird recognition service deployed on the website of the Cornell Lab of Ornithology.


Computer Vision and Pattern Recognition | 2013

Efficient Large-Scale Structured Learning

Steve Branson; Oscar Beijbom; Serge J. Belongie

We introduce an algorithm, SVM-IS, for structured SVM learning that is computationally scalable to very large datasets and complex structural representations. We show that structured learning is at least as fast, and often much faster, than methods based on binary classification for problems such as deformable part models, object detection, and multiclass classification, while achieving accuracies that are at least as good. Our method allows problem-specific structural knowledge to be exploited for faster optimization by integrating with a user-defined importance sampling function. We demonstrate fast training times on two challenging large scale datasets for two very different problems: ImageNet for multiclass classification and CUB-200-2011 for deformable part model training. Our method is shown to be 10-50 times faster than SVMstruct for cost-sensitive multiclass classification while being about as fast as the fastest 1-vs-all methods for multiclass classification. For deformable part model training, it is shown to be 50-1000 times faster than methods based on SVMstruct, mining hard negatives, and Pegasos-style stochastic gradient descent. Source code of our method is publicly available.
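
The core idea, replacing exact loss-augmented inference with a problem-specific sampler inside a stochastic subgradient step, can be sketched as follows; the sampler interface and learning-rate handling are assumptions for illustration, not SVM-IS itself.

```python
def svm_is_step(w, x, y_true, sample_labels, feats, task_loss, lr=0.01, C=1.0):
    """One stochastic subgradient step for a structured SVM, with exact
    loss-augmented inference replaced by a user-supplied importance sampler.

    sample_labels(x, w):  problem-specific sampler returning a small set of
                          candidate structured labels (assumed interface)
    feats(x, y):          joint feature map, returns a numpy vector
    task_loss(y_true, y): structured loss, e.g. part-localization error
    """
    # Most violated candidate among the sampled labels only
    y_hat = max(sample_labels(x, w),
                key=lambda y: w @ feats(x, y) + task_loss(y_true, y))
    margin = w @ feats(x, y_hat) + task_loss(y_true, y_hat) - w @ feats(x, y_true)
    grad = w / C                       # subgradient of a (0.5/C)*||w||^2 regularizer
    if margin > 0:
        grad = grad + feats(x, y_hat) - feats(x, y_true)
    return w - lr * grad
```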


European Conference on Computer Vision | 2014

Detecting Social Actions of Fruit Flies

Eyrun Eyjolfsdottir; Steve Branson; Xavier P. Burgos-Artizzu; Eric D Hoopfer; Jonathan Schor; David J. Anderson; Pietro Perona

We describe a system that tracks pairs of fruit flies and automatically detects and classifies their actions. We compare experimentally the value of a frame-level feature representation with the more elaborate notion of ‘bout features’ that capture the structure within actions. Similarly, we compare a simple sliding window classifier architecture with a more sophisticated structured output architecture, and find that window-based detectors outperform the much slower structured counterparts, and approach human performance. In addition, we test our top performing detector on the CRIM13 mouse dataset, finding that it matches the performance of the best published method. Our Fly-vs-Fly dataset contains 22 hours of video showing pairs of fruit flies engaging in 10 social interactions in three different contexts; it is fully annotated by experts, and published with articulated pose trajectory features.
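
For intuition, a bare-bones version of the sliding window detector, assuming per-frame features are already extracted and a window classifier has been trained; names are illustrative.

```python
import numpy as np

def sliding_window_detect(frame_feats, clf_score, win_len, threshold):
    """Score every fixed-length window of per-frame features and return
    above-threshold detections.

    frame_feats: (n_frames, n_dims) array of per-frame trajectory features
    clf_score:   any trained window classifier returning a scalar score
    """
    n = len(frame_feats)
    detections = []
    for start in range(n - win_len + 1):
        window = frame_feats[start:start + win_len].ravel()  # stack frames
        s = clf_score(window)
        if s > threshold:
            detections.append((start, start + win_len, s))
    return detections  # overlapping hits would normally be merged (NMS)
```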


Computer Vision and Pattern Recognition | 2017

Lean Crowdsourcing: Combining Humans and Machines in an Online System

Steve Branson; Grant Van Horn; Pietro Perona

We introduce a method to greatly reduce the amount of redundant annotations required when crowdsourcing annotations such as bounding boxes, parts, and class labels. For example, if two Mechanical Turkers happen to click on the same pixel location when annotating a part in a given image (an event that is very unlikely to occur by random chance), it is a strong indication that the location is correct. A similar type of confidence can be obtained if a single Turker happened to agree with a computer vision estimate. We thus incrementally collect a variable number of worker annotations per image based on online estimates of confidence. This is done using a sequential estimation of risk over a probabilistic model that combines worker skill, image difficulty, and an incrementally trained computer vision model. We develop specialized models and algorithms for binary annotation, part keypoint annotation, and sets of bounding box annotations. We show that our method can reduce annotation time by a factor of 4-11 for binary filtering of web search results and 2-4 for annotation of boxes of pedestrians in images, while in many cases also reducing annotation error. We will make an end-to-end version of our system publicly available.
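
The heart of the method is a stopping rule: keep collecting annotations for an image only while the estimated risk of the combined label stays above a threshold. A minimal sketch, with all callables standing in for the paper's worker-skill, image-difficulty, and vision models:

```python
def collect_until_confident(get_worker_label, cv_estimate, posterior_update,
                            risk, max_workers=5, risk_threshold=0.05):
    """Collect a variable number of worker annotations per image, stopping as
    soon as the estimated risk of the combined label falls below a threshold.

    get_worker_label():       fetches one more crowd annotation
    cv_estimate:              the vision model's prediction, treated here as
                              one more (weaker) annotator
    posterior_update(labels): combined label posterior given all annotations
    risk(posterior):          expected loss of committing to the best label
    All callables are illustrative placeholders.
    """
    labels = [cv_estimate]                     # vision model seeds the evidence
    posterior = posterior_update(labels)
    while risk(posterior) > risk_threshold and len(labels) <= max_workers:
        labels.append(get_worker_label())      # pay for one more worker
        posterior = posterior_update(labels)
    return posterior
```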

Collaboration


Dive into Steve Branson's collaborations.

Top Co-Authors

Pietro Perona, California Institute of Technology

Grant Van Horn, California Institute of Technology

Catherine Wah, University of California

Peter Welinder, California Institute of Technology

Boris Babenko, University of California

David Hall, California Institute of Technology