
Publications


Featured research published by Bryan C. Russell.


International Journal of Computer Vision | 2008

LabelMe: A Database and Web-Based Tool for Image Annotation

Bryan C. Russell; Antonio Torralba; Kevin P. Murphy; William T. Freeman

We seek to build a large collection of images with ground truth labels to be used for object detection and recognition research. Such data is useful for supervised learning and quantitative evaluation. To achieve this, we developed a web-based tool that allows easy image annotation and instant sharing of such annotations. Using this annotation tool, we have collected a large dataset that spans many object categories, often containing multiple instances over a wide variety of images. We quantify the contents of the dataset and compare against existing state-of-the-art datasets used for object recognition and detection. We also show how to extend the dataset to automatically enhance object labels with WordNet, discover object parts, recover a depth ordering of objects in a scene, and increase the number of labels using minimal user supervision and images from the web.
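
As an illustrative aside (not from the paper), the LabelMe tool distributes annotations as per-image XML files listing object names and polygon outlines. The sketch below reads such a file, assuming the usual <object>/<name>/<polygon>/<pt> layout; the file name is hypothetical.

```python
# Minimal sketch: reading object names and polygon outlines from a
# LabelMe-style XML annotation file. Assumes the usual layout with
# <object>, <name>, and <polygon>/<pt>/<x>,<y> elements; adjust to
# the actual files you have.
import xml.etree.ElementTree as ET

def read_labelme_annotation(xml_path):
    """Return a list of (object_name, [(x, y), ...]) tuples."""
    root = ET.parse(xml_path).getroot()
    objects = []
    for obj in root.findall("object"):
        name = obj.findtext("name", default="").strip()
        polygon = [
            (float(pt.findtext("x")), float(pt.findtext("y")))
            for pt in obj.findall("./polygon/pt")
        ]
        objects.append((name, polygon))
    return objects

if __name__ == "__main__":
    # "example_annotation.xml" is a placeholder path for illustration.
    for name, polygon in read_labelme_annotation("example_annotation.xml"):
        print(name, len(polygon), "polygon points")
```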


Computer Vision and Pattern Recognition | 2006

Using Multiple Segmentations to Discover Objects and their Extent in Image Collections

Bryan C. Russell; William T. Freeman; Alexei A. Efros; Josef Sivic; Andrew Zisserman

Given a large dataset of images, we seek to automatically determine the visually similar object and scene classes together with their image segmentation. To achieve this we combine two ideas: (i) that a set of segmented objects can be partitioned into visual object classes using topic discovery models from statistical text analysis; and (ii) that visual object classes can be used to assess the accuracy of a segmentation. To tie these ideas together we compute multiple segmentations of each image and then: (i) learn the object classes; and (ii) choose the correct segmentations. We demonstrate that such an algorithm succeeds in automatically discovering many familiar objects in a variety of image datasets, including those from Caltech, MSRC and LabelMe.
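
A minimal sketch of the topic-discovery idea described above, using scikit-learn's LatentDirichletAllocation over synthetic segment-by-visual-word histograms; the actual segmentation machinery, features, and model in the paper differ.

```python
# Minimal sketch of the topic-discovery idea: each candidate segment is
# described by a histogram of quantized local features ("visual words"),
# and a topic model groups segments into putative object classes.
# The counts below are synthetic stand-ins for real segment histograms.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
n_segments, vocab_size, n_classes = 200, 50, 4

# Synthetic segment-by-visual-word count matrix (rows = candidate segments).
segment_histograms = rng.poisson(lam=2.0, size=(n_segments, vocab_size))

lda = LatentDirichletAllocation(n_components=n_classes, random_state=0)
topic_mix = lda.fit_transform(segment_histograms)   # (n_segments, n_classes)

# Assign each segment to its dominant topic; a peaked topic distribution is
# one cue for preferring a segmentation, echoing the idea of using discovered
# classes to score competing segmentations.
labels = topic_mix.argmax(axis=1)
confidence = topic_mix.max(axis=1)
print("segments per discovered class:", np.bincount(labels, minlength=n_classes))
print("mean assignment confidence:", confidence.mean())
```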


Computer Vision and Pattern Recognition | 2008

Unsupervised discovery of visual object class hierarchies

Josef Sivic; Bryan C. Russell; Andrew Zisserman; William T. Freeman; Alexei A. Efros

Objects in the world can be arranged into a hierarchy based on their semantic meaning (e.g. organism - animal - feline - cat). What about defining a hierarchy based on the visual appearance of objects? This paper investigates ways to automatically discover a hierarchical structure for the visual world from a collection of unlabeled images. Previous approaches for unsupervised object and scene discovery focused on partitioning the visual data into a set of non-overlapping classes of equal granularity. In this work, we propose to group visual objects using a multi-layer hierarchy tree that is based on common visual elements. This is achieved by adapting to the visual domain the generative hierarchical latent Dirichlet allocation (hLDA) model previously used for unsupervised discovery of topic hierarchies in text. Images are modeled using quantized local image regions as analogues to words in text. Employing the multiple segmentation framework of Russell et al. [22], we show that meaningful object hierarchies, together with object segmentations, can be automatically learned from unlabeled and unsegmented image collections without supervision. We demonstrate improved object classification and localization performance using hLDA over the previous non-hierarchical method on the MSRC dataset [33].
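
hLDA itself is not available in common libraries, but the "quantized local image regions as words" step the abstract mentions can be illustrated with a k-means visual vocabulary; the descriptors below are synthetic stand-ins for real local features.

```python
# Minimal sketch of the "visual words" preprocessing the abstract refers to:
# local region descriptors are quantized with k-means so each image becomes
# a bag of discrete word indices. Descriptors here are synthetic; in practice
# they would come from a local feature extractor run on image regions.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
descriptors = rng.normal(size=(1000, 128))      # stand-in local descriptors
vocab = KMeans(n_clusters=100, n_init=10, random_state=0).fit(descriptors)

# Quantize a new image's descriptors into visual-word indices, then build the
# word-count vector that a topic model (LDA/hLDA) would consume.
image_descriptors = rng.normal(size=(50, 128))
words = vocab.predict(image_descriptors)
counts = np.bincount(words, minlength=100)
print("bag-of-words vector shape:", counts.shape)
```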


Lecture Notes in Computer Science | 2006

Dataset Issues in Object Recognition

Jean Ponce; Tamara L. Berg; Mark Everingham; David A. Forsyth; Martial Hebert; Svetlana Lazebnik; Marcin Marszalek; Cordelia Schmid; Bryan C. Russell; Antonio Torralba; Christopher K. I. Williams; Jianguo Zhang; Andrew Zisserman

Appropriate datasets are required at all stages of object recognition research, including learning visual models of object and scene categories, detecting and localizing instances of these models in images, and evaluating the performance of recognition algorithms. Current datasets are lacking in several respects, and this paper discusses some of the lessons learned from existing efforts, as well as innovative ways to obtain very large and diverse annotated datasets. It also suggests a few criteria for gathering future datasets.


International Conference on Computer Vision | 2009

LabelMe video: Building a video database with human annotations

Jenny Yuen; Bryan C. Russell; Ce Liu; Antonio Torralba

Video analysis algorithms currently suffer from a lack of information about the objects present in a scene and their interactions, as well as from the absence of comprehensive annotated video databases for benchmarking. We designed an online, openly accessible video annotation system that allows anyone with a browser and internet access to efficiently annotate object category, shape, motion, and activity information in real-world videos. The annotations are complemented with knowledge from static image databases to infer occlusion and depth information. Using this system, we have built a scalable video database composed of diverse video samples paired with human-guided annotations. We conclude by demonstrating potential uses of this database, studying motion statistics as well as cause-effect motion relationships between objects.


ACM Transactions on Graphics | 2014

Painting-to-3D model alignment via discriminative visual elements

Mathieu Aubry; Bryan C. Russell; Josef Sivic

This article describes a technique that can reliably align arbitrary 2D depictions of an architectural site, including drawings, paintings, and historical photographs, with a 3D model of the site. This is a tremendously difficult task, as the appearance and scene structure in the 2D depictions can be very different from the appearance and geometry of the 3D model, for example, due to the specific rendering style, drawing error, age, lighting, or change of seasons. In addition, we face a hard search problem: the number of possible alignments of the painting to a large 3D model, such as a partial reconstruction of a city, is huge. To address these issues, we develop a new compact representation of complex 3D scenes. The 3D model of the scene is represented by a small set of discriminative visual elements that are automatically learned from rendered views. Similar to object detection, the set of visual elements, as well as the weights of individual features for each element, are learned in a discriminative fashion. We show that the learned visual elements are reliably matched in 2D depictions of the scene despite large variations in rendering style (e.g., watercolor, sketch, historical photograph) and structural changes (e.g., missing scene parts, large occluders) of the scene. We demonstrate an application of the proposed approach to automatic rephotography to find an approximate viewpoint of historical paintings and photographs with respect to a 3D model of the site. The proposed alignment procedure is validated via a human user study on a new database of paintings and sketches spanning several sites. The results demonstrate that our algorithm produces significantly better alignments than several baseline methods.
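
A rough sketch of the discriminative-visual-element idea with made-up descriptors: each element is a linear scoring function for one rendered-view patch, whitened against the statistics of a generic negative pool, and matching picks the best-scoring candidate region in the 2D depiction. The paper's actual features, calibration, and rendering pipeline are considerably more involved.

```python
# Minimal sketch of a "discriminative visual element": a linear scoring
# function for one rendered patch, whitened against the mean/covariance of a
# generic negative patch pool (an LDA-like closed form). Descriptors are
# synthetic stand-ins for real patch features.
import numpy as np

rng = np.random.default_rng(0)
dim = 64

# Generic "background" patch descriptors (negatives) define the statistics.
negatives = rng.normal(size=(5000, dim))
mu = negatives.mean(axis=0)
cov = np.cov(negatives, rowvar=False) + 1e-3 * np.eye(dim)   # regularized

def learn_element(rendered_patch_descriptor):
    """Closed-form discriminative weight for one rendered-view patch."""
    return np.linalg.solve(cov, rendered_patch_descriptor - mu)

def match_element(weight, candidate_descriptors):
    """Score candidate patches from a painting; return best index and score."""
    scores = candidate_descriptors @ weight
    best = int(scores.argmax())
    return best, float(scores[best])

element = learn_element(rng.normal(size=dim))                 # from a rendered view
best, score = match_element(element, rng.normal(size=(300, dim)))  # painting patches
print("best-matching candidate patch:", best, "score:", score)
```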


Proceedings of the IEEE | 2010

LabelMe: Online Image Annotation and Applications

Antonio Torralba; Bryan C. Russell; Jenny Yuen

Central to the development of computer vision systems is the collection and use of annotated images spanning our visual world. Annotations may include information about the identity, spatial extent, and viewpoint of the objects present in a depicted scene. Such a database is useful for the training and evaluation of computer vision systems. Motivated by the availability of images on the Internet, we introduced a web-based annotation tool that allows online users to label objects and their spatial extent in images. To date, we have collected over 400,000 annotations that span a variety of different scene and object classes. In this paper, we show the contents of the database, its growth over time, and statistics of its usage. In addition, we explore and survey applications of the database in the areas of computer vision and computer graphics. In particular, we show how to extract the real-world 3-D coordinates of images in a variety of scenes using only the user-provided object annotations. The output 3-D information is comparable to the quality produced by a laser range scanner. We also characterize the space of the images in the database by analyzing 1) statistics of the co-occurrence of large objects in the images and 2) the spatial layout of the labeled images.
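
As a small illustration of the kind of database statistic mentioned at the end of the abstract, the sketch below counts object-label co-occurrences over made-up per-image label lists.

```python
# Minimal sketch of a label co-occurrence statistic: count how often pairs of
# object labels appear in the same image. The per-image label lists below are
# made up for illustration.
from collections import Counter
from itertools import combinations

image_labels = [
    ["car", "road", "building", "sky"],
    ["car", "road", "tree"],
    ["person", "sidewalk", "building"],
]

cooccurrence = Counter()
for labels in image_labels:
    for a, b in combinations(sorted(set(labels)), 2):
        cooccurrence[(a, b)] += 1

for pair, count in cooccurrence.most_common(5):
    print(pair, count)
```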


Computer Vision and Pattern Recognition | 2004

Efficient graphical models for processing images

Marshall F. Tappen; Bryan C. Russell; William T. Freeman

Graphical models are powerful tools for processing images. However, the large dimensionality of even local image data poses a difficulty. Representing the range of possible graphical model node variables with discrete states leads to an overwhelmingly large number of states for the model, often making both exact and approximate inference computationally intractable. We propose a representation that allows a small number of discrete states to represent the large number of possible image values at each pixel or local image patch. Each node in the graph represents the best regression function, chosen from a set of candidate functions, for estimating the unobserved image pixels from the observed samples. This permits a small number of discrete states to summarize the range of possible image values at each point in the image. Belief propagation is then used to find the best regressor to use at each point. To demonstrate the usefulness of this technique, we apply it to two problems: super-resolution and color demosaicing. In both cases, we find our method compares well against other techniques for these problems.
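
A toy sketch of the core idea: each node carries a small set of discrete states indexing candidate regressors, unary costs measure how well each regressor fits the local observations, pairwise costs encourage neighbors to agree, and min-sum belief propagation selects a regressor per node. The example runs exact BP on a chain with random costs; the paper works with learned regressors on image graphs.

```python
# Minimal sketch of representing image values by a small set of discrete
# states (candidate regressors) per node and using belief propagation to pick
# the best one. This runs exact min-sum BP (Viterbi) on a chain.
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_states = 8, 4          # e.g. patches and candidate regressors

# Unary cost: how poorly each candidate regressor explains each node's data.
unary = rng.uniform(size=(n_nodes, n_states))
# Pairwise cost: penalize neighboring nodes that choose different regressors.
pairwise = 0.5 * (1.0 - np.eye(n_states))

# Forward pass: message into node i from node i-1.
messages = np.zeros((n_nodes, n_states))
for i in range(1, n_nodes):
    costs = (unary[i - 1] + messages[i - 1])[:, None] + pairwise
    messages[i] = costs.min(axis=0)

# Backward pass: pick the best final state, then backtrack.
labels = np.zeros(n_nodes, dtype=int)
labels[-1] = int((unary[-1] + messages[-1]).argmin())
for i in range(n_nodes - 2, -1, -1):
    costs = unary[i] + messages[i] + pairwise[:, labels[i + 1]]
    labels[i] = int(costs.argmin())

print("chosen regressor index per node:", labels)
```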


International Conference on Computer Vision | 2015

Understanding Deep Features with Computer-Generated Imagery

Mathieu Aubry; Bryan C. Russell

We introduce an approach for analyzing the variation of features generated by convolutional neural networks (CNNs) trained on large image datasets with respect to scene factors that occur in natural images. Such factors may include object style, 3D viewpoint, color, and scene lighting configuration. Our approach analyzes CNN feature responses with respect to different scene factors by controlling for them via rendering using a large database of 3D CAD models. The rendered images are presented to a trained CNN and responses for different layers are studied with respect to the input scene factors. We perform a linear decomposition of the responses based on knowledge of the input scene factors and analyze the resulting components. In particular, we quantify their relative importance in the CNN responses and visualize them using principal component analysis. We show qualitative and quantitative results of our study on three trained CNNs: AlexNet [18], Places [43], and Oxford VGG [8]. We observe important differences across the different networks and CNN layers with respect to different scene factors and object categories. Finally, we demonstrate that our analysis based on computer-generated imagery translates to the network representation of natural images.
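
A minimal sketch of the kind of decomposition described: feature responses collected over a controlled grid of scene factors (here a hypothetical style-by-viewpoint grid with random features standing in for CNN responses to rendered views) are split into a mean, per-factor terms, and a residual, whose variance shares and PCA embeddings can then be inspected.

```python
# Minimal sketch: decompose features over a controlled (style x viewpoint)
# grid into mean + per-factor terms + residual, report each component's share
# of the variance, and embed one factor with PCA for visualization.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_styles, n_views, feat_dim = 10, 36, 256
feats = rng.normal(size=(n_styles, n_views, feat_dim))   # stand-in responses

mean = feats.mean(axis=(0, 1))
style_term = feats.mean(axis=1) - mean          # (n_styles, feat_dim)
view_term = feats.mean(axis=0) - mean           # (n_views, feat_dim)
residual = feats - mean - style_term[:, None, :] - view_term[None, :, :]

# Relative importance of each component (variance shares sum to ~1 on a
# balanced grid because the cross terms cancel by construction).
ss_total = ((feats - mean) ** 2).sum()
ss_style = (style_term ** 2).sum() * n_views
ss_view = (view_term ** 2).sum() * n_styles
ss_resid = (residual ** 2).sum()
print("variance shares  style: %.3f  viewpoint: %.3f  residual: %.3f"
      % (ss_style / ss_total, ss_view / ss_total, ss_resid / ss_total))

# 2D PCA embedding of the viewpoint component for visualization.
view_pca = PCA(n_components=2).fit_transform(view_term)
print("viewpoint component embedded to:", view_pca.shape)
```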


Computer Vision and Pattern Recognition | 2016

Marr Revisited: 2D-3D Alignment via Surface Normal Prediction

Aayush Bansal; Bryan C. Russell; Abhinav Gupta

We introduce an approach that leverages surface normal predictions, along with appearance cues, to retrieve 3D models for objects depicted in 2D still images from a large CAD object library. Critical to the success of our approach is the ability to recover accurate surface normals for objects in the depicted scene. We introduce a skip-network model built on the pre-trained Oxford VGG convolutional neural network (CNN) for surface normal prediction. Our model achieves state-of-the-art accuracy on the NYUv2 RGB-D dataset for surface normal prediction, and recovers fine object detail compared to previous methods. Furthermore, we develop a two-stream network over the input image and predicted surface normals that jointly learns pose and style for CAD model retrieval. When using the predicted surface normals, our two-stream network matches prior work that uses surface normals computed from RGB-D images on the task of pose prediction, and achieves state-of-the-art performance when using RGB-D input. Finally, our two-stream network allows us to retrieve CAD models that better match the style and pose of a depicted object compared with baseline approaches.
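
A minimal PyTorch sketch of a skip-network in this spirit: feature maps tapped from several VGG-16 layers are resized to a common resolution, concatenated, and mapped to a unit-length 3-channel normal image. The tapped layers, head, and output resolution are illustrative choices, not the configuration used in the paper.

```python
# Minimal sketch of a skip-network for per-pixel surface normal prediction:
# intermediate VGG-16 feature maps are interpolated to a common resolution,
# concatenated, and mapped to a 3-channel unit-normal map.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class SkipNormalNet(nn.Module):
    def __init__(self, tap_layers=(3, 8, 15, 22), out_size=(56, 56)):
        super().__init__()
        self.backbone = torchvision.models.vgg16().features  # untrained here
        self.tap_layers = set(tap_layers)
        self.out_size = out_size
        tap_channels = 64 + 128 + 256 + 512   # channels at the tapped layers
        self.head = nn.Sequential(
            nn.Conv2d(tap_channels, 128, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 3, kernel_size=1),
        )

    def forward(self, x):
        taps = []
        for i, layer in enumerate(self.backbone):
            x = layer(x)
            if i in self.tap_layers:
                taps.append(F.interpolate(x, size=self.out_size,
                                          mode="bilinear", align_corners=False))
        normals = self.head(torch.cat(taps, dim=1))
        return F.normalize(normals, dim=1)     # unit-length normal per pixel

if __name__ == "__main__":
    net = SkipNormalNet()
    out = net(torch.randn(1, 3, 224, 224))
    print(out.shape)   # -> torch.Size([1, 3, 56, 56])
```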

Collaboration


Bryan C. Russell's most frequent co-authors and their affiliations:

Josef Sivic (École Normale Supérieure)

Antonio Torralba (Massachusetts Institute of Technology)

Thibault Groueix (École des ponts ParisTech)

Abhinav Gupta (Carnegie Mellon University)

Jean Ponce (École Normale Supérieure)