Jia Deng | Researchain

Publication

Featured research published by Jia Deng.

computer vision and pattern recognition | 2009

ImageNet: A large-scale hierarchical image database

Jia Deng; Wei Dong; Richard Socher; Li-Jia Li; Kai Li; Li Fei-Fei

The explosion of image data on the Internet has the potential to foster more sophisticated and robust models and algorithms to index, retrieve, organize and interact with images and multimedia data. But exactly how such data can be harnessed and organized remains a critical problem. We introduce here a new database called “ImageNet”, a large-scale ontology of images built upon the backbone of the WordNet structure. ImageNet aims to populate the majority of the 80,000 synsets of WordNet with an average of 500-1000 clean and full resolution images. This will result in tens of millions of annotated images organized by the semantic hierarchy of WordNet. This paper offers a detailed analysis of ImageNet in its current state: 12 subtrees with 5247 synsets and 3.2 million images in total. We show that ImageNet is much larger in scale and diversity and much more accurate than the current image datasets. Constructing such a large-scale database is a challenging task. We describe the data collection scheme with Amazon Mechanical Turk. Lastly, we illustrate the usefulness of ImageNet through three simple applications in object recognition, image classification and automatic object clustering. We hope that the scale, accuracy, diversity and hierarchical structure of ImageNet can offer unparalleled opportunities to researchers in the computer vision community and beyond.

european conference on computer vision | 2010

What does classifying more than 10,000 image categories tell us?

Jia Deng; Alexander C. Berg; Kai Li; Li Fei-Fei

Image classification is a critical task for both humans and computers. One of the challenges lies in the large scale of the semantic space. In particular, humans can recognize tens of thousands of object classes and scenes. No computer vision algorithm today has been tested at this scale. This paper presents a study of large-scale categorization including a series of challenging experiments on classification with more than 10,000 image classes. We find that a) computational issues become crucial in algorithm design; b) conventional wisdom from a couple of hundred image categories on relative performance of different classifiers does not necessarily hold when the number of categories increases; c) there is a surprisingly strong relationship between the structure of WordNet (developed for studying language) and the difficulty of visual categorization; d) classification can be improved by exploiting the semantic hierarchy. Toward the future goal of developing automatic vision algorithms to recognize tens of thousands or even millions of image categories, we make a series of observations and arguments about dataset scale, category density, and image hierarchy.

european conference on computer vision | 2016

Stacked Hourglass Networks for Human Pose Estimation

Alejandro Newell; Kaiyu Yang; Jia Deng

This work introduces a novel convolutional network architecture for the task of human pose estimation. Features are processed across all scales and consolidated to best capture the various spatial relationships associated with the body. We show how repeated bottom-up, top-down processing used in conjunction with intermediate supervision is critical to improving the performance of the network. We refer to the architecture as a “stacked hourglass” network based on the successive steps of pooling and upsampling that are done to produce a final set of predictions. State-of-the-art results are achieved on the FLIC and MPII benchmarks, outcompeting all recent methods.

international conference on computer vision | 2013

3D Object Representations for Fine-Grained Categorization

Jonathan Krause; Michael Stark; Jia Deng; Li Fei-Fei

While 3D object representations are being revived in the context of multi-view object class detection and scene understanding, they have not yet attained wide-spread use in fine-grained categorization. State-of-the-art approaches achieve remarkable performance when training data is plentiful, but they are typically tied to flat, 2D representations that model objects as a collection of unconnected views, limiting their ability to generalize across viewpoints. In this paper, we therefore lift two state-of-the-art 2D object representations to 3D, on the level of both local feature appearance and location. In extensive experiments on existing and newly proposed datasets, we show our 3D object representations outperform their state-of-the-art 2D counterparts for fine-grained categorization and demonstrate their efficacy for estimating 3D geometry from images via ultra-wide baseline matching and 3D reconstruction.

computer vision and pattern recognition | 2011

Hierarchical semantic indexing for large scale image retrieval

Jia Deng; Alexander C. Berg; Li Fei-Fei

This paper addresses the problem of similar image retrieval, especially in the setting of large-scale datasets with millions to billions of images. The core novel contribution is an approach that can exploit prior knowledge of a semantic hierarchy. When semantic labels and a hierarchy relating them are available during training, significant improvements over the state of the art in similar image retrieval are attained. While some of this advantage comes from the ability to use additional information, experiments exploring a special case where no additional data is provided show the new approach can still outperform OASIS [6], the current state of the art for similarity learning. Exploiting hierarchical relationships is most important for larger scale problems, where scalability becomes crucial. The proposed learning approach is fundamentally parallelizable and as a result scales more easily than previous work. An additional contribution is a novel hashing scheme (for bilinear similarity on vectors of probabilities, optionally taking into account hierarchy) that is able to reduce the computational cost of retrieval. Experiments are performed on Caltech256 and the larger ImageNet dataset.
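
To make the phrase "bilinear similarity on vectors of probabilities" concrete, here is a minimal sketch, not the paper's implementation. It assumes each image is summarized by a vector of class probabilities and that a matrix S scores how related two classes are. The class names, the values in S, and the function name are hypothetical and chosen only for illustration.

```python
def bilinear_similarity(x, y, S):
    """Return x^T S y for probability vectors x, y and class-affinity matrix S."""
    return sum(
        x[i] * S[i][j] * y[j]
        for i in range(len(x))
        for j in range(len(y))
    )

# Hypothetical classes: husky, poodle, cat.
# Assumed affinities: siblings under "dog" get partial credit (0.5),
# unrelated classes get none.
S_hier = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
# A flat similarity ignores the hierarchy: only exact class matches count.
S_flat = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

query = [0.8, 0.2, 0.0]      # mostly husky
candidate = [0.1, 0.9, 0.0]  # mostly poodle

sim_hier = bilinear_similarity(query, candidate, S_hier)
sim_flat = bilinear_similarity(query, candidate, S_flat)
```

Under these assumed values the hierarchical matrix rates the husky query and the poodle candidate as more similar than the flat matrix does, because both images fall under the same parent class.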

european conference on computer vision | 2014

Large-Scale Object Classification Using Label Relation Graphs

Jia Deng; Nan Ding; Yangqing Jia; Andrea Frome; Kevin P. Murphy; Samy Bengio; Yuan Li; Hartmut Neven; Hartwig Adam

In this paper we study how to perform object classification in a principled way that exploits the rich structure of real world labels. We develop a new model that allows encoding of flexible relations between labels. We introduce Hierarchy and Exclusion (HEX) graphs, a new formalism that captures semantic relations between any two labels applied to the same object: mutual exclusion, overlap and subsumption. We then provide rigorous theoretical analysis that illustrates properties of HEX graphs such as consistency, equivalence, and computational implications of the graph structure. Next, we propose a probabilistic classification model based on HEX graphs and show that it enjoys a number of desirable properties. Finally, we evaluate our method using a large-scale benchmark. Empirical results demonstrate that our model can significantly improve object classification by exploiting the label relations.
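
The label relations described above can be illustrated with a small sketch. It assumes a simple reading of the formalism: a hierarchy edge (parent, child) forbids assigning the child label without the parent, an exclusion edge forbids assigning both labels, and pairs with no edge may overlap freely. The labels, edges, and function names are hypothetical, and the brute-force enumeration is for illustration only; it is not the paper's inference procedure.

```python
from itertools import product

def is_legal(state, hierarchy, exclusion):
    """Check one binary label assignment against the relation edges."""
    for parent, child in hierarchy:
        # Subsumption: the child label implies the parent label.
        if state[child] and not state[parent]:
            return False
    for a, b in exclusion:
        # Mutual exclusion: the two labels cannot both be on.
        if state[a] and state[b]:
            return False
    return True

def legal_states(labels, hierarchy, exclusion):
    """Enumerate all legal assignments (exponential, toy sizes only)."""
    states = []
    for bits in product((0, 1), repeat=len(labels)):
        state = dict(zip(labels, bits))
        if is_legal(state, hierarchy, exclusion):
            states.append(bits)
    return states

# Hypothetical toy graph.
labels = ["dog", "husky", "cat"]
hierarchy = [("dog", "husky")]                   # husky is a kind of dog
exclusion = [("dog", "cat"), ("husky", "cat")]   # nothing is both dog and cat

states = legal_states(labels, hierarchy, exclusion)
# states == [(0, 0, 0), (0, 0, 1), (1, 0, 0), (1, 1, 0)]
```

Of the eight possible assignments over three labels, only four survive the constraints. A probabilistic classifier built on such a graph would restrict its probability mass to these legal states.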

international conference on computer graphics and interactive techniques | 2007

Digital bas-relief from 3D scenes

Tim Weyrich; Jia Deng; Connelly Barnes; Szymon Rusinkiewicz; Adam Finkelstein

We present a system for semi-automatic creation of bas-relief sculpture. As an artistic medium, relief spans the continuum between 2D drawing or painting and full 3D sculpture. Bas-relief (or low relief) presents the unique challenge of squeezing shapes into a nearly-flat surface while maintaining as much as possible the perception of the full 3D scene. Our solution to this problem adapts methods from the tone-mapping literature, which addresses the similar problem of squeezing a high dynamic range image into the (low) dynamic range available on typical display devices. However, the bas-relief medium imposes its own unique set of requirements, such as maintaining small, fixed-size depth discontinuities. Given a 3D model, camera, and a few parameters describing the relative attenuation of different frequencies in the shape, our system creates a relief that gives the illusion of the 3D shape from a given vantage point while conforming to a greatly compressed height.
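
The analogy to tone mapping can be sketched in one dimension. The example below is not the authors' system. It assumes a generic gradient-domain approach: take the height differences between neighboring samples, shrink large differences much more than small ones with a logarithmic curve, and re-integrate. The function name, the `alpha` parameter, and the sample profile are hypothetical.

```python
import math

def compress_profile(heights, alpha=5.0):
    """Compress a 1D height profile by attenuating large steps.

    Each step g between neighbors is replaced by
    sign(g) * log(1 + alpha * |g|) / alpha, which is close to g for
    small steps and grows only logarithmically for large ones.
    """
    relief = [0.0]
    for prev, cur in zip(heights, heights[1:]):
        g = cur - prev
        g_compressed = math.copysign(math.log1p(alpha * abs(g)) / alpha, g)
        relief.append(relief[-1] + g_compressed)
    return relief

# Hypothetical depth profile: fine detail, one large depth jump, more detail.
profile = [0.0, 0.1, 0.2, 5.2, 5.3, 5.2]
relief = compress_profile(profile)

original_range = max(profile) - min(profile)
relief_range = max(relief) - min(relief)
```

In this sketch the large jump is flattened far more than the small steps, so the overall height range shrinks while the direction of every step is preserved.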

european conference on computer vision | 2008

Towards Scalable Dataset Construction: An Active Learning Approach

Brendan M. Collins; Jia Deng; Kai Li; Li Fei-Fei

As computer vision research considers more object categories and greater variation within object categories, it is clear that larger and more exhaustive datasets are necessary. However, the process of collecting such datasets is laborious and monotonous. We consider the setting in which many images have been automatically collected for a visual category (typically by automatic internet search), and we must separate relevant images from noise. We present a discriminative learning process which employs active, online learning to quickly classify many images with minimal user input. The principal advantage of this work over previous endeavors is its scalability. We demonstrate precision which is often superior to the state-of-the-art, with scalability which exceeds previous work.

computer vision and pattern recognition | 2012

Hedging your bets: Optimizing accuracy-specificity trade-offs in large scale visual recognition

Jia Deng; Jonathan Krause; Alexander C. Berg; Li Fei-Fei

As visual recognition scales up to ever larger numbers of categories, maintaining high accuracy is increasingly difficult. In this work, we study the problem of optimizing accuracy-specificity trade-offs in large scale recognition, motivated by the observation that object categories form a semantic hierarchy consisting of many levels of abstraction. A classifier can select the appropriate level, trading off specificity for accuracy in case of uncertainty. By optimizing this trade-off, we obtain classifiers that try to be as specific as possible while guaranteeing an arbitrarily high accuracy. We formulate the problem as maximizing information gain while ensuring a fixed, arbitrarily small error rate with a semantic hierarchy. We propose the Dual Accuracy Reward Trade-off Search (DARTS) algorithm and prove that, under practical conditions, it converges to an optimal solution. Experiments demonstrate the effectiveness of our algorithm on datasets ranging from 65 to over 10,000 categories.
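
The idea of trading specificity for accuracy can be illustrated with a simple thresholding rule. This is not the DARTS algorithm. The sketch assumes a classifier that outputs leaf-class probabilities, scores each hierarchy node by the total probability of the leaves beneath it, and returns the most informative node whose score meets a confidence threshold. The hierarchy, the probabilities, and the names `hedge` and `min_confidence` are hypothetical.

```python
import math

def leaves_under(node, children):
    """Return the leaf classes in the subtree rooted at node."""
    kids = children.get(node, [])
    if not kids:
        return [node]
    leaves = []
    for kid in kids:
        leaves.extend(leaves_under(kid, children))
    return leaves

def hedge(leaf_probs, children, root, min_confidence):
    """Pick the most specific node whose posterior meets the threshold.

    Specificity is measured as information gain relative to the root:
    log2(#all leaves) - log2(#leaves under the node).
    """
    n_leaves = len(leaves_under(root, children))
    best, best_gain = root, 0.0
    stack = [root]
    while stack:
        node = stack.pop()
        leaves = leaves_under(node, children)
        posterior = sum(leaf_probs[leaf] for leaf in leaves)
        gain = math.log2(n_leaves) - math.log2(len(leaves))
        if posterior >= min_confidence and gain > best_gain:
            best, best_gain = node, gain
        stack.extend(children.get(node, []))
    return best

# Hypothetical hierarchy and classifier output.
children = {"animal": ["dog", "cat"], "dog": ["husky", "poodle"]}
leaf_probs = {"husky": 0.5, "poodle": 0.35, "cat": 0.15}

confident_guess = hedge(leaf_probs, children, "animal", 0.4)
hedged_guess = hedge(leaf_probs, children, "animal", 0.8)
safest_guess = hedge(leaf_probs, children, "animal", 0.9)
```

With these assumed numbers, a low threshold yields the leaf "husky", a threshold of 0.8 backs off to "dog", and a threshold of 0.9 backs off all the way to the root "animal".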

international conference on pattern recognition | 2014

Learning Features and Parts for Fine-Grained Recognition

Jonathan Krause; Timnit Gebru; Jia Deng; Li-Jia Li; Li Fei-Fei

This paper addresses the problem of fine-grained recognition: recognizing subordinate categories such as bird species, car models, or dog breeds. We focus on two major challenges: learning expressive appearance descriptors and localizing discriminative parts. To this end, we propose an object representation that detects important parts and describes fine-grained appearances. The part detectors are learned in a fully unsupervised manner, based on the insight that images with similar poses can be automatically discovered for fine-grained classes in the same domain. The appearance descriptors are learned using a convolutional neural network. Our approach requires only image-level class labels, without any use of part annotations or segmentation masks, which may be costly to obtain. We show experimentally that combining these two insights is an effective strategy for fine-grained recognition.
