Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Shubham Tulsiani is active.

Publication


Featured research published by Shubham Tulsiani.


Computer Vision and Pattern Recognition | 2015

Viewpoints and keypoints

Shubham Tulsiani; Jitendra Malik

We characterize the problem of pose estimation for rigid objects in terms of determining viewpoint to explain coarse pose and keypoint prediction to capture the finer details. We address both these tasks in two different settings - the constrained setting with known bounding boxes and the more challenging detection setting where the aim is to simultaneously detect and correctly estimate pose of objects. We present Convolutional Neural Network based architectures for these and demonstrate that leveraging viewpoint estimates can substantially improve local appearance based keypoint predictions. In addition to achieving significant improvements over state-of-the-art in the above tasks, we analyze the error modes and effect of object characteristics on performance to guide future efforts towards this goal.


Computer Vision and Pattern Recognition | 2015

Category-specific object reconstruction from a single image

Abhishek Kar; Shubham Tulsiani; Joao Carreira; Jitendra Malik

Object reconstruction from a single image - in the wild - is a problem where we can make progress and get meaningful results today. This is the main message of this paper, which introduces an automated pipeline with pixels as inputs and 3D surfaces of various rigid categories as outputs in images of realistic scenes. At the core of our approach are deformable 3D models that can be learned from 2D annotations available in existing object detection datasets, that can be driven by noisy automatic object segmentations and which we complement with a bottom-up module for recovering high-frequency shape details. We perform a comprehensive quantitative analysis and ablation study of our approach using the recently introduced PASCAL 3D+ dataset and show very encouraging automatic reconstructions on PASCAL VOC.


Computer Vision and Pattern Recognition | 2017

Multi-view Supervision for Single-View Reconstruction via Differentiable Ray Consistency

Shubham Tulsiani; Tinghui Zhou; Alexei A. Efros; Jitendra Malik

We study the notion of consistency between a 3D shape and a 2D observation and propose a differentiable formulation which allows computing gradients of the 3D shape given an observation from an arbitrary view. We do so by reformulating view consistency using a differentiable ray consistency (DRC) term. We show that this formulation can be incorporated in a learning framework to leverage different types of multi-view observations e.g. foreground masks, depth, color images, semantics etc. as supervision for learning single-view 3D prediction. We present empirical analysis of our technique in a controlled setting. We also show that this approach allows us to improve over existing techniques for single-view reconstruction of objects from the PASCAL VOC dataset.
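The ray consistency idea can be made concrete with a simplified, mask-only sketch (my own illustration, not the authors' code): for a ray passing through voxels with occupancy probabilities p_1..p_n, the probability it terminates in voxel i is p_i times the probability it reached voxel i, and the probability it escapes is the product of all (1 - p_j). A foreground mask pixel then penalizes escape; a background pixel penalizes termination. All function names here are hypothetical.

```python
import numpy as np

def ray_event_probs(occ):
    """Probability the ray terminates at each voxel, plus the escape prob.

    occ: 1-D array of per-voxel occupancy probabilities along the ray.
    Returns an array of length len(occ) + 1 whose last entry is the
    probability the ray passes through every voxel (escapes).
    """
    pass_through = np.cumprod(1.0 - occ)                 # reached past voxel i
    reach = np.concatenate(([1.0], pass_through[:-1]))   # reached voxel i
    stop = reach * occ                                   # terminated at voxel i
    return np.concatenate((stop, [pass_through[-1]]))

def mask_consistency_cost(occ, is_foreground):
    """Expected cost of a ray against a binary mask observation: a
    foreground pixel penalizes escape, a background pixel penalizes
    termination anywhere inside the grid."""
    escape = ray_event_probs(occ)[-1]
    return escape if is_foreground else 1.0 - escape

occ = np.array([0.1, 0.8, 0.3])
print(round(float(mask_consistency_cost(occ, True)), 3))  # 0.126
```

Because every term is a smooth function of the occupancies, this cost is differentiable with respect to the predicted shape, which is what lets it act as supervision; the paper generalizes the per-voxel events to depth, color and semantic observations.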


European Conference on Computer Vision | 2016

View Synthesis by Appearance Flow

Tinghui Zhou; Shubham Tulsiani; Weilun Sun; Jitendra Malik; Alexei A. Efros

We address the problem of novel view synthesis: given an input image, synthesizing new images of the same object or scene observed from arbitrary viewpoints. We approach this as a learning task but, critically, instead of learning to synthesize pixels from scratch, we learn to copy them from the input image. Our approach exploits the observation that the visual appearance of different views of the same instance is highly correlated, and such correlation could be explicitly learned by training a convolutional neural network (CNN) to predict appearance flows – 2-D coordinate vectors specifying which pixels in the input view could be used to reconstruct the target view. Furthermore, the proposed framework easily generalizes to multiple input views by learning how to optimally combine single-view predictions. We show that for both objects and scenes, our approach is able to synthesize novel views of higher perceptual quality than previous CNN-based techniques.
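The pixel-copying step can be sketched as plain bilinear sampling, assuming a grayscale image and a flow field that stores absolute source coordinates per target pixel (the function name and layout are illustrative, not the paper's code):

```python
import numpy as np

def warp_by_appearance_flow(src, flow):
    """Synthesize a target view by copying pixels from the source image.

    src:  (H, W) grayscale source image.
    flow: (H, W, 2) appearance flow; flow[y, x] = (sx, sy) is the (possibly
          fractional) source coordinate whose intensity target pixel (x, y)
          should copy. Bilinear interpolation handles fractional coordinates.
    """
    H, W = src.shape
    out = np.zeros_like(src, dtype=float)
    for y in range(H):
        for x in range(W):
            sx, sy = flow[y, x]
            x0, y0 = int(np.floor(sx)), int(np.floor(sy))
            ax, ay = sx - x0, sy - y0
            x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
            x0, y0 = max(x0, 0), max(y0, 0)   # clamp to the image border
            out[y, x] = ((1 - ay) * ((1 - ax) * src[y0, x0] + ax * src[y0, x1])
                         + ay * ((1 - ax) * src[y1, x0] + ax * src[y1, x1]))
    return out

# Identity flow: every target pixel copies its own source coordinate.
src = np.arange(9, dtype=float).reshape(3, 3)
xs, ys = np.meshgrid(np.arange(3, dtype=float), np.arange(3, dtype=float))
identity = np.stack([xs, ys], axis=-1)
print(np.allclose(warp_by_appearance_flow(src, identity), src))  # True
```

In the paper the flow field itself is the CNN's output, so the sampling must be differentiable with respect to the predicted coordinates; the bilinear weights above provide exactly that gradient path.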


User Interface Software and Technology | 2013

A colorful approach to text processing by example

Kuat Yessenov; Shubham Tulsiani; Aditya Krishna Menon; Robert C. Miller; Sumit Gulwani; Butler W. Lampson; Adam Tauman Kalai

Text processing, tedious and error-prone even for programmers, remains one of the most alluring targets of Programming by Example. An examination of real-world text processing tasks found on help forums reveals that many such tasks, beyond simple string manipulation, involve latent hierarchical structures. We present STEPS, a programming system for processing structured and semi-structured text by example. STEPS users create and manipulate hierarchical structure by example. In a between-subject user study on fourteen computer scientists, STEPS compares favorably to traditional programming.


Computer Vision and Pattern Recognition | 2015

Virtual view networks for object reconstruction

Joao Carreira; Abhishek Kar; Shubham Tulsiani; Jitendra Malik

All that structure from motion algorithms “see” are sets of 2D points. We show that these impoverished views of the world can be faked for the purpose of reconstructing objects in challenging settings, such as from a single image, or from a few ones far apart, by recognizing the object and getting help from a collection of images of other objects from the same class. We synthesize virtual views by computing geodesics on networks connecting objects with similar viewpoints, and introduce techniques to increase the specificity and robustness of factorization-based object reconstruction in this setting. We report accurate object shape reconstruction from a single image on challenging PASCAL VOC data, which suggests that the current domain of applications of rigid structure-from-motion techniques may be significantly extended.
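As a toy illustration of the geodesic computation (not the authors' implementation): treat object instances as nodes, connect instances observed from similar viewpoints, and take shortest paths through the network, along which virtual views can be chained. The node names and edge weights below are made up for the example.

```python
import heapq

def viewpoint_geodesic(graph, start, goal):
    """Dijkstra shortest path on a network of (instance, viewpoint) nodes.

    graph: {node: [(neighbor, weight), ...]}, weight = viewpoint dissimilarity.
    Returns (path, total_cost).
    """
    dist, prev, heap = {start: 0.0}, {}, [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path, node = [goal], goal
    while node != start:
        node = prev[node]
        path.append(node)
    return path[::-1], dist[goal]

# Hypothetical network: three car instances seen at azimuths 0, 30 and 90 deg.
graph = {
    "carA@0":  [("carB@30", 30.0), ("carC@90", 100.0)],
    "carB@30": [("carA@0", 30.0), ("carC@90", 40.0)],
    "carC@90": [("carA@0", 100.0), ("carB@30", 40.0)],
}
print(viewpoint_geodesic(graph, "carA@0", "carC@90"))
# (['carA@0', 'carB@30', 'carC@90'], 70.0)
```

The geodesic prefers hopping through the intermediate 30-degree instance over jumping directly across a large viewpoint gap, which is the intuition behind synthesizing virtual views along the path.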


Computer Vision and Pattern Recognition | 2017

Learning Shape Abstractions by Assembling Volumetric Primitives

Shubham Tulsiani; Hao Su; Leonidas J. Guibas; Alexei A. Efros; Jitendra Malik

We present a learning framework for abstracting complex shapes by learning to assemble objects using 3D volumetric primitives. In addition to generating simple and geometrically interpretable explanations of 3D objects, our framework also allows us to automatically discover and exploit consistent structure in the data. We demonstrate that using our method allows predicting shape representations which can be leveraged for obtaining a consistent parsing across the instances of a shape collection and constructing an interpretable shape similarity measure. We also examine applications for image-based prediction as well as shape manipulation.
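A core ingredient of such assembly losses can be sketched with axis-aligned boxes (a simplification: the paper uses oriented, learnable cuboids and additional consistency terms; the function names here are my own):

```python
import numpy as np

def point_to_box_dist(p, center, half_extents):
    """Euclidean distance from a point to an axis-aligned box (0 if inside)."""
    d = np.maximum(np.abs(np.asarray(p) - center) - half_extents, 0.0)
    return float(np.linalg.norm(d))

def coverage_loss(points, boxes):
    """Mean squared distance from sampled surface points to the nearest
    primitive; low when the assembled boxes cover the shape well.

    boxes: list of (center, half_extents) pairs.
    """
    dists = [min(point_to_box_dist(p, c, h) for c, h in boxes) for p in points]
    return float(np.mean(np.square(dists)))

# One unit cube at the origin; one point inside it, one 1.5 units outside.
boxes = [(np.zeros(3), np.full(3, 0.5))]
points = [np.zeros(3), np.array([2.0, 0.0, 0.0])]
print(coverage_loss(points, boxes))  # 1.125
```

Minimizing a coverage term like this pushes the primitives to envelop the shape; a complementary term penalizing primitive volume outside the shape keeps the abstraction tight and interpretable.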


International Conference on Computer Vision | 2015

Amodal Completion and Size Constancy in Natural Scenes

Abhishek Kar; Shubham Tulsiani; Joao Carreira; Jitendra Malik

We consider the problem of enriching current object detection systems with veridical object sizes and relative depth estimates from a single image. There are several technical challenges to this, such as occlusions, lack of calibration data and the scale ambiguity between object size and distance. These have not been addressed in full generality in previous work. Here we propose to tackle these issues by building upon advances in object recognition and using recently created large-scale datasets. We first introduce the task of amodal bounding box completion, which aims to infer the full extent of the object instances in the image. We then propose a probabilistic framework for learning category-specific object size distributions from available annotations and leverage these in conjunction with amodal completions to infer veridical sizes of objects in novel images. Finally, we introduce a focal length prediction approach that exploits scene recognition to overcome inherent scale ambiguities and demonstrate qualitative results on challenging real-world scenes.
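Under a pinhole camera model, the size-distance ambiguity the abstract mentions reduces to one equation with two unknowns: apparent size = focal length × real size / depth. A minimal sketch (function names are illustrative):

```python
def apparent_height_px(real_height_m, depth_m, focal_px):
    """Pinhole projection: apparent size = focal * real size / depth."""
    return focal_px * real_height_m / depth_m

def depth_from_size(real_height_m, apparent_px, focal_px):
    """Invert the projection once a size prior fixes the real size."""
    return focal_px * real_height_m / apparent_px

# Two very different objects can project to the same 150 px height:
print(apparent_height_px(1.5, 10.0, 1000.0))  # 150.0
print(apparent_height_px(3.0, 20.0, 1000.0))  # 150.0
# With a category size prior (e.g. ~1.5 m), depth becomes recoverable:
print(depth_from_size(1.5, 150.0, 1000.0))    # 10.0
```

This is why the paper pairs category-specific size distributions with focal length prediction: together they pin down both unknowns that a single image otherwise leaves free.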


International Conference on Computer Vision | 2015

Pose Induction for Novel Object Categories

Shubham Tulsiani; Joao Carreira; Jitendra Malik

We address the task of predicting pose for objects of unannotated object categories from a small seed set of annotated object classes. We present a generalized classifier that can reliably induce pose given a single instance of a novel category. In case of availability of a large collection of novel instances, our approach then jointly reasons over all instances to improve the initial estimates. We empirically validate the various components of our algorithm and quantitatively show that our method produces reliable pose estimates. We also show qualitative results on a diverse set of classes and further demonstrate the applicability of our system for learning shape models of novel object classes.


Pattern Recognition Letters | 2016

The three R's of computer vision

Jitendra Malik; Pablo Andrés Arbeláez; Joao Carreira; Katerina Fragkiadaki; Ross B. Girshick; Georgia Gkioxari; Saurabh Gupta; Bharath Hariharan; Abhishek Kar; Shubham Tulsiani

We argue for the importance of the interaction between recognition, reconstruction and re-organization, and propose that as a unifying framework for computer vision. In this view, recognition of objects is reciprocally linked to re-organization, with bottom-up grouping processes generating candidates, which can be classified using top-down knowledge, following which the segmentations can be refined again. Recognition of 3D objects could benefit from a reconstruction of 3D structure, and 3D reconstruction can benefit from object category-specific priors. We also show that reconstruction of 3D structure from video data goes hand in hand with the reorganization of the scene. We demonstrate pipelined versions of two systems, one for RGB-D images and another for RGB images, which produce rich 3D scene interpretations in this framework.

Collaboration


Dive into Shubham Tulsiani's collaborations.

Top Co-Authors

Jitendra Malik, University of California
Joao Carreira, University of California
Abhishek Kar, University of California
Saurabh Gupta, University of California
Tinghui Zhou, University of California
David F. Fouhey, Carnegie Mellon University