Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Dhruv Batra is active.

Publication


Featured research published by Dhruv Batra.


International Conference on Computer Vision | 2015

VQA: Visual Question Answering

Stanislaw Antol; Aishwarya Agrawal; Jiasen Lu; Margaret Mitchell; Dhruv Batra; C. Lawrence Zitnick; Devi Parikh

We propose the task of free-form and open-ended Visual Question Answering (VQA). Given an image and a natural language question about the image, the task is to provide an accurate natural language answer. Mirroring real-world scenarios, such as helping the visually impaired, both the questions and answers are open-ended. Visual questions selectively target different areas of an image, including background details and underlying context. As a result, a system that succeeds at VQA typically needs a more detailed understanding of the image and complex reasoning than a system producing generic image captions. Moreover, VQA is amenable to automatic evaluation, since many open-ended answers contain only a few words or a closed set of answers that can be provided in a multiple-choice format. We provide a dataset containing ~0.25M images, ~0.76M questions, and ~10M answers (www.visualqa.org), and discuss the information it provides. Numerous baselines for VQA are provided and compared with human performance.
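
The open-ended setting of this dataset is typically scored by consensus with the human answers collected for each question. A minimal sketch of that style of metric, assuming ten human answers per question (the function and variable names are ours, not the paper's released evaluation code):

    def vqa_accuracy(predicted, human_answers):
        # Consensus metric: an answer counts as fully correct when at
        # least 3 of the human annotators gave exactly that answer.
        matches = sum(1 for a in human_answers if a == predicted)
        return min(matches / 3.0, 1.0)

    # Example: 4 of 10 annotators answered "red", so "red" scores 1.0;
    # "maroon" (6 of 10) also scores 1.0, and "blue" (0 of 10) scores 0.0.
    print(vqa_accuracy("red", ["red"] * 4 + ["maroon"] * 6))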


Computer Vision and Pattern Recognition | 2010

iCoseg: Interactive co-segmentation with intelligent scribble guidance

Dhruv Batra; Adarsh Kowdle; Devi Parikh; Jiebo Luo; Tsuhan Chen

This paper presents an algorithm for Interactive Co-segmentation of a foreground object from a group of related images. While previous approaches focus on unsupervised co-segmentation, we use successful ideas from the interactive object-cutout literature. We develop an algorithm that allows users to decide what foreground is, and then guide the output of the co-segmentation algorithm towards it via scribbles. Interestingly, keeping a user in the loop leads to simpler and highly parallelizable energy functions, allowing us to work with significantly more images per group. However, unlike the interactive single-image counterpart, a user cannot be expected to exhaustively examine all cutouts (from tens of images) returned by the system to make corrections. Hence, we propose iCoseg, an automatic recommendation system that intelligently recommends where the user should scribble next. We introduce and make publicly available the largest co-segmentation dataset yet, the CMU-Cornell iCoseg Dataset, with 38 groups, 643 images, and pixelwise hand-annotated ground truth. Through machine experiments and real user studies with our developed interface, we show that iCoseg can intelligently recommend regions to scribble on, and users following these recommendations can achieve good quality cutouts with significantly lower time and effort than exhaustively examining all cutouts.
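
A minimal sketch of the scribble-recommendation idea, reduced to a single uncertainty cue (the real iCoseg system combines several cues; all names here are hypothetical):

    def recommend_scribble_region(foreground_probs):
        # foreground_probs: region id -> the current model's probability
        # that the region belongs to the foreground.
        # Suggest the region the model is least certain about, i.e. the
        # one whose probability is closest to 0.5.
        return min(foreground_probs,
                   key=lambda r: abs(foreground_probs[r] - 0.5))

    # "dog" (0.52) is the most ambiguous region, so scribble there next.
    print(recommend_scribble_region({"sky": 0.95, "dog": 0.52, "grass": 0.12}))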


European Conference on Computer Vision | 2012

Diverse M-Best Solutions in Markov Random Fields

Dhruv Batra; Payman Yadollahpour; Abner Guzman-Rivera; Gregory Shakhnarovich

Much effort has been directed at algorithms for obtaining the highest probability (MAP) configuration in probabilistic (random field) models. In many situations, one could benefit from additional high-probability solutions. Current methods for computing the M most probable configurations produce solutions that tend to be very similar to the MAP solution and each other. This is often an undesirable property. In this paper we propose an algorithm for the Diverse M-Best problem, which involves finding a diverse set of highly probable solutions under a discrete probabilistic model. Given a dissimilarity function measuring closeness of two solutions, our formulation involves maximizing a linear combination of the probability and dissimilarity to previous solutions. Our formulation generalizes the M-Best MAP problem and we show that for certain families of dissimilarity functions we can guarantee that these solutions can be found as easily as the MAP solution.
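
In symbols, writing \theta(y) for the (log-)score of a labeling and \Delta for the dissimilarity function, the greedy step for the m-th solution takes the form (notation ours):

    y^{(m)} = \arg\max_{y}\; \theta(y) + \lambda \sum_{i=1}^{m-1} \Delta\left(y, y^{(i)}\right), \qquad \lambda > 0,

so y^{(1)} is the usual MAP solution and each subsequent solution trades score against diversity from those already found. With a 0-1 dissimilarity that simply excludes previously found solutions, this recovers the classical M-Best MAP problem.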


International Journal of Computer Vision | 2015

A Comparative Study of Modern Inference Techniques for Structured Discrete Energy Minimization Problems

Jörg Hendrik Kappes; Bjoern Andres; Fred A. Hamprecht; Christoph Schnörr; Sebastian Nowozin; Dhruv Batra; Sungwoong Kim; Bernhard X. Kausler; Thorben Kröger; Jan Lellmann; Nikos Komodakis; Bogdan Savchynskyy; Carsten Rother

Szeliski et al. published an influential study in 2006 on energy minimization methods for Markov random fields. This study provided valuable insights into choosing the best optimization technique for certain classes of problems. While these insights remain generally useful today, the phenomenal success of random field models means that the kinds of inference problems that have to be solved have changed significantly. Specifically, the models today often include higher-order interactions, flexible connectivity structures, large label spaces of different cardinalities, or learned energy tables. To reflect these changes, we provide a modernized and enlarged study. We present an empirical comparison of more than 27 state-of-the-art optimization techniques on a corpus of 2453 energy minimization instances from diverse applications in computer vision. To ensure reproducibility, we evaluate all methods in the OpenGM 2 framework and report extensive results regarding runtime and solution quality. Key insights from our study agree with the results of Szeliski et al. for the types of models they studied. However, on new and challenging types of models our findings disagree and suggest that polyhedral methods and integer programming solvers are competitive in terms of runtime and solution quality over a large range of model types.
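
For concreteness, the benchmark instances in such a study are discrete energies built from unary and pairwise terms. A toy evaluator for one labeling (an illustrative stand-in, not the OpenGM 2 API):

    def energy(labels, unary, pairwise, edges):
        # labels[i]: the discrete label assigned to variable i
        # unary[i][l]: cost of giving variable i the label l
        # pairwise[(i, j)][li][lj]: cost on edge (i, j)
        e = sum(unary[i][labels[i]] for i in range(len(labels)))
        e += sum(pairwise[(i, j)][labels[i]][labels[j]] for (i, j) in edges)
        return e

    # Two binary variables joined by a Potts-style smoothness edge.
    unary = [[0.0, 1.5], [2.0, 0.5]]
    pairwise = {(0, 1): [[0.0, 1.0], [1.0, 0.0]]}
    print(energy([0, 1], unary, pairwise, edges=[(0, 1)]))  # 0.0 + 0.5 + 1.0 = 1.5

Minimizing such an energy over all label assignments is the inference problem the compared algorithms solve.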


Computer Vision and Pattern Recognition | 2017

Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering

Yash Goyal; Tejas Khot; Douglas Summers-Stay; Dhruv Batra; Devi Parikh

The problem of visual question answering (VQA) is of significant importance both as a challenging research question and for the rich set of applications it enables. In this context, however, inherent structure in our world and bias in our language tend to be a simpler signal for learning than visual modalities, resulting in VQA models that ignore visual information, leading to an inflated sense of their capability. We propose to counter these language priors for the task of VQA and make vision (the V in VQA) matter! Specifically, we balance the popular VQA dataset (Antol et al., in: ICCV, 2015) by collecting complementary images such that every question in our balanced dataset is associated with not just a single image, but rather a pair of similar images that result in two different answers to the question. Our dataset is by construction more balanced than the original VQA dataset and has approximately twice the number of image-question pairs. Our complete balanced dataset is publicly available at http://visualqa.org/ as part of the 2nd iteration of the VQA Dataset and Challenge (VQA v2.0). We further benchmark a number of state-of-the-art VQA models on our balanced dataset. All models perform significantly worse on our balanced dataset, suggesting that these models have indeed learned to exploit language priors. This finding provides the first concrete empirical evidence for what seems to be a qualitative sense among practitioners. We also present interesting insights from analysis of the participant entries in the VQA Challenge 2017, organized by us on the proposed VQA v2.0 dataset. The results of the challenge were announced in the 2nd VQA Challenge Workshop at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017. Finally, our data collection protocol for identifying complementary images enables us to develop a novel interpretable model, which in addition to providing an answer to the given (image, question) pair, also provides a counter-example based explanation. Specifically, it identifies an image that is similar to the original image, but which it believes has a different answer to the same question. This can help in building trust for machines among their users.
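
To make the balancing construction concrete, every question ends up attached to a pair of similar images with different answers. A hypothetical record (field names are ours, not the released dataset schema) might look like:

    # Hypothetical illustration of one complementary pair in the
    # balanced dataset; the actual release uses its own schema.
    balanced_entry = {
        "question": "Is the umbrella open?",
        "pair": [
            {"image_id": 101, "answer": "yes"},
            {"image_id": 102, "answer": "no"},   # similar image, opposite answer
        ],
    }

    # A model that ignores the image and answers from language priors
    # alone can be correct on at most one image in each pair.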


International Journal of Computer Vision | 2011

Interactively Co-segmentating Topically Related Images with Intelligent Scribble Guidance

Dhruv Batra; Adarsh Kowdle; Devi Parikh; Jiebo Luo; Tsuhan Chen

We present an algorithm for Interactive Co-segmentation of a foreground object from a group of related images. While previous works in co-segmentation have focused on unsupervised co-segmentation, we use successful ideas from the interactive object-cutout literature. We develop an algorithm that allows users to decide what foreground is, and then guide the output of the co-segmentation algorithm towards it via scribbles. Interestingly, keeping a user in the loop leads to simpler and highly parallelizable energy functions, allowing us to work with significantly more images per group. However, unlike the interactive single-image counterpart, a user cannot be expected to exhaustively examine all cutouts (from tens of images) returned by the system to make corrections. Hence, we propose iCoseg, an automatic recommendation system that intelligently recommends where the user should scribble next. We introduce and make publicly available the largest co-segmentation dataset yet, the CMU-Cornell iCoseg dataset, with 38 groups, 643 images, and pixelwise hand-annotated ground truth. Through machine experiments and real user studies with our developed interface, we show that iCoseg can intelligently recommend regions to scribble on, and users following these recommendations can achieve good quality cutouts with significantly lower time and effort than exhaustively examining all cutouts.


Computer Vision and Pattern Recognition | 2016

Joint Unsupervised Learning of Deep Representations and Image Clusters

Jianwei Yang; Devi Parikh; Dhruv Batra

In this paper, we propose a recurrent framework for joint unsupervised learning of deep representations and image clusters. In our framework, successive operations in a clustering algorithm are expressed as steps in a recurrent process, stacked on top of representations output by a Convolutional Neural Network (CNN). During training, image clusters and representations are updated jointly: image clustering is conducted in the forward pass, while representation learning is carried out in the backward pass. The key idea behind this framework is that good representations are beneficial to image clustering, and clustering results provide supervisory signals for representation learning. By integrating the two processes into a single model with a unified weighted triplet loss function and optimizing it end-to-end, we can obtain not only more powerful representations, but also more precise image clusters. Extensive experiments show that our method outperforms the state-of-the-art on image clustering across a variety of image datasets. Moreover, the learned representations generalize well when transferred to other tasks. The source code can be downloaded from https://github.com/jwyang/joint-unsupervised-learning.
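
The alternation can be sketched as a short training loop; the helpers below (extract_features, cluster_representations, update_with_triplet_loss) are hypothetical stand-ins, not the released implementation:

    def joint_unsupervised_learning(images, cnn, num_steps):
        # Schematic of the recurrent framework: clustering in the
        # forward pass, representation learning in the backward pass.
        # All three helpers are hypothetical placeholders.
        for _ in range(num_steps):
            feats = extract_features(cnn, images)                   # forward
            clusters = cluster_representations(feats)               # forward
            cnn = update_with_triplet_loss(cnn, images, clusters)   # backward
        return cnn, clusters

Each iteration makes the clusters a better supervisory signal for the next representation update, which is the mutual-reinforcement idea the paper exploits.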


Computer Vision and Pattern Recognition | 2013

Discriminative Re-ranking of Diverse Segmentations

Payman Yadollahpour; Dhruv Batra; Gregory Shakhnarovich

This paper introduces a hybrid, two-stage approach to semantic image segmentation. In the first stage a probabilistic model generates a set of diverse plausible segmentations. In the second stage, a discriminatively trained re-ranking model selects the best segmentation from this set. The re-ranking stage can use much more complex features than what could be tractably used in the probabilistic model, allowing a better exploration of the solution space than possible by simply producing the most probable solution from the probabilistic model. While our proposed approach already achieves state-of-the-art results (48%) on the challenging VOC 2012 dataset, our machine and human analyses suggest that even larger gains are possible with such an approach.
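
The second stage then reduces to scoring each candidate with a trained linear model over rich features and keeping the best one. A minimal sketch (the feature values and names here are invented for illustration):

    def rerank(candidates, feature_fn, w):
        # Return the candidate segmentation whose feature vector scores
        # highest under the discriminatively trained weights w.
        def score(c):
            return sum(wi * fi for wi, fi in zip(w, feature_fn(c)))
        return max(candidates, key=score)

    # Toy usage with two candidate segmentations and 2-d features.
    feats = {"seg_a": [0.2, 0.9], "seg_b": [0.8, 0.1]}
    print(rerank(["seg_a", "seg_b"], feats.get, w=[1.0, 0.5]))  # seg_b (0.85 > 0.65)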


North American Chapter of the Association for Computational Linguistics | 2016

A Corpus and Cloze Evaluation for Deeper Understanding of Commonsense Stories

Nasrin Mostafazadeh; Nathanael Chambers; Xiaodong He; Devi Parikh; Dhruv Batra; Lucy Vanderwende; Pushmeet Kohli; James F. Allen

Representation and learning of commonsense knowledge is one of the foundational problems in the quest to enable deep language understanding. This issue is particularly challenging for understanding causal and correlational relationships between events. While this topic has received a lot of interest in the NLP community, research has been hindered by the lack of a proper evaluation framework. This paper attempts to address this problem with a new framework for evaluating story understanding and script learning: the 'Story Cloze Test'. This test requires a system to choose the correct ending to a four-sentence story. We created a new corpus of 50k five-sentence commonsense stories, ROCStories, to enable this evaluation. This corpus is unique in two ways: (1) it captures a rich set of causal and temporal commonsense relations between daily events, and (2) it is a high-quality collection of everyday life stories that can also be used for story generation. Experimental evaluation shows that a host of baselines and state-of-the-art models based on shallow language understanding struggle to achieve a high score on the Story Cloze Test. We discuss the implications of these results for script and story learning, and offer suggestions for deeper language understanding.
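
Scoring on the Story Cloze Test reduces to binary-choice accuracy over held-out stories. A minimal sketch (field names hypothetical):

    def story_cloze_accuracy(items, choose_ending):
        # items: each has a four-sentence context, two candidate
        # endings, and the index (0 or 1) of the correct ending.
        correct = sum(1 for it in items
                      if choose_ending(it["context"], it["endings"]) == it["label"])
        return correct / len(items)

    # A random or constant guesser scores about 0.5 on a balanced test
    # set, so shallow models hovering near that mark is the telling result.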


Computer Vision and Pattern Recognition | 2008

Learning class-specific affinities for image labelling

Dhruv Batra; Rahul Sukthankar; Tsuhan Chen

Spectral clustering and eigenvector-based methods have become increasingly popular in segmentation and recognition. Although the choice of the pairwise similarity metric (or affinities) greatly influences the quality of the results, this choice is typically specified outside the learning framework. In this paper, we present an algorithm to learn class-specific similarity functions. Mapping our problem into a conditional random field (CRF) framework enables us to pose the task of learning affinities as parameter learning in undirected graphical models. There are two significant advances over previous work. First, we learn the affinity between a pair of data points as a function of a pairwise feature and (in contrast with previous approaches) the classes to which these two data points were mapped, allowing us to work with a richer class of affinities. Second, our formulation provides a principled probabilistic interpretation for learning all of the parameters that define these affinities. Using ground-truth segmentations and labellings for training, we learn the parameters with the greatest discriminative power (in an MLE sense) on the training data. We demonstrate the power of this learning algorithm in the setting of joint segmentation and recognition of object classes. Specifically, even with very simple appearance features, the proposed method achieves state-of-the-art performance on standard datasets.
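
Schematically, in CRF notation (ours, not copied from the paper), the learned affinity between data points i and j depends on both the pairwise feature and the pair of classes assigned to them:

    \psi_{ij}(y_i = a,\, y_j = b) = w_{ab}^{\top} f_{ij},

where f_{ij} is the pairwise feature and each class pair (a, b) has its own weight vector w_{ab}, all learned jointly by maximum likelihood from the ground-truth segmentations and labellings.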

Collaboration


Dive into Dhruv Batra's collaborations.

Top Co-Authors

Devi Parikh, Georgia Institute of Technology
Stefan Lee, Georgia Institute of Technology
Abhishek Das, Georgia Institute of Technology
Jianwei Yang, Georgia Institute of Technology
Aishwarya Agrawal, Georgia Institute of Technology