Vidit Jain | Researchain

Publication

Featured researches published by Vidit Jain.

international conference on computer vision | 2007

Unsupervised Joint Alignment of Complex Images

Gary B. Huang; Vidit Jain; Erik G. Learned-Miller

Many recognition algorithms depend on careful positioning of an object into a canonical pose, so the position of features relative to a fixed coordinate system can be examined. Currently, this positioning is done either manually or by training a class-specialized learning algorithm with samples of the class that have been hand-labeled with parts or poses. In this paper, we describe a novel method to achieve this positioning using poorly aligned examples of a class with no additional labeling. Given a set of unaligned exemplars of a class, such as faces, we automatically build an alignment mechanism, without any additional labeling of parts or poses in the data set. Using this alignment mechanism, new members of the class, such as faces resulting from a face detector, can be precisely aligned for the recognition process. Our alignment method improves performance on a face recognition task, both over unaligned images and over images aligned with a face alignment algorithm specifically developed for and trained on hand-labeled face images. We also demonstrate its use on an entirely different class of objects (cars), again without providing any information about parts or pose to the learning algorithm.

international world wide web conferences | 2011

Learning to re-rank: query-dependent image re-ranking using click data

Vidit Jain; Manik Varma

Our objective is to improve the performance of keyword-based image search engines by re-ranking their original results. To this end, we address three limitations of existing search engines in this paper. First, there is no straightforward, fully automated way of going from textual queries to visual features. Image search engines therefore primarily rely on static and textual features for ranking. Visual features are mainly used for secondary tasks such as finding similar images. Second, image rankers are trained on query-image pairs labeled with relevance judgments determined by human experts. Such labels are well known to be noisy due to various factors including ambiguous queries, unknown user intent and subjectivity in human judgments. This leads to learning a sub-optimal ranker. Finally, a static ranker is typically built to handle disparate user queries. The ranker is therefore unable to adapt its parameters to suit the query at hand, which again leads to sub-optimal results. We demonstrate that all of these problems can be mitigated by employing a re-ranking algorithm that leverages aggregate user click data. We hypothesize that images clicked in response to a query are mostly relevant to the query. We therefore re-rank the original search results so as to promote images that are likely to be clicked to the top of the ranked list. Our re-ranking algorithm employs Gaussian Process regression to predict the normalized click count for each image, and combines it with the original ranking score. Our approach is shown to significantly boost the performance of the Bing image search engine on a wide range of tail queries.
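
The re-ranking idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the RBF kernel, the noise level, the linear blend controlled by `weight`, and the names `gp_predict` and `rerank` are all assumptions made for illustration, since the abstract does not specify the features, the kernel, or how the two scores are combined.

```python
import math
from typing import List, Sequence


def rbf(a: Sequence[float], b: Sequence[float], length_scale: float = 1.0) -> float:
    """Squared-exponential kernel between two feature vectors (assumed kernel)."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq / (2.0 * length_scale ** 2))


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def gp_predict(train_x, train_y, test_x, noise: float = 1e-2) -> List[float]:
    """Posterior mean of a zero-mean GP: k(x, X) (K + noise * I)^-1 y."""
    n = len(train_x)
    K = [[rbf(train_x[i], train_x[j]) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, list(train_y))
    return [sum(rbf(x, train_x[i]) * alpha[i] for i in range(n)) for x in test_x]


def rerank(features, base_scores, clicked_idx, click_counts, weight: float = 0.5):
    """Sort image indices by a blend of base score and predicted click count.

    The linear blend is a hypothetical choice, not taken from the paper.
    """
    train_x = [features[i] for i in clicked_idx]
    predicted = gp_predict(train_x, click_counts, features)
    combined = [(1 - weight) * s + weight * p
                for s, p in zip(base_scores, predicted)]
    return sorted(range(len(features)), key=lambda i: combined[i], reverse=True)
```

Here `features` holds one feature vector per image, `clicked_idx` selects the images with observed click data, and `click_counts` gives their normalized click counts. Images whose features resemble heavily clicked images move up the list even if they were never clicked themselves.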

computer vision and pattern recognition | 2011

Online domain adaptation of a pre-trained cascade of classifiers

Vidit Jain; Erik G. Learned-Miller

Many classifiers are trained with massive training sets only to be applied at test time on data from a different distribution. How can we rapidly and simply adapt a classifier to a new test distribution, even when we do not have access to the original training data? We present an on-line approach for rapidly adapting a “black box” classifier to a new test data set without retraining the classifier or examining the original optimization criterion. Assuming the original classifier outputs a continuous number for which a threshold gives the class, we reclassify points near the original boundary using a Gaussian process regression scheme. We show how this general procedure can be used in the context of a classifier cascade, demonstrating performance that far exceeds state-of-the-art results in face detection on a standard data set. We also draw connections to work in semi-supervised learning, domain adaptation, and information regularization.

international conference on computer vision | 2007

People-LDA: Anchoring Topics to People using Face Recognition

Vidit Jain; Erik G. Learned-Miller; Andrew McCallum

Topic models have recently emerged as powerful tools for modeling topical trends in documents. Often the resulting topics are broad and generic, associating large groups of people and issues that are loosely related. In many cases, it may be desirable to influence the direction in which topic models develop. In this paper, we explore the idea of centering topics around people. In particular, given a large corpus of images featuring collections of people and associated captions, it seems natural to extract topics specifically focussed on each person. What words are most associated with George Bush? Which with Condoleezza Rice? Since people play such an important role in life, it is natural to anchor one topic to each person. In this paper, we present People-LDA, which uses the coherence of face images in news captions to guide the development of topics. In particular, we show how topics can be refined to be more closely related to a single person (like George Bush) rather than describing groups of people in a related area (like politics). To do this we introduce a new graphical model that tightly couples images and captions through a modern face recognizer. In addition to producing topics that are people specific (using images as a guiding force), the model also performs excellent soft clustering of face images, using the language model to boost performance. We present a variety of experiments comparing our method to recent developments in topic modeling and joint image-language modeling, showing that our model has lower perplexity for face identification than competing models and produces more refined topics.

british machine vision conference | 2006

Discriminative training of hyper-feature models for object identification

Vidit Jain; Andras Ferencz; Erik G. Learned-Miller

Object identification is the task of identifying specific objects belonging to the same class such as cars. We often need to recognize an object that we have only seen a few times. In fact, we often observe only one example of a particular object before we need to recognize it again. Thus we are interested in building a system which can learn to extract distinctive markers from a single example and which can then be used to identify the object in another image as “same” or “different”. Previous work by Ferencz et al. introduced the notion of hyper-features, which are properties of an image patch that can be used to estimate the utility of the patch in subsequent matching tasks. In this work, we show that hyper-feature based models can be more efficiently estimated using discriminative training techniques. In particular, we describe a new hyper-feature model based upon logistic regression that shows improved performance over previously published techniques. Our approach significantly outperforms Bayesian face recognition, which is considered a standard benchmark for face recognition.

computer vision and pattern recognition | 2008

Selective hidden random fields: Exploiting domain-specific saliency for event classification

Vidit Jain; Amit Singhal; Jiebo Luo

Classifying an event captured in an image is useful for understanding the contents of the image. The captured event provides context to refine models for the presence and appearance of various entities, such as people and objects, in the captured scene. Such contextual processing facilitates the generation of better abstractions and annotations for the image. Consider a typical set of consumer images with sports-related content. These images are taken mostly by amateur photographers, and often at a distance. In the absence of manual annotation or other sources of information such as time and location, typical recognition tasks are formidable on these images. Identifying the sporting event in these images provides a context for further recognition and annotation tasks. We propose to use the domain-specific saliency of the appearances of the playing surfaces, and ignore the noninformative parts of the image such as crowd regions, to discriminate among different sports. To this end, we present a variation of the hidden-state conditional random field that selects a subset of the observed features suitable for classification. The inferred hidden variables in this model represent a selection criterion desirable for the problem domain. For sports-related images, this selection criterion corresponds to the segmentation of the playing surface in the image. We demonstrate the utility of this model on consumer images collected from the Internet.

international conference on computer vision | 2013

Adapting Classification Cascades to New Domains

Vidit Jain; Sachin Sudhakar Farfade

Classification cascades have been very effective for object detection. Such a cascade fails to perform well in data domains with variations in appearances that may not be captured in the training examples. This limited generalization severely restricts the domains for which they can be used effectively. A common approach to address this limitation is to train a new cascade of classifiers from scratch for each of the new domains. Building separate detectors for each of the different domains requires huge annotation and computational effort, making it not scalable to a large number of data domains. Here we present an algorithm for quickly adapting a pre-trained cascade of classifiers using a small number of labeled positive instances from a different yet similar data domain. In our experiments with images of human babies and human-like characters from movies, we demonstrate that the adapted cascade significantly outperforms both the original cascade and the one trained from scratch using the given training examples.

international world wide web conferences | 2013

Topical organization of user comments and application to content recommendation

Vidit Jain; Esther Galbrun

On a news website, an article may receive thousands of comments from its readers on a variety of topics. The usual display of these comments in a ranked list, e.g. by popularity, does not allow the user to follow discussions on a particular topic. Organizing them by semantic topics enables the user not only to selectively browse comments on a topic, but also to discover other significant topics of discussion in comments. This topical organization further allows the immediate interests of the user to be captured explicitly, even when she is not logged in. Here we use this information to recommend content that is relevant in the context of the comments being read by the user. We present an algorithm for building such a topical organization in a practical setting and study different recommendation schemes. In a pilot study, we observe these comments-to-article recommendations to be preferred over the standard article-to-article recommendations.

very large data bases | 2015

Tracking the conductance of rapidly evolving topic-subgraphs

Sainyam Galhotra; Amitabha Bagchi; Srikanta J. Bedathur; Maya Ramanath; Vidit Jain

Monitoring the formation and evolution of communities in large online social networks such as Twitter is an important problem that has generated considerable interest in both industry and academia. Fundamentally, the problem can be cast as studying evolving subgraphs (each subgraph corresponding to a topical community) on an underlying social graph - with users as nodes and the connection between them as edges. A key metric of interest in this setting is tracking the changes to the conductance of subgraphs induced by edge activations. This metric quantifies how well or poorly connected a subgraph is to the rest of the graph relative to its internal connections. Conductance has been demonstrated to be of great use in many applications, such as identifying bursty topics, tracking the spread of rumors, and so on. However, tracking this simple metric presents a considerable scalability challenge - the underlying social network is large, the number of communities that are active at any moment is large, the rate at which these communities evolve is high, and moreover, we need to track conductance in real-time. We address these challenges in this paper. We propose an in-memory approximation called BloomGraphs to store and update these (possibly overlapping) evolving subgraphs. As the name suggests, we use Bloom filters to represent an approximation of the underlying graph. This representation is compact and computationally efficient to maintain in the presence of updates. This is especially important when we need to simultaneously maintain thousands of evolving subgraphs. BloomGraphs are used in computing and tracking conductance of these subgraphs as edge-activations arrive. BloomGraphs have several desirable properties in the context of this application, including a small memory footprint and efficient updateability. We also demonstrate mathematically that the error incurred in computing conductance is one-sided and that in the case of evolving subgraphs the change in approximate conductance has the same sign as the change in exact conductance in most cases. We validate the effectiveness of BloomGraphs through extensive experimentation on large Twitter graphs and other social networks.
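
To make the conductance metric in this abstract concrete, the following minimal sketch computes exact conductance for a node set in an undirected graph. It assumes the standard definition, cut(S) / min(vol(S), vol(V minus S)), which may differ in detail from the paper's definition, and it does not implement BloomGraphs or any approximation. The function name `conductance` is hypothetical.

```python
from typing import Hashable, Iterable, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def conductance(edges: Iterable[Edge], subgraph_nodes: Set[Hashable]) -> float:
    """Exact conductance of a node set S: cut(S) / min(vol(S), vol(V minus S)).

    Assumes an undirected graph given as a list of edges, each listed once.
    """
    cut = 0      # edges with exactly one endpoint in S
    vol_in = 0   # sum of degrees of nodes in S
    vol_out = 0  # sum of degrees of nodes outside S
    for u, v in edges:
        u_in = u in subgraph_nodes
        v_in = v in subgraph_nodes
        # Each edge contributes one unit of degree to each of its endpoints.
        vol_in += u_in + v_in
        vol_out += (not u_in) + (not v_in)
        if u_in != v_in:
            cut += 1
    denom = min(vol_in, vol_out)
    # Convention chosen here: return 0.0 when one side has no edges at all.
    return cut / denom if denom else 0.0
```

For two triangles joined by a single edge, taking one triangle as the subgraph gives a conductance of 1/7: one cut edge, and a volume of 7 on each side. A low value indicates a community that is well connected internally relative to its links to the rest of the graph.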

international world wide web conferences | 2014

Short-text representation using diffusion wavelets

Vidit Jain; Jay Mahadeokar

Usual text document representations such as tf-idf do not work well in classification tasks for short-text documents and across diverse data domains. Optimizing different representations for different data domains is infeasible in a practical setting on the Internet. Mining such representations from the data in an unsupervised manner is desirable. In this paper, we study a representation based on the multi-scale harmonic analysis of a term-term co-occurrence graph. This representation is not only sparse, but also leads to the discovery of semantically coherent topics in data. In our experiments on user-generated short documents (e.g., newsgroup messages, user comments, and meta-data), we found this representation to outperform other representations across different choices of classifiers. Similar improvements were also observed for data sets in Chinese and Portuguese languages.
