
Publication


Featured research published by Yashaswi Verma.


British Machine Vision Conference | 2013

Exploring SVM for Image Annotation in Presence of Confusing Labels.

Yashaswi Verma; C. V. Jawahar

Method            | P / R / F1 / N+ (dataset 1) | P / R / F1 / N+ (dataset 2) | P / R / F1 / N+ (dataset 3)
MBRM [1]          | 0.24 / 0.25 / 0.245 / 122   | 0.18 / 0.19 / 0.185 / 209   | 0.24 / 0.23 / 0.235 / 233
JEC [3]           | 0.27 / 0.32 / 0.293 / 139   | 0.22 / 0.25 / 0.234 / 224   | 0.28 / 0.29 / 0.285 / 250
TagProp-ML [2]    | 0.31 / 0.37 / 0.337 / 146   | 0.49 / 0.20 / 0.284 / 213   | 0.48 / 0.25 / 0.329 / 227
TagProp-σML [2]   | 0.33 / 0.42 / 0.370 / 160   | 0.39 / 0.27 / 0.319 / 239   | 0.46 / 0.35 / 0.398 / 266
KSVM              | 0.29 / 0.43 / 0.346 / 174   | 0.30 / 0.28 / 0.290 / 256   | 0.43 / 0.27 / 0.332 / 266
KSVM-VT (Ours)    | 0.32 / 0.42 / 0.363 / 179   | 0.33 / 0.32 / 0.325 / 259   | 0.47 / 0.29 / 0.359 / 268

Table 1: Performance comparison among different methods. The prefix 'K' corresponds to kernelization using the chi-squared kernel.
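The table caption mentions kernelizing the SVM with a chi-squared kernel. As a rough, hedged illustration only (the exact kernel variant and parameters used in the paper are not given here, so the gamma value, feature layout and toy data below are assumptions), an exponentiated chi-squared kernel over histogram features can be computed like this in Python:

import numpy as np

def chi2_kernel(X, Y, gamma=1.0, eps=1e-10):
    """Exponentiated chi-squared kernel between rows of X and Y:
    K(x, y) = exp(-gamma * sum_i (x_i - y_i)^2 / (x_i + y_i)),
    a common choice for histogram-type image features."""
    diff = X[:, None, :] - Y[None, :, :]          # (n, m, d) pairwise differences
    summ = X[:, None, :] + Y[None, :, :] + eps    # eps guards against empty bins
    chi2 = np.sum(diff ** 2 / summ, axis=-1)      # (n, m) chi-squared distances
    return np.exp(-gamma * chi2)

# Toy usage: kernel matrix for 5 random 64-bin histograms
hists = np.random.rand(5, 64)
K = chi2_kernel(hists, hists)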


British Machine Vision Conference | 2014

Im2Text and Text2Im: Associating Images and Texts for Cross-Modal Retrieval.

Yashaswi Verma; C. V. Jawahar

Building bilateral semantic associations between images and texts is among the fundamental problems in computer vision. In this paper, we study two complementary cross-modal prediction tasks: (i) predicting text(s) given an image (“Im2Text”), and (ii) predicting image(s) given a piece of text (“Text2Im”). We make no assumption about the specific form of text; i.e., it could be a set of labels, phrases, or even captions. We pose both of these tasks in a retrieval framework. For Im2Text, given a query image, our goal is to retrieve a ranked list of semantically relevant texts from an independent text corpus (i.e., texts with no corresponding images). Similarly, for Text2Im, given a query text, we aim to retrieve a ranked list of semantically relevant images from a collection of unannotated images (i.e., images without any associated textual meta-data). We propose a novel Structural SVM based unified formulation for these two tasks. For both visual and textual data, two types of representations are investigated. These are based on: (1) unimodal probability distributions over topics learned using latent Dirichlet allocation, and (2) explicitly learned multi-modal correlations using canonical correlation analysis. Extensive experiments on three popular datasets (two medium-scale and one web-scale) demonstrate that our framework gives promising results compared to existing models under various settings, thus confirming its efficacy for both tasks.
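As a hedged illustration of the retrieval step only, and not the paper's Structural SVM formulation: once both modalities are mapped into a shared space (for example with projection matrices obtained via canonical correlation analysis, one of the representations mentioned above), cross-modal retrieval reduces to ranking by similarity in that space. All names, dimensions and random data in the sketch below are hypothetical:

import numpy as np

def rank_cross_modal(query_vec, W_query, pool_feats, W_pool):
    """Rank items from the other modality for a query, given projection
    matrices that map both modalities into a shared space (e.g. learned
    with CCA); ranking is by cosine similarity in that space."""
    q = query_vec @ W_query                                     # project the query
    P = pool_feats @ W_pool                                     # project the retrieval pool
    q = q / (np.linalg.norm(q) + 1e-12)
    P = P / (np.linalg.norm(P, axis=1, keepdims=True) + 1e-12)
    scores = P @ q                                              # cosine similarities
    return np.argsort(-scores)                                  # pool indices, best first

# Toy Text2Im example with made-up dimensions and random projections
rng = np.random.default_rng(0)
W_txt, W_img = rng.normal(size=(64, 32)), rng.normal(size=(128, 32))
txt_query = rng.normal(size=64)
img_pool = rng.normal(size=(500, 128))
ranking = rank_cross_modal(txt_query, W_txt, img_pool, W_img)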


International Journal of Computer Vision | 2017

Image Annotation by Propagating Labels from Semantic Neighbourhoods

Yashaswi Verma; C. V. Jawahar

Automatic image annotation aims at predicting a set of semantic labels for an image. Because of the large annotation vocabulary, there are large variations in the number of images corresponding to different labels (“class-imbalance”). Additionally, due to the limitations of human annotation, several images are not annotated with all the relevant labels (“incomplete-labelling”). These two issues affect the performance of most existing image annotation models. In this work, we propose the 2-pass k-nearest neighbour (2PKNN) algorithm, a two-step variant of the classical k-nearest neighbour algorithm that tries to address these issues in the image annotation task. The first step of 2PKNN uses “image-to-label” similarities, while the second step uses “image-to-image” similarities, thus combining the benefits of both. We also propose a metric learning framework over 2PKNN. This is done in a large-margin set-up by generalizing a well-known (single-label) classification metric learning algorithm for multi-label data. In addition to the features provided by Guillaumin et al. (2009) that are used by almost all recent image annotation methods, we benchmark using new features, including features extracted from a generic convolutional neural network model and those computed using modern encoding techniques. We also learn linear and kernelized cross-modal embeddings over different feature combinations to reduce the semantic gap between visual features and textual labels. Extensive evaluations on four image annotation datasets (Corel-5K, ESP-Game, IAPR-TC12 and MIRFlickr-25K) demonstrate that our method achieves promising results and establishes a new state-of-the-art on the prevailing image annotation datasets.
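A minimal sketch of the two passes described above, assuming Euclidean distances and an exponential similarity weighting; this is not the authors' implementation, the metric-learning component is omitted, and the vocabulary and data are made up:

import numpy as np
from collections import defaultdict

def two_pass_knn(test_feat, train_feats, train_labels, vocab, k1=5):
    """Toy 2PKNN-style prediction (illustration only).
    Pass 1 ("image-to-label"): for every label, keep the k1 training images
    carrying that label that are closest to the test image, so rare labels
    stay represented in the neighbourhood (mitigating class-imbalance).
    Pass 2 ("image-to-image"): score each label by the similarity of the
    test image to the neighbours selected for it."""
    dists = np.linalg.norm(train_feats - test_feat, axis=1)
    scores = defaultdict(float)
    for label in vocab:
        idx = [i for i, labs in enumerate(train_labels) if label in labs]
        nearest = sorted(idx, key=lambda i: dists[i])[:k1]      # pass 1
        for i in nearest:                                       # pass 2
            scores[label] += np.exp(-dists[i])                  # similarity-weighted vote
    return sorted(scores, key=scores.get, reverse=True)         # predicted labels, best first

# Toy usage with random features and a four-word vocabulary
rng = np.random.default_rng(0)
vocab = ["sky", "sea", "tree", "car"]
train_feats = rng.normal(size=(50, 16))
train_labels = [set(rng.choice(vocab, size=2, replace=False)) for _ in range(50)]
print(two_pass_knn(rng.normal(size=16), train_feats, train_labels, vocab))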


British Machine Vision Conference | 2015

Exploring Locally Rigid Discriminative Patches for Learning Relative Attributes

Yashaswi Verma; C. V. Jawahar

Relative attributes help in comparing two images based on their visual properties. They are of great interest as they have been shown to be useful in several vision-related problems such as recognition, retrieval, and understanding image collections in general. In the recent past, quite a few techniques have been proposed for the relative attribute learning task that give reasonable performance. However, these have focused on either the algorithmic aspect or the representational aspect. In this work, we revisit these approaches and integrate their broader ideas to develop simple baselines. These not only take care of the algorithmic aspects, but also take a step towards analyzing a simple yet domain-independent patch-based representation for this task. This representation can capture local shape in an image, as well as spatially rigid correspondences across regions in an image pair. The baselines are extensively evaluated on three challenging relative attribute datasets (OSR, LFW-10 and UT-Zap50K). Experiments demonstrate that they achieve promising results on the OSR and LFW-10 datasets, and perform better than the current state-of-the-art on the UT-Zap50K dataset. Moreover, they also provide some interesting insights about the problem that could be helpful in developing future techniques in this domain.
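For background, relative-attribute learning is usually cast as learning a ranking function from ordered image pairs. The toy sketch below illustrates only that generic set-up with a pairwise logistic loss; it is not the paper's patch-based baselines, and the features, pairs and hyper-parameters are invented:

import numpy as np

def learn_attribute_ranker(feats, ordered_pairs, lr=0.1, epochs=200, lam=1e-3):
    """Toy relative-attribute ranker: find w such that w.x_i > w.x_j for every
    pair (i, j) where image i exhibits the attribute more strongly than image j,
    by minimising a pairwise logistic loss with L2 regularisation."""
    w = np.zeros(feats.shape[1])
    for _ in range(epochs):
        grad = lam * w
        for i, j in ordered_pairs:
            diff = feats[i] - feats[j]
            grad -= diff / (1.0 + np.exp(w @ diff))   # gradient of -log sigmoid(w.diff)
        w -= lr * grad / max(len(ordered_pairs), 1)
    return w

# Toy usage: the first 10 images are assumed "stronger" in the attribute
rng = np.random.default_rng(1)
feats = rng.normal(size=(20, 8))
pairs = [(i, j) for i in range(10) for j in range(10, 20)]
w = learn_attribute_ranker(feats, pairs)
attribute_strength = feats @ w        # higher value = attribute more present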


ACM Multimedia | 2016

A Robust Distance with Correlated Metric Learning for Multi-Instance Multi-Label Data

Yashaswi Verma; C. V. Jawahar

In multi-instance data, every object is a bag that contains multiple elements or instances. Each bag may be assigned to one or more classes, such that it has at least one instance corresponding to every assigned class. However, since the annotations are at bag level, there is no direct association between the instances within a bag and the assigned class labels, which makes the problem significantly challenging. While existing methods have mostly focused on Bag-to-Bag or Class-to-Bag distances, in this paper we address the multiple instance learning problem using a novel Bag-to-Class distance measure. This is based on two observations: (a) the existence of outliers is natural in multi-instance data, and (b) there may exist multiple instances within a bag that belong to a particular class. To address these, in the proposed distance measure (a) we employ the L1 distance, which brings robustness against outliers, and (b) rather than considering only the most similar instance pair during distance computation as done by existing methods, we consider a subset of instances within a bag while determining its relevance to a given class. We parameterize the proposed distance measure using class-specific distance metrics, and propose a novel metric learning framework that explicitly captures inter-class correlations within the learned metrics. Experiments on two popular datasets demonstrate the effectiveness of the proposed distance measure and metric learning.
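A hedged sketch of a Bag-to-Class style distance built from the two ingredients described above, the L1 distance and a subset of bag instances; this is an illustration under assumed conventions, not the paper's exact measure or its metric-learning step:

import numpy as np

def bag_to_class_distance(bag, class_instances, subset_size=3):
    """Illustrative Bag-to-Class distance: for each instance in the bag, take
    its L1 distance to the nearest instance of the class; the bag-class
    distance averages the smallest `subset_size` of these, so several relevant
    instances (not just one) contribute, while L1 keeps the measure robust."""
    # pairwise L1 distances: bag instances (rows) vs. class instances (cols)
    d = np.abs(bag[:, None, :] - class_instances[None, :, :]).sum(axis=-1)
    per_instance = d.min(axis=1)          # each bag instance to its closest class instance
    k = min(subset_size, len(per_instance))
    return np.sort(per_instance)[:k].mean()

# Toy usage: a bag of 5 instances compared against two synthetic classes
rng = np.random.default_rng(0)
bag = rng.normal(size=(5, 10))
class_a = rng.normal(loc=0.0, size=(30, 10))
class_b = rng.normal(loc=2.0, size=(30, 10))
print(bag_to_class_distance(bag, class_a), bag_to_class_distance(bag, class_b))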


Indian Conference on Computer Vision, Graphics and Image Processing | 2012

Neti Neti: In Search of Deity

Yashaswi Verma; C. V. Jawahar

A wide category of objects and scenes can be effectively searched and classified using modern descriptors and classifiers. With performance on many popular categories becoming satisfactory, we explore the issues associated with much harder recognition problems. We address the problem of searching for specific images in Indian stone carvings and sculptures in an unsupervised setup. For this, we introduce a new dataset of 524 images containing sculptures and carvings of eight different Indian deities and three other subjects popular in the Indian context. We perform a thorough analysis to investigate the various challenges associated with this task. A new image representation is proposed using a sequence of discriminative patches mined in an unsupervised manner. For each image, these patches are identified based on their ability to distinguish the given image from the image most dissimilar to it. Then a rejection-based re-ranking scheme is formulated based on both the similarity and the dissimilarity between two images. This new scheme is experimentally compared with two baselines using state-of-the-art descriptors on the proposed dataset. Empirical evaluations demonstrate that our proposed method of image representation and rejection cascade improves retrieval performance on this hard problem compared to the baseline descriptors.
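A minimal sketch of a rejection-style re-ranking step in the spirit described above: candidates are scored by similarity to the query and penalized for similarity to the query's most dissimilar candidate. The weighting, the use of cosine similarity and the random descriptors are assumptions, not the paper's exact scheme:

import numpy as np

def rerank_with_rejection(query, candidates, alpha=0.5):
    """Illustrative rejection-style re-ranking: score each candidate by its
    similarity to the query minus a penalty for being similar to the query's
    most dissimilar ("rejected") candidate, so both similarity and
    dissimilarity influence the final ranking."""
    sims = candidates @ query / (
        np.linalg.norm(candidates, axis=1) * np.linalg.norm(query) + 1e-12)
    reject = candidates[np.argmin(sims)]            # most dissimilar image to the query
    rej_sims = candidates @ reject / (
        np.linalg.norm(candidates, axis=1) * np.linalg.norm(reject) + 1e-12)
    scores = sims - alpha * rej_sims
    return np.argsort(-scores)                      # candidate indices, best first

# Toy usage with random descriptors
rng = np.random.default_rng(0)
order = rerank_with_rejection(rng.normal(size=32), rng.normal(size=(100, 32)))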


European Conference on Computer Vision | 2012

Image annotation using metric learning in semantic neighbourhoods

Yashaswi Verma; C. V. Jawahar


National Conference on Artificial Intelligence | 2012

Choosing linguistics over vision to describe images

Ankush Gupta; Yashaswi Verma; C. V. Jawahar


Computer Vision and Pattern Recognition | 2014

Relative Parts: Distinctive Parts for Learning Relative Attributes

Ramachandruni N. Sandeep; Yashaswi Verma; C. V. Jawahar


Computer Vision and Pattern Recognition | 2013

Generating Image Descriptions Using Semantic Similarities in the Output Space

Yashaswi Verma; Ankush Gupta; Prashanth Mannem; C. V. Jawahar

Collaboration


Dive into Yashaswi Verma's collaborations.

Top Co-Authors

C. V. Jawahar
International Institute of Information Technology

Ankush Gupta
International Institute of Information Technology

Prashanth Mannem
International Institute of Information Technology

Ramachandruni N. Sandeep
International Institute of Information Technology