Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Vignesh Ramanathan is active.

Publication


Featured research published by Vignesh Ramanathan.


Computer Vision and Pattern Recognition | 2016

Social LSTM: Human Trajectory Prediction in Crowded Spaces

Alexandre Alahi; Kratarth Goel; Vignesh Ramanathan; Alexandre Robicquet; Li Fei-Fei; Silvio Savarese

Pedestrians follow different trajectories to avoid obstacles and accommodate fellow pedestrians. Any autonomous vehicle navigating such a scene should be able to foresee the future positions of pedestrians and accordingly adjust its path to avoid collisions. This problem of trajectory prediction can be viewed as a sequence generation task, where we are interested in predicting the future trajectory of people based on their past positions. Following the recent success of Recurrent Neural Network (RNN) models for sequence prediction tasks, we propose an LSTM model which can learn general human movement and predict their future trajectories. This is in contrast to traditional approaches which use hand-crafted functions such as Social Forces. We demonstrate the performance of our method on several public datasets. Our model outperforms state-of-the-art methods on some of these datasets. We also analyze the predicted trajectories to demonstrate the motion behavior learned by our model.
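
The full Social LSTM adds a "social pooling" layer that shares hidden states between nearby pedestrians; as a rough illustration of the underlying sequence model only, here is a minimal sketch of a per-person LSTM trajectory predictor in PyTorch. All names (TrajectoryLSTM, hidden_dim, the 8-step observation / 12-step prediction split) are illustrative choices, not the authors' code.

```python
# Minimal per-person LSTM trajectory predictor (PyTorch assumed).
# The paper's Social LSTM additionally pools hidden states of
# neighbouring pedestrians at every time step; that part is omitted.
import torch
import torch.nn as nn

class TrajectoryLSTM(nn.Module):
    def __init__(self, hidden_dim=64):
        super().__init__()
        self.encoder = nn.LSTM(input_size=2, hidden_size=hidden_dim, batch_first=True)
        self.decoder = nn.Linear(hidden_dim, 2)   # predicts the next (x, y) offset

    def forward(self, past_xy, future_len=12):
        # past_xy: (batch, obs_len, 2) observed positions
        _, (h, c) = self.encoder(past_xy)
        pos = past_xy[:, -1, :]                   # last observed position
        inp = pos.unsqueeze(1)
        preds = []
        for _ in range(future_len):               # roll the LSTM forward step by step
            out, (h, c) = self.encoder(inp, (h, c))
            pos = pos + self.decoder(out[:, -1, :])
            preds.append(pos)
            inp = pos.unsqueeze(1)
        return torch.stack(preds, dim=1)          # (batch, future_len, 2)

model = TrajectoryLSTM()
past = torch.randn(4, 8, 2)                       # 4 pedestrians, 8 observed steps
future = model(past)                              # (4, 12, 2) predicted positions
```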


Computer Vision and Pattern Recognition | 2013

Social Role Discovery in Human Events

Vignesh Ramanathan; Bangpeng Yao; Li Fei-Fei

We deal with the problem of recognizing the social roles played by people in an event. Social roles are governed by human interactions and form a fundamental component of human event description. We focus on a weakly supervised setting, where we are provided different videos belonging to an event class, without training role labels. Since social roles are described by the interactions between people in an event, we propose a Conditional Random Field to model the inter-role interactions, along with person-specific social descriptors. We develop tractable variational inference to simultaneously infer model weights as well as role assignments for all people in the videos. We also present a novel YouTube social roles dataset with ground-truth role annotations, and introduce annotations on a subset of videos from the TRECVID-MED11 [1] event kits for evaluation purposes. The performance of the model is compared against different baseline methods on these datasets.
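
As a toy illustration of inference over inter-role interactions (not the paper's actual variational algorithm, which also estimates the model weights), here is a mean-field sketch in numpy: each person holds a belief over roles, updated against unary descriptor scores and pairwise role-compatibility potentials. All matrices below are made up.

```python
# Toy mean-field inference for role assignment (numpy; illustrative only).
# unary[i, r]: how well person i's descriptor matches role r.
# pairwise[r, s]: compatibility of roles r and s co-occurring in one event.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mean_field_roles(unary, pairwise, iters=20):
    q = softmax(unary)                       # q[i, r] = belief that person i has role r
    for _ in range(iters):
        # expected pairwise score of each role against everyone else's beliefs
        msg = q @ pairwise.T                 # (n_people, n_roles)
        msg = msg.sum(axis=0, keepdims=True) - msg   # exclude self-interaction
        q = softmax(unary + msg)
    return q.argmax(axis=1)

rng = np.random.default_rng(0)
unary = rng.normal(size=(5, 3))              # 5 people, 3 candidate roles
pairwise = np.array([[0., 1., -1.],          # e.g. role 0 attracts role 1,
                     [1., 0.,  0.],          # repels role 2
                     [-1., 0., 0.]])
print(mean_field_roles(unary, pairwise))     # role id per person
```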


International Conference on Computer Vision | 2013

Video Event Understanding Using Natural Language Descriptions

Vignesh Ramanathan; Percy Liang; Li Fei-Fei

Human action and role recognition play an important part in complex event understanding. State-of-the-art methods learn action and role models from detailed spatio-temporal annotations, which require extensive human effort. In this work, we propose a method to learn such models from natural language descriptions of the training videos, which are easier to collect and scale with the number of actions and roles. There are two challenges with using this form of weak supervision: First, these descriptions only provide a high-level summary and often do not directly mention the actions and roles occurring in a video. Second, natural language descriptions do not provide spatio-temporal annotations of actions and roles. To tackle these challenges, we introduce a topic-based semantic relatedness (SR) measure between a video description and an action and role label, and incorporate it into a posterior regularization objective. Our event recognition system based on these action and role models matches the state-of-the-art method on the TRECVID-MED11 event kit, despite weaker supervision.
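
As one hedged reading of a "topic-based semantic relatedness measure", the sketch below fits LDA on descriptions and scores a label against a description by the cosine similarity of their topic distributions. The corpus, the labels, and the use of scikit-learn are placeholders, not the paper's setup.

```python
# Sketch of a topic-based semantic relatedness (SR) measure (scikit-learn
# assumed): fit LDA on video descriptions, then score an action label against
# a description by the cosine similarity of their topic distributions.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

corpus = [
    "the groom kisses the bride at the wedding ceremony",
    "a man repairs the flat tire of his bicycle",
    "the band plays music while guests dance at the reception",
]
vec = CountVectorizer()
X = vec.fit_transform(corpus)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

def topic_dist(text):
    return lda.transform(vec.transform([text]))[0]

def semantic_relatedness(description, label):
    a, b = topic_dist(description), topic_dist(label)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(semantic_relatedness(corpus[0], "kiss bride"))
```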


European Conference on Computer Vision | 2014

Linking People in Videos with "Their" Names Using Coreference Resolution

Vignesh Ramanathan; Armand Joulin; Percy Liang; Li Fei-Fei

Natural language descriptions of videos provide a potentially rich and vast source of supervision. However, the highly varied nature of language presents a major barrier to its effective use. What is needed are models that can reason, under uncertainty, over both video and text. In this paper, we tackle the core task of person naming: assigning names of people in the cast to human tracks in TV videos. Screenplay scripts accompanying the video provide some crude supervision about who is in the video. However, even the basic problem of knowing who is mentioned in the script is often difficult, since language often refers to people using pronouns (e.g., “he”) and nominals (e.g., “man”) rather than actual names (e.g., “Susan”). Resolving the identity of these mentions is the task of coreference resolution, which is an active area of research in natural language processing. We develop a joint model for person naming and coreference resolution, and in the process, infer a latent alignment between tracks and mentions. We evaluate our model on both vision and NLP tasks on a new dataset of 19 TV episodes. On both tasks, we significantly outperform the independent baselines.
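
The latent alignment between tracks and mentions can be pictured as a bipartite matching; the sketch below uses the Hungarian algorithm from scipy on a made-up affinity matrix. The paper's joint model is richer (it also performs coreference resolution over the mentions), so this shows only the alignment step under assumed inputs.

```python
# Sketch of track-to-mention alignment as bipartite matching (scipy assumed).
# The affinity matrix here is a random placeholder; a real system would score
# appearance, timing, and textual cues.
import numpy as np
from scipy.optimize import linear_sum_assignment

tracks = ["track_0", "track_1", "track_2"]
mentions = ["Susan", "he", "man"]

# affinity[i, j]: how well track i matches mention j
rng = np.random.default_rng(0)
affinity = rng.random((len(tracks), len(mentions)))

# Hungarian algorithm maximises total affinity (so minimise its negation)
rows, cols = linear_sum_assignment(-affinity)
for i, j in zip(rows, cols):
    print(f"{tracks[i]} -> {mentions[j]} (score {affinity[i, j]:.2f})")
```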


Computer Vision and Pattern Recognition | 2015

Learning semantic relationships for better action retrieval in images

Vignesh Ramanathan; Congcong Li; Jia Deng; Wei Han; Zhen Li; Kunlong Gu; Yang Song; Samy Bengio; Chuck Rosenberg; Li Fei-Fei

Human actions capture a wide variety of interactions between people and objects. As a result, the set of possible actions is extremely large and it is difficult to obtain sufficient training examples for all actions. However, we can compensate for this sparsity in supervision by leveraging the rich semantic relationships between different actions: a single action is often composed of other smaller actions and is exclusive of certain others. We need a method which can reason about such relationships and extrapolate unobserved actions from known actions. Hence, we propose a novel neural network framework which jointly extracts the relationships between actions and uses them to train better action retrieval models. Our model incorporates linguistic, visual, and logical-consistency cues to effectively identify these relationships. We train and test our model on a large-scale image dataset of human actions. We show a significant improvement in mean AP compared to different baseline methods, including the HEX-graph approach from Deng et al. [8].
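
As a schematic of combining relationship cues (not the paper's learned network), the sketch below fuses hand-made linguistic and visual cue matrices into an "implies" score and thresholds it; the actions, cue values, weights, and threshold are all illustrative.

```python
# Sketch of fusing relationship cues between actions (numpy; illustrative).
# cue[i, j] in [0, 1]: evidence that action i implies action j. The paper
# learns the fusion weights jointly inside a neural network; an "exclusion"
# relation could analogously be read off low co-occurrence scores.
import numpy as np

actions = ["riding horse", "riding animal", "sitting"]
linguistic = np.array([[1.0, 0.9, 0.1],
                       [0.2, 1.0, 0.1],
                       [0.1, 0.1, 1.0]])
visual = np.array([[1.0, 0.8, 0.0],
                   [0.3, 1.0, 0.1],
                   [0.0, 0.2, 1.0]])

weights = {"linguistic": 0.5, "visual": 0.5}   # would be learned in the paper
implies = weights["linguistic"] * linguistic + weights["visual"] * visual

for i, a in enumerate(actions):
    for j, b in enumerate(actions):
        if i != j and implies[i, j] > 0.7:
            print(f"{a} implies {b} (score {implies[i, j]:.2f})")
```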


International Conference on Computer Vision | 2015

Learning Temporal Embeddings for Complex Video Analysis

Vignesh Ramanathan; Kevin Tang; Greg Mori; Li Fei-Fei

In this paper, we propose to learn temporal embeddings of video frames for complex video analysis. Large quantities of unlabeled video data can be easily obtained from the Internet. These videos possess the implicit weak label that they are sequences of temporally and semantically coherent images. We leverage this information to learn temporal embeddings for video frames by associating frames with the temporal context that they appear in. To do this, we propose a scheme for incorporating temporal context based on past and future frames in videos, and compare this to other contextual representations. In addition, we show how data augmentation using multi-resolution samples and hard negatives helps to significantly improve the quality of the learned embeddings. We evaluate various design decisions for learning temporal embeddings, and show that our embeddings can improve performance for multiple video tasks such as retrieval, classification, and temporal order recovery in unconstrained Internet video.
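
A minimal sketch of the temporal-context idea in PyTorch, assuming precomputed 512-d frame features: a margin-based ranking loss pulls a frame's embedding toward the mean embedding of its past/future context and pushes a (possibly hard) negative away. The dimensions, the margin, and the single linear embedding layer are assumptions, not the paper's architecture.

```python
# Sketch of learning temporal frame embeddings with a ranking loss (PyTorch
# assumed): a frame should embed closer to the average of its temporal
# context than a randomly drawn (or hard) negative frame does.
import torch
import torch.nn as nn
import torch.nn.functional as F

embed = nn.Linear(512, 128)                   # frame feature -> embedding

def context_ranking_loss(frame, context, negative, margin=0.5):
    # frame: (B, 512); context: (B, K, 512); negative: (B, 512)
    f = F.normalize(embed(frame), dim=1)
    ctx = F.normalize(embed(context).mean(dim=1), dim=1)
    neg = F.normalize(embed(negative), dim=1)
    pos_dist = (f - ctx).pow(2).sum(dim=1)
    neg_dist = (neg - ctx).pow(2).sum(dim=1)
    return F.relu(margin + pos_dist - neg_dist).mean()

frames = torch.randn(8, 512)
context = torch.randn(8, 4, 512)              # e.g. 2 past + 2 future frames
hard_neg = torch.randn(8, 512)                # e.g. a frame from another video
loss = context_ranking_loss(frames, context, hard_neg)
loss.backward()
```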


Workshop on Applications of Computer Vision | 2011

Quadtree decomposition based extended vector space model for image retrieval

Vignesh Ramanathan; Shaunak Mishra; Pabitra Mitra

The bag-of-visual-words (BoW) approach to image retrieval does not exploit the spatial distribution of visual words in an image. Previous attempts to incorporate spatial information include modifying the visual vocabulary with visual phrases alongside visual words, and using spatial pyramid matching (SPM) to compare two images. This paper proposes a novel extended vector space model for image retrieval which takes into account the spatial occurrence (context) of a visual word in an image, along with the co-occurrence of other visual words in a pre-defined region (block) of the image; blocks are obtained by quadtree decomposition of the image up to a fixed level of resolution. Experiments show a 19.22% increase in Mean Average Precision (MAP) over the BoW approach on the Caltech 101 database.
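
A minimal sketch of the quadtree-extended term vector, under assumed inputs (visual word ids with pixel positions): each occurrence is indexed by its word id together with the quadtree block containing it at each level, and the per-level histograms are concatenated. The paper's exact weighting and retrieval model are omitted.

```python
# Sketch of the quadtree-extended vector space idea (numpy; illustrative):
# a visual word occurrence is indexed not just by its word id but also by
# the quadtree block containing it, so spatial context enters the histogram.
import numpy as np

def quadtree_block(x, y, width, height, level):
    """Index of the block containing (x, y) at a given quadtree level."""
    side = 2 ** level                           # blocks per side at this level
    cx = min(int(x / width * side), side - 1)
    cy = min(int(y / height * side), side - 1)
    return cy * side + cx

def extended_histogram(words, positions, vocab_size, width, height, max_level=2):
    feats = []
    for level in range(max_level + 1):
        cells = (2 ** level) ** 2               # total blocks at this level
        hist = np.zeros(vocab_size * cells)
        for w, (x, y) in zip(words, positions):
            hist[w * cells + quadtree_block(x, y, width, height, level)] += 1
        feats.append(hist)
    return np.concatenate(feats)                # one "extended" term vector

words = [3, 3, 7]                               # visual word ids
positions = [(10, 20), (200, 30), (150, 220)]   # keypoint locations in pixels
vec = extended_histogram(words, positions, vocab_size=10, width=256, height=256)
print(vec.shape)                                # (10*1 + 10*4 + 10*16,) = (210,)
```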


Computer Vision and Pattern Recognition | 2017

Learning to Learn from Noisy Web Videos

Serena Yeung; Vignesh Ramanathan; Olga Russakovsky; Liyue Shen; Greg Mori; Li Fei-Fei

Understanding the simultaneously very diverse and intricately fine-grained set of possible human actions is a critical open problem in computer vision. Manually labeling training videos is feasible for some action classes but doesn't scale to the full long-tailed distribution of actions. A promising way to address this is to leverage noisy data from web queries to learn new actions, using semi-supervised or webly-supervised approaches. However, these methods typically do not learn domain-specific knowledge, or rely on iterative hand-tuned data labeling policies. In this work, we instead propose a reinforcement-learning-based formulation for selecting the right examples for training a classifier from noisy web search results. Our method uses Q-learning to learn a data labeling policy on a small labeled training dataset, and then uses this policy to automatically label noisy web data for new visual concepts. Experiments on the challenging Sports-1M action recognition benchmark as well as on additional fine-grained and newly emerging action classes demonstrate that our method is able to learn good labeling policies for noisy data and use them to learn accurate visual concept classifiers.
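
To make the RL framing concrete, here is a toy tabular Q-learning loop for an example-selection policy. The paper instead learns a deep Q-function over classifier state, so the binned "confidence" state, the rewards, and the hyperparameters below are all invented for illustration.

```python
# Toy tabular Q-learning for an example-selection policy (numpy; illustrative).
# State: a binned "confidence" feature of the candidate web example.
# Action: 0 = skip, 1 = add to the training set. Reward: +1 for selecting a
# truly relevant example, -1 for selecting noise.
import numpy as np

rng = np.random.default_rng(0)
n_bins, n_actions = 10, 2
Q = np.zeros((n_bins, n_actions))
alpha, eps = 0.1, 0.1               # one-step (bandit-style) updates for brevity

for step in range(5000):
    relevant = rng.random() < 0.5   # hidden label of the candidate web example
    conf = np.clip(rng.normal(0.7 if relevant else 0.3, 0.15), 0, 0.999)
    s = int(conf * n_bins)          # binned state
    a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
    r = (1.0 if relevant else -1.0) if a == 1 else 0.0
    Q[s, a] += alpha * (r - Q[s, a])

policy = Q.argmax(axis=1)           # 1 where selecting the example pays off
print(policy)                       # learns to select high-confidence bins
```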


Transactions on Data Hiding and Multimedia Security VII | 2012

Secure steganography using randomized cropping

Arijit Sur; Vignesh Ramanathan; Jayanta Mukherjee

In this paper, a novel steganographic scheme is proposed in which embedding is done adaptively in image regions with a high level of high-frequency content. Steganalytic detection performance degrades because these high-frequency components mask the steganographic embedding noise. The security of the proposed scheme is further increased by separating the embedding domain from the steganalytic domain; this separation is achieved by randomizing the embedding domain using a new concept called randomized cropping. State-of-the-art spatial-domain steganalyzers are used to evaluate the security of the proposed scheme, with the LSB matching algorithm used for steganographic embedding. It is shown experimentally that LSB matching wrapped in the proposed scheme performs better than the plain LSB matching scheme against the steganalytic attacks under consideration.
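
A minimal sketch of LSB matching restricted to a key-seeded random crop (numpy), assuming 8-bit grayscale covers: a shared key lets sender and receiver agree on the embedding region, while a steganalyzer examining the full image does not know it. The crop size and embedding details are illustrative, not the paper's exact parameters.

```python
# LSB matching inside a randomized crop (numpy; illustrative).
import numpy as np

def lsb_match_embed(cover, bits, key):
    rng = np.random.default_rng(key)            # shared key -> same crop at receiver
    h, w = cover.shape
    ch, cw = h // 2, w // 2                     # crop size (a scheme parameter)
    y, x = rng.integers(h - ch), rng.integers(w - cw)
    stego = cover.astype(np.int16)              # room for +/-1 without overflow
    region = stego[y:y + ch, x:x + cw].ravel().copy()
    for i, bit in enumerate(bits):
        if region[i] & 1 != bit:                # LSB mismatch: move +/-1 at random
            if region[i] == 0:
                region[i] += 1
            elif region[i] == 255:
                region[i] -= 1
            else:
                region[i] += rng.choice([-1, 1])
    stego[y:y + ch, x:x + cw] = region.reshape(ch, cw)
    return stego.astype(np.uint8)

cover = np.random.default_rng(1).integers(0, 256, (64, 64), dtype=np.uint8)
bits = np.random.default_rng(2).integers(0, 2, 100)
stego = lsb_match_embed(cover, bits, key=42)
print(int(np.abs(stego.astype(int) - cover.astype(int)).max()))  # at most 1
```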


Group and Crowd Behavior for Computer Vision | 2017

Learning to Predict Human Behavior in Crowded Scenes

Alexandre Alahi; Vignesh Ramanathan; Kratarth Goel; Alexandre Robicquet; AmirAbbas Sadeghian; Li Fei-Fei; Silvio Savarese

Pedestrians follow different trajectories to avoid obstacles and accommodate fellow pedestrians. Any autonomous vehicle navigating such a scene should be able to foresee the future positions of pedestrians and accordingly adjust its path to avoid collisions. This problem of trajectory prediction can be viewed as a sequence generation task, where we are interested in predicting the future trajectory of people based on their past positions. Following the recent success of Recurrent Neural Network (RNN) models for sequence prediction tasks, we propose an LSTM model which can learn general human movement and predict their future trajectories. This is in contrast to traditional approaches which use hand-crafted functions such as Social Forces. We demonstrate the performance of our method on several public datasets. Our model outperforms state-of-the-art methods on some of these datasets. We also analyze the predicted trajectories to demonstrate the motion behavior learned by the model. Moreover, we introduce a new characterization that describes the “social sensitivity” at which two targets interact. We use this characterization to define “navigation styles” and improve both forecasting models and state-of-the-art multi-target tracking – whereby the learned forecasting models help the data association step.
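
As a loose illustration of grouping targets into "navigation styles" (the paper's social-sensitivity features are defined differently), the sketch below describes each pedestrian by simple nearest-neighbour distance statistics over the track and clusters them with k-means from scikit-learn; the tracks and feature choices are made up.

```python
# Sketch of clustering targets into "navigation styles" (scikit-learn assumed).
# Stand-in features: mean and minimum distance to the nearest other pedestrian.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
tracks = rng.random((6, 20, 2)) * 10          # 6 pedestrians, 20 timesteps, (x, y)

feats = []
for i in range(len(tracks)):
    others = np.delete(tracks, i, axis=0)     # (5, 20, 2)
    d = np.linalg.norm(others - tracks[i], axis=2).min(axis=0)  # nearest at each t
    feats.append([d.mean(), d.min()])

styles = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(feats)
print(styles)                                 # style id per pedestrian
```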

Collaboration


Dive into Vignesh Ramanathan's collaborations.

Top Co-Authors

Arijit Sur

Indian Institute of Technology Guwahati

Jayanta Mukherjee

Indian Institute of Technology Kharagpur
