Network


Latest external collaborations at the country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Pradeep Natarajan is active.

Publication


Featured research published by Pradeep Natarajan.


Computer Vision and Pattern Recognition | 2012

Multimodal feature fusion for robust event detection in web videos

Pradeep Natarajan; Shuang Wu; Shiv Naga Prasad Vitaladevuni; Xiaodan Zhuang; Stavros Tsakalidis; Unsang Park; Rohit Prasad; Premkumar Natarajan

Combining multiple low-level visual features is a proven and effective strategy for a range of computer vision tasks. However, limited attention has been paid to combining such features with information from other modalities, such as audio and videotext, for large scale analysis of web videos. In our work, we rigorously analyze and combine a large set of low-level features that capture appearance, color, motion, audio and audio-visual co-occurrence patterns in videos. We also evaluate the utility of high-level (i.e., semantic) visual information obtained from detecting scene, object, and action concepts. Further, we exploit multimodal information by analyzing available spoken and videotext content using state-of-the-art automatic speech recognition (ASR) and videotext recognition systems. We combine these diverse features using a two-step strategy employing multiple kernel learning (MKL) and late score level fusion methods. Based on the TRECVID MED 2011 evaluations for detecting 10 events in a large benchmark set of ~45000 videos, our system showed the best performance among the 19 international teams.
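
As an illustrative sketch only (not taken from the paper), the snippet below shows the idea behind the late score-level fusion step: per-modality detection scores are combined with fixed weights. The modality names, scores, and weights are hypothetical.

```python
import numpy as np

def late_fusion(scores_by_modality, weights):
    """Weighted late (score-level) fusion of per-modality detection scores.

    scores_by_modality: dict mapping modality name -> array of shape (n_videos,)
                        with per-video event-detection scores.
    weights: dict mapping modality name -> non-negative fusion weight.
    Returns one fused score per video.
    """
    total_weight = sum(weights[m] for m in scores_by_modality)
    fused = sum(weights[m] * np.asarray(scores_by_modality[m])
                for m in scores_by_modality)
    return fused / total_weight

# Hypothetical scores for three videos from three modalities.
scores = {
    "visual": np.array([0.9, 0.2, 0.6]),
    "audio":  np.array([0.7, 0.1, 0.4]),
    "asr":    np.array([0.8, 0.3, 0.5]),
}
weights = {"visual": 0.5, "audio": 0.3, "asr": 0.2}
print(late_fusion(scores, weights))  # fused event score per video
```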


IEEE Workshop on Motion and Video Computing | 2007

Coupled Hidden Semi Markov Models for Activity Recognition

Pradeep Natarajan; Ramakant Nevatia

Recognizing human activity from a stream of sensory observations is important for a number of applications such as surveillance and human-computer interaction. Hidden Markov Models (HMMs) have been proposed as suitable tools for modeling the variations in the observations for the same action and for discriminating among different actions. HMMs have come into wide use for this task, but the standard form suffers from several limitations. These include unrealistic models for the duration of a sub-event and no direct encoding of interactions among multiple agents. Semi-Markov models and coupled HMMs have been proposed in previous work to handle these issues. We combine these two concepts into a coupled Hidden semi-Markov Model (CHSMM). CHSMMs pose huge computational complexity challenges. We present efficient algorithms for learning and decoding in such structures and demonstrate their utility by experiments with synthetic and real data.
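
For intuition, here is a minimal single-channel explicit-duration (semi-Markov) Viterbi decoder; it illustrates the duration modeling that semi-Markov models add to HMMs, but it is not the coupled multi-agent model or the efficient algorithms of the paper, and all inputs in the example are synthetic.

```python
import numpy as np

def hsmm_viterbi(log_obs, log_trans, log_dur, max_dur):
    """Explicit-duration (semi-Markov) Viterbi for a single chain.

    log_obs:   (T, S) per-frame log-likelihood of each state.
    log_trans: (S, S) log transition probabilities between segments.
    log_dur:   (S, max_dur) log duration probabilities for d = 1..max_dur.
    Returns the best segmentation as a list of (state, start, end) tuples.
    """
    T, S = log_obs.shape
    # Cumulative observation scores so each segment score is O(1) to compute.
    cum = np.vstack([np.zeros(S), np.cumsum(log_obs, axis=0)])
    delta = np.full((T + 1, S), -np.inf)  # best score of a segment ending at t in state s
    back = {}                             # (t, s) -> (prev_t, prev_s)
    delta[0, :] = 0.0
    for t in range(1, T + 1):
        for s in range(S):
            for d in range(1, min(max_dur, t) + 1):
                seg = cum[t, s] - cum[t - d, s] + log_dur[s, d - 1]
                if t - d == 0:
                    score, prev = seg, (0, -1)
                else:
                    prev_scores = delta[t - d, :] + log_trans[:, s]
                    p = int(np.argmax(prev_scores))
                    score, prev = prev_scores[p] + seg, (t - d, p)
                if score > delta[t, s]:
                    delta[t, s] = score
                    back[(t, s)] = prev
    # Backtrack from the best final state.
    segments, t, s = [], T, int(np.argmax(delta[T, :]))
    while t > 0:
        prev_t, prev_s = back[(t, s)]
        segments.append((s, prev_t, t))
        t, s = prev_t, prev_s
    return segments[::-1]

# Tiny synthetic example: 2 states, 30 frames, uniform durations up to 10.
rng = np.random.default_rng(0)
T, S, D = 30, 2, 10
log_obs = np.log(rng.dirichlet(np.ones(S), size=T))
log_trans = np.log(np.array([[0.1, 0.9], [0.9, 0.1]]))
log_dur = np.log(np.full((S, D), 1.0 / D))
print(hsmm_viterbi(log_obs, log_trans, log_dur, max_dur=D))
```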


Computer Vision and Pattern Recognition | 2008

View and scale invariant action recognition using multiview shape-flow models

Pradeep Natarajan; Ramakant Nevatia

Actions in real world applications typically take place in cluttered environments with large variations in the orientation and scale of the actor. We present an approach to simultaneously track and recognize known actions that is robust to such variations, starting from a person detection in the standing pose. In our approach, we first render synthetic poses from multiple viewpoints using Mocap data for known actions and represent them in a conditional random field (CRF) whose observation potentials are computed using shape similarity and whose transition potentials are computed using optical flow. We enhance these basic potentials with terms to represent spatial and temporal constraints and call our enhanced model the shape, flow, duration-conditional random field (SFD-CRF). We find the best sequence of actions using Viterbi search in the SFD-CRF. We demonstrate our approach on videos from multiple viewpoints and in the presence of background clutter.


Computer Vision and Pattern Recognition | 2010

Learning 3D action models from a few 2D videos for view invariant action recognition

Pradeep Natarajan; Vivek Kumar Singh; Ram Nevatia

Most existing approaches for learning action models work by extracting suitable low-level features and then training appropriate classifiers. Such approaches require large amounts of training data and do not generalize well to variations in viewpoint, scale and across datasets. Some work has been done recently to learn multi-view action models from Mocap data, but obtaining such data is time consuming and requires costly infrastructure. We present a method that addresses both these issues by learning action models from just a few video training samples. We model each action as a sequence of primitive actions, represented as functions that transform the actor's state. We formulate model learning as a curve-fitting problem, and present a novel algorithm for learning human actions by lifting 2D annotations of a few keyposes to 3D and interpolating between them. Actions are inferred by sampling the models and accumulating the feature weights learned discriminatively using a latent-state Perceptron algorithm. We show results comparable to the state-of-the-art on the standard Weizmann dataset, with a much smaller train:test ratio, and also on datasets for visual gesture recognition and cluttered grocery store environments.
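
The sketch below illustrates a generic latent-state perceptron update of the kind mentioned above, assuming user-supplied feature and latent-decoding functions; it is a simplified stand-in, not the paper's training procedure, and the toy data are hypothetical.

```python
import numpy as np

def latent_perceptron_epoch(examples, weights, features, decode, labels):
    """One epoch of a generic latent structured perceptron.

    examples: list of (x, y_true) pairs.
    weights:  parameter vector (np.ndarray).
    features: f(x, y, h) -> feature vector for input x, label y, latent state h.
    decode:   decode(x, y, w) -> best latent state h for (x, y) under weights w.
    labels:   list of candidate action labels.
    """
    for x, y_true in examples:
        h_true = decode(x, y_true, weights)          # best latent state for the gold label
        y_pred, h_pred, best = None, None, -np.inf   # best (label, latent) overall
        for y in labels:
            h = decode(x, y, weights)
            score = weights @ features(x, y, h)
            if score > best:
                y_pred, h_pred, best = y, h, score
        if y_pred != y_true:                         # standard perceptron update
            weights = weights + features(x, y_true, h_true) - features(x, y_pred, h_pred)
    return weights

# Toy usage: two labels, a single (trivial) latent state, one-hot features.
def toy_features(x, y, h):
    v = np.zeros(4)
    v[2 * (x % 2) + y] = 1.0
    return v

def toy_decode(x, y, w):
    return 0  # only one latent state in this toy

data = [(0, 0), (1, 1), (2, 0), (3, 1)]
w = np.zeros(4)
for _ in range(5):
    w = latent_perceptron_epoch(data, w, toy_features, toy_decode, labels=[0, 1])
print(w)
```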


International Conference on Computer Vision | 2005

EDF: A framework for Semantic Annotation of Video

Pradeep Natarajan; Ramakant Nevatia

Semantic annotation of multimedia data is needed for various tasks, such as content-based indexing of databases and making inferences about the activities taking place in the environment. In this paper, we present a top-level ontology that provides a framework for describing the semantic features in video. We do this in three steps. First, we identify the key components of semantic descriptions, such as objects and events, and show how domain-specific ontologies can be developed from them. Second, we present a set of predicates for composing events and for describing various spatio-temporal relationships between events and entities. Third, we develop a scheme for reasoning with the developed ontologies to infer complex events from simple events using relational algebra. Finally, we demonstrate the utility of our framework by developing an ontology for a specific domain and conclude by analyzing the performance of the reasoning mechanism with simulated events in this domain.
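
As a hedged illustration of the kind of spatio-temporal predicates and event composition such a framework involves, the following sketch defines a few interval relations and composes a complex event from simpler ones; the predicate names and events are made up for illustration and are not the paper's ontology.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Event:
    name: str
    start: float
    end: float

def before(a: Event, b: Event) -> bool:
    """a ends before b starts."""
    return a.end < b.start

def overlaps(a: Event, b: Event) -> bool:
    """a starts first and ends inside b."""
    return a.start < b.start < a.end < b.end

def during(a: Event, b: Event) -> bool:
    """a happens entirely within b."""
    return b.start <= a.start and a.end <= b.end

def compose(name: str, a: Event, b: Event,
            predicate: Callable[[Event, Event], bool]) -> Optional[Event]:
    """Infer a complex event from two simpler ones if the relation holds."""
    if predicate(a, b):
        return Event(name, min(a.start, b.start), max(a.end, b.end))
    return None

# Hypothetical simple events composed into a complex event.
enter = Event("enter_store", 0.0, 2.0)
pickup = Event("pick_up_item", 3.0, 4.0)
print(compose("shopping", enter, pickup, before))
```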


Computer Vision and Image Understanding | 2013

Hierarchical multi-channel hidden semi Markov graphical models for activity recognition

Pradeep Natarajan; Ramakant Nevatia

Recognizing human actions from a stream of unsegmented sensory observations is important for a number of applications such as surveillance and human-computer interaction. A wide range of graphical models have been proposed for these tasks, and they are typically extensions of the generative hidden Markov models (HMMs) or their discriminative counterpart, conditional random fields (CRFs). These extensions typically address one of three key limitations in the basic HMM/CRF formalism: unrealistic models for the duration of a sub-event, no direct encoding of interactions among multiple agents, and no modeling of the inherent hierarchical organization of activities. In our work, we present a family of graphical models that generalize such extensions and simultaneously model event duration, multi-agent interactions and hierarchical structure. We also present general algorithms for efficient learning and inference in such models based on local variational approximations. We demonstrate the effectiveness of our framework by developing graphical models for applications in American Sign Language (ASL) recognition, and for gesture and action recognition in videos. Our methods show results comparable to the state-of-the-art on the datasets we consider, while requiring far fewer training examples than low-level feature based methods.


European Conference on Computer Vision | 2012

Multi-channel shape-flow kernel descriptors for robust video event detection and retrieval

Pradeep Natarajan; Shuang Wu; Shiv Naga Prasad Vitaladevuni; Xiaodan Zhuang; Unsang Park; Rohit Prasad; Premkumar Natarajan

Despite the success of spatio-temporal visual features, they are hand-designed and aggregate image or flow gradients using a pre-specified, uniform set of orientation bins. Kernel descriptors [1] generalize such orientation histograms by defining match kernels over image patches, and have shown superior performance for visual object and scene recognition. In our work, we make two contributions: first, we extend kernel descriptors to the spatio-temporal domain to model salient flow, gradient and texture patterns in video. Further, we apply our kernel descriptors to extract features from different color channels. Second, we present a fast algorithm for kernel descriptor computation of O(1) complexity for each pixel in each video patch, producing two orders of magnitude speedup over conventional kernel descriptors and other popular motion features. Our evaluation results on TRECVID MED 2011 dataset indicate that the proposed multi-channel shape-flow kernel descriptors outperform several other features including SIFT, SURF, STIP and Color SIFT.
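
For intuition, the sketch below computes a simplified gradient kernel descriptor for a single patch: hard orientation binning is replaced by Gaussian soft assignment to sampled basis orientations, weighted by gradient magnitude. It omits the position kernel, the kernel PCA projection, and the O(1) fast computation of the paper, and the patch is synthetic.

```python
import numpy as np

def gradient_kernel_descriptor(patch, n_basis=8, gamma=5.0):
    """Simplified gradient kernel descriptor for a grayscale patch.

    Each pixel's gradient orientation is softly assigned to sampled basis
    orientations with a Gaussian match kernel, weighted by gradient
    magnitude, and aggregated over the patch.
    """
    gy, gx = np.gradient(patch.astype(float))
    mag = np.sqrt(gx ** 2 + gy ** 2)
    theta = np.arctan2(gy, gx)                      # per-pixel orientation
    basis = np.linspace(-np.pi, np.pi, n_basis, endpoint=False)
    diff = theta[..., None] - basis                 # (H, W, n_basis)
    diff = np.arctan2(np.sin(diff), np.cos(diff))   # wrap angle differences
    soft = np.exp(-gamma * diff ** 2)               # Gaussian match-kernel weights
    desc = (mag[..., None] * soft).sum(axis=(0, 1))
    norm = np.linalg.norm(desc)
    return desc / norm if norm > 0 else desc

patch = np.random.rand(16, 16)       # hypothetical 16x16 grayscale patch
print(gradient_kernel_descriptor(patch))
```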


ACM Multimedia | 2011

Audio-visual fusion using bayesian model combination for web video retrieval

Vasant Manohar; Stavros Tsakalidis; Pradeep Natarajan; Rohit Prasad; Prem Natarajan

Combining features from multiple, heterogeneous audio-visual sources can significantly improve retrieval performance in consumer-domain videos. However, such videos often contain unrelated overlaid audio content, or exhibit camera motion too severe to reliably extract visual features. We present an approach that overcomes errors in individual feature streams by combining classifiers trained on multiple, heterogeneous feature streams using Bayesian model combination (BAYCOM). We demonstrate our method by combining low-level audio and visual features for classification of a large 200-hour web video corpus. The combined models outperform any of the individual features by 10%. Further, BAYCOM consistently outperforms traditional early and late fusion methods.
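
BAYCOM itself is described in the paper; the sketch below shows only the simpler Bayesian model averaging idea underlying this family of methods: each classifier's class posteriors are weighted by the posterior probability of that classifier given held-out data. All numbers are hypothetical.

```python
import numpy as np

def bayesian_model_average(posteriors, val_log_likelihoods):
    """Combine per-classifier class posteriors by Bayesian model averaging.

    posteriors: (n_models, n_samples, n_classes) class posteriors from each model.
    val_log_likelihoods: (n_models,) log-likelihood of held-out validation data
        under each model; with a uniform model prior this yields the model posterior.
    Returns (n_samples, n_classes) combined posteriors.
    """
    w = np.asarray(val_log_likelihoods, dtype=float)
    w = np.exp(w - w.max())          # numerically stable softmax over models
    w = w / w.sum()                  # P(model | validation data)
    return np.tensordot(w, np.asarray(posteriors), axes=1)

# Hypothetical example: audio and visual classifiers over 2 samples, 2 classes.
audio = np.array([[0.6, 0.4], [0.2, 0.8]])
visual = np.array([[0.9, 0.1], [0.4, 0.6]])
combined = bayesian_model_average([audio, visual], val_log_likelihoods=[-10.0, -8.0])
print(combined)
```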


International Conference on Multimedia and Expo | 2011

Large-scale, real-time logo recognition in broadcast videos

Pradeep Natarajan; Yue Wu; Shirin Saleem; Ehry MacRostie; Frederick Bernardin; Rohit Prasad; Prem Natarajan

Robust, real-time, multi-class logo detection in high resolution broadcast videos presents several difficult challenges. For most logos we only have a few training samples, which makes training robust classifiers hard. Also, logos could potentially occur anywhere in the image, and traditional sliding window approaches for logo/object detection are computationally intensive. We present a system that addresses these issues by first identifying a small set of possible logo locations in a frame, based on temporal continuity and multi-resolution search, and then successively pruning these locations for each logo template, using a cascade of color and edge based features. We present experimental results that demonstrate our system for detecting a total of 270 different logo classes in broadcast video from 5 different languages (English, Indonesian, Malay, Simplified and Traditional Chinese).
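
The following minimal sketch illustrates cascade-style pruning of candidate locations, with cheap tests applied before expensive ones; the stage functions and thresholds here are stubbed placeholders, not the system's actual color and edge features.

```python
import numpy as np

def cascade_filter(candidates, stages):
    """Prune candidate logo locations with a cascade of cheap-to-expensive tests.

    candidates: list of candidate regions (any objects the stage functions accept).
    stages: list of (score_fn, threshold) pairs, ordered cheapest first; a candidate
            must pass every stage to survive.
    """
    survivors = candidates
    for score_fn, threshold in stages:
        survivors = [c for c in survivors if score_fn(c) >= threshold]
        if not survivors:
            break
    return survivors

# Hypothetical stages: a cheap color-histogram score followed by a more expensive
# edge-template score (both stubbed with random numbers here).
rng = np.random.default_rng(0)
color_score = lambda region: rng.random()
edge_score = lambda region: rng.random()
regions = [f"region_{i}" for i in range(20)]
print(cascade_filter(regions, [(color_score, 0.3), (edge_score, 0.7)]))
```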


ACM Multimedia | 2013

Compact bag-of-words visual representation for effective linear classification

Xiaodan Zhuang; Shuang Wu; Pradeep Natarajan

Bag-of-words approaches have been shown to achieve state-of-the-art performance in large-scale multimedia event detection. However, the commonly used histogram representation of bag-of-words requires large codebook sizes and expensive nonlinear kernel based classifiers for optimal performance. To address these two issues, we present a two-part generative model for compact visual representation, based on the i-vector approach recently proposed for speech and audio modeling. First, we use a Gaussian mixture model (GMM) to model the joint distribution of local descriptors. Second, we use a low-dimensional factor representation that constrains the GMM parameters to a subspace that preserves most of the information. We further extend this method to incorporate overlapping spatial regions, forming a highly compact visual representation that achieves superior performance with fast linear classifiers. We evaluate the method on a large video dataset used in the TRECVID 2011 MED evaluation. With linear classifiers, the proposed representation, with one-tenth of the storage footprint, outperforms soft quantization histograms used in the top performing TRECVID 2011 MED systems.
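
As a hedged sketch of the general recipe (not the paper's exact method), the code below fits a background GMM to local descriptors, encodes each video with posterior-weighted supervector statistics, and trains a fast linear classifier; the i-vector subspace projection step is omitted and all data are synthetic.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import LinearSVC

def gmm_supervector(gmm, descriptors):
    """Encode a set of local descriptors as posterior-weighted mean offsets.

    This is a supervector-style statistic; the paper's i-vector step would
    further project it onto a low-dimensional subspace.
    """
    post = gmm.predict_proba(descriptors)                  # (n_desc, K) posteriors
    counts = post.sum(axis=0) + 1e-10                      # soft counts per component
    means = post.T @ descriptors / counts[:, None]         # posterior-weighted means
    return ((means - gmm.means_) * np.sqrt(gmm.weights_)[:, None]).ravel()

# Hypothetical data: random local descriptors for a handful of "videos".
rng = np.random.default_rng(0)
labels = [0, 0, 1, 1, 0, 1]
videos = [rng.normal(size=(200, 16)) + y for y in labels]

gmm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0)
gmm.fit(np.vstack(videos))                                 # background model
X = np.array([gmm_supervector(gmm, v) for v in videos])
clf = LinearSVC().fit(X, labels)                           # fast linear classifier
print(clf.predict(X))
```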

Collaboration


Dive into Pradeep Natarajan's collaborations.

Top Co-Authors

Prem Natarajan (University of Southern California)

Ramakant Nevatia (University of Southern California)

Vasant Manohar (University of South Florida)

Brandyn White (University of Central Florida)