
Publications


Featured research published by Pavan K. Turaga.


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2011

Statistical Computations on Grassmann and Stiefel Manifolds for Image and Video-Based Recognition

Pavan K. Turaga; Ashok Veeraraghavan; Anuj Srivastava; Rama Chellappa

In this paper, we examine image and video-based recognition applications where the underlying models have a special structure: the linear subspace structure. We discuss how commonly used parametric models for videos and image sets can be described using the unified framework of Grassmann and Stiefel manifolds. We first show that the parameters of linear dynamic models are finite-dimensional linear subspaces of appropriate dimensions. Unordered image sets, treated as samples from a finite-dimensional linear subspace, naturally fall under this framework. We show that inference over subspaces can be naturally cast as an inference problem on the Grassmann manifold. To perform recognition using subspace-based models, we need tools from the Riemannian geometry of the Grassmann manifold. This involves a study of the geometric properties of the space and appropriate definitions of Riemannian metrics and geodesics. Further, we derive statistical models of inter- and intraclass variations that respect the geometry of the space. We apply techniques such as intrinsic and extrinsic statistics to enable maximum-likelihood classification. We also provide algorithms for unsupervised clustering derived from the geometry of the manifold. Finally, we demonstrate the improved performance of these methods in a wide variety of vision applications such as activity recognition, video-based face recognition, object recognition from image sets, and activity-based video clustering.
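To make the geometry concrete: recognition with subspace-based models rests on distances between points on the Grassmann manifold, which are computed from principal angles. Below is a minimal numpy sketch of the standard arc-length geodesic distance between two subspaces; it illustrates the textbook construction the paper builds on, not the paper's own code.

```python
import numpy as np

def grassmann_distance(U1, U2):
    """Arc-length geodesic distance between span(U1) and span(U2).

    U1, U2: n x d matrices with orthonormal columns. The principal angles
    between the subspaces are the arccosines of the singular values of U1^T U2.
    """
    s = np.linalg.svd(U1.T @ U2, compute_uv=False)
    s = np.clip(s, -1.0, 1.0)        # guard against round-off outside [-1, 1]
    theta = np.arccos(s)             # principal angles
    return np.linalg.norm(theta)     # geodesic distance = ||theta||_2

# Example: two random 10-dimensional subspaces of R^50.
rng = np.random.default_rng(0)
U1 = np.linalg.qr(rng.standard_normal((50, 10)))[0]
U2 = np.linalg.qr(rng.standard_normal((50, 10)))[0]
print(grassmann_distance(U1, U1))    # 0.0: identical subspaces
print(grassmann_distance(U1, U2))    # positive for distinct subspaces
```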


Computer Vision and Pattern Recognition | 2008

Statistical analysis on Stiefel and Grassmann manifolds with applications in computer vision

Pavan K. Turaga; Ashok Veeraraghavan; Rama Chellappa

Many applications in computer vision and pattern recognition involve drawing inferences on certain manifold-valued parameters. In order to develop accurate inference algorithms on these manifolds, we need to (a) understand the geometric structure of these manifolds, (b) derive appropriate distance measures, and (c) develop probability density functions (pdfs) and estimation techniques that are consistent with the geometric structure of these manifolds. In this paper, we consider two related manifolds, the Stiefel manifold and the Grassmann manifold, which arise naturally in several vision applications such as spatio-temporal modeling, affine-invariant shape analysis, image matching, and learning theory. We show how accurate statistical characterization that reflects the geometry of these manifolds allows us to design efficient algorithms that compare favorably to the state of the art in these very different applications. In particular, we describe appropriate distance measures and parametric and non-parametric density estimators on these manifolds. These methods are then used to learn class-conditional densities for applications such as activity recognition, video-based face recognition, and shape classification.
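The extrinsic statistics mentioned above can be sketched via the projection embedding, which maps a subspace to its projection matrix so that ordinary Euclidean computations apply. The following minimal numpy illustration shows a chordal distance and an extrinsic mean; it is one standard construction consistent with the abstract, not the paper's exact estimator.

```python
import numpy as np

def projector(U):
    """Embed a subspace (orthonormal basis U, n x d) as the projector U U^T."""
    return U @ U.T

def chordal_distance(U1, U2):
    """Extrinsic (projection Frobenius-norm) distance between two subspaces."""
    return np.linalg.norm(projector(U1) - projector(U2), "fro")

def extrinsic_mean(subspaces, d):
    """Extrinsic mean: average the projectors in the embedding space, then map
    back to the manifold via the top-d eigenvectors of the average."""
    P_bar = sum(projector(U) for U in subspaces) / len(subspaces)
    _, evecs = np.linalg.eigh(P_bar)     # eigenvalues in ascending order
    return evecs[:, -d:]

rng = np.random.default_rng(0)
Us = [np.linalg.qr(rng.standard_normal((20, 3)))[0] for _ in range(5)]
mu = extrinsic_mean(Us, d=3)
print([round(chordal_distance(mu, U), 3) for U in Us])
```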


European Conference on Computer Vision | 2010

Compressive acquisition of dynamic scenes

Aswin C. Sankaranarayanan; Pavan K. Turaga; Richard G. Baraniuk; Rama Chellappa

Compressive sensing (CS) is a new approach for the acquisition and recovery of sparse signals and images that enables sampling rates significantly below the classical Nyquist rate. Despite significant progress in the theory and methods of CS, little headway has been made in compressive video acquisition and recovery. Video CS is complicated by the ephemeral nature of dynamic events, which makes direct extensions of standard CS imaging architectures and signal models infeasible. In this paper, we develop a new framework for video CS for dynamic textured scenes that models the evolution of the scene as a linear dynamical system (LDS). This reduces the video recovery problem to first estimating the model parameters of the LDS from compressive measurements, from which the image frames are then reconstructed. We exploit the low-dimensional dynamic parameters (the state sequence) and high-dimensional static parameters (the observation matrix) of the LDS to devise a novel compressive measurement strategy that measures only the dynamic part of the scene at each instant and accumulates measurements over time to estimate the static parameters. This enables us to lower the compressive measurement rate considerably. We validate our approach with a range of experiments, including classification experiments that highlight the effectiveness of the proposed approach.
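The measurement strategy can be simulated in a few lines. The sketch below, with hypothetical sizes and no sparsity prior, pairs a fixed "common" measurement operator (whose low-rank output reveals the state sequence) with fresh per-frame operators that accumulate information about the static observation matrix, which is then recovered by least squares; the actual CS-LDS recovery additionally exploits sparsity to push the measurement rate far lower.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, T = 200, 5, 100            # pixels per frame, state dimension, frames
m_c, m_t = 20, 10                # common / per-frame measurement counts

# Hypothetical ground-truth LDS y_t = C x_t with slowly varying state.
C_true = rng.standard_normal((n, d))
A = 0.98 * np.linalg.qr(rng.standard_normal((d, d)))[0]
X = np.zeros((d, T)); X[:, 0] = rng.standard_normal(d)
for t in range(1, T):
    X[:, t] = A @ X[:, t - 1] + 0.01 * rng.standard_normal(d)
Y = C_true @ X                   # frames as the columns of an n x T matrix

# Same "common" operator every frame, plus fresh per-frame operators.
Phi_c = rng.standard_normal((m_c, n)) / np.sqrt(m_c)
Z_c = Phi_c @ Y                  # m_c x T common measurements, rank <= d
Phis = [rng.standard_normal((m_t, n)) / np.sqrt(m_t) for _ in range(T)]
z = [Phis[t] @ Y[:, t] for t in range(T)]

# Step 1: state sequence (up to an invertible ambiguity) from an SVD of the
# common measurements, since Z_c = (Phi_c C) X has rank at most d.
_, s, Vt = np.linalg.svd(Z_c, full_matrices=False)
X_hat = np.diag(s[:d]) @ Vt[:d]

# Step 2: observation matrix by least squares over all measurements, using
# vec(Phi C x) = (x^T kron Phi) vec(C) with column-major vec.
M = np.vstack([np.kron(X_hat[:, t][None, :], Phis[t]) for t in range(T)]
              + [np.kron(X_hat.T, Phi_c)])
b = np.concatenate([np.concatenate(z), Z_c.flatten(order="F")])
vecC, *_ = np.linalg.lstsq(M, b, rcond=None)
C_hat = vecC.reshape((n, d), order="F")

err = np.linalg.norm(C_hat @ X_hat - Y) / np.linalg.norm(Y)
print(f"relative frame-reconstruction error: {err:.2e}")   # near zero here
```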


IEEE Transactions on Multimedia | 2008

A Constrained Probabilistic Petri Net Framework for Human Activity Detection in Video

Massimiliano Albanese; Rama Chellappa; Vincenzo Moscato; Antonio Picariello; V. S. Subrahmanian; Pavan K. Turaga; Octavian Udrea

Recognition of human activities in restricted settings such as airports, parking lots, and banks is of significant interest in security and automated surveillance systems. In such settings, data is usually in the form of surveillance videos with wide variation in quality and granularity. Interpretation and identification of human activities requires an activity model that (a) is rich enough to handle complex multi-agent interactions, (b) is robust to uncertainty in low-level processing, and (c) can handle ambiguities in the unfolding of activities. We present a computational framework for human activity representation based on Petri nets. We propose an extension, Probabilistic Petri Nets (PPNs), and show how this model is well suited to address each of the above requirements in a wide variety of settings. We then focus on answering two types of questions: (i) what are the minimal sub-videos in which a given activity is identified with a probability above a certain threshold, and (ii) for a given video, which activity from a given set occurred with the highest probability? We provide the PPN-MPS algorithm for the first problem, as well as two different algorithms (naive PPN-MPA and PPN-MPA) to solve the second. Our experimental results on a dataset consisting of bank surveillance videos and an unconstrained TSA tarmac surveillance dataset show that our algorithms are both fast and provide high-quality results.
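To fix intuition for how a probabilistic Petri net scores an activity, here is a deliberately tiny Python sketch: a loop-free net with one token, where each transition fires on a named low-level event and contributes a probability factor. All names and probabilities are hypothetical; the paper's PPNs are considerably richer (multiple tokens, skip transitions, thresholded search over sub-videos).

```python
from dataclasses import dataclass, field

@dataclass
class Transition:
    label: str                                    # low-level event that fires it
    prob: float                                   # probability of this unfolding
    inputs: list = field(default_factory=list)    # places that must hold tokens
    outputs: list = field(default_factory=list)   # places that receive tokens

def activity_probability(net, start, final, events):
    """Probability that the event stream realizes the activity modeled by a
    loop-free single-token probabilistic Petri net."""
    marking, prob = {start}, 1.0
    for ev in events:
        for tr in net:
            if tr.label == ev and all(p in marking for p in tr.inputs):
                marking = (marking - set(tr.inputs)) | set(tr.outputs)
                prob *= tr.prob
                break
    return prob if final in marking else 0.0

# Hypothetical three-step activity fragment.
net = [
    Transition("enter",    0.9, inputs=["p0"], outputs=["p1"]),
    Transition("approach", 0.8, inputs=["p1"], outputs=["p2"]),
    Transition("exit",     0.6, inputs=["p2"], outputs=["p3"]),
]
print(activity_probability(net, "p0", "p3", ["enter", "approach", "exit"]))  # 0.432
```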


European Conference on Computer Vision | 2010

Articulation-invariant representation of non-planar shapes

Raghuraman Gopalan; Pavan K. Turaga; Rama Chellappa

Given a set of points corresponding to a 2D projection of a non-planar shape, we would like to obtain a representation invariant to articulations (under no self-occlusions). This is a challenging problem, since we need to account for the changes in 2D shape due to 3D articulations, viewpoint variations, and the varying effects of the imaging process on different regions of the shape due to its non-planarity. By modeling an articulating shape as a combination of approximately convex parts connected by non-convex junctions, we propose to preserve distances between pairs of points by (i) estimating the parts of the shape through approximate convex decomposition, introducing a robust measure of convexity, and (ii) performing part-wise affine normalization under a weak-perspective camera model, and then relating the points using the inner distance, which is insensitive to planar articulations. We demonstrate the effectiveness of our representation on a dataset with non-planar articulations, and on standard shape retrieval datasets such as MPEG-7.
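The inner-distance ingredient is easy to approximate on a rasterized shape: the distance between two points is the length of the shortest path that stays inside the shape, so it bends around junctions and is stable when parts articulate. A minimal BFS sketch on a binary mask follows (numpy only, grid metric); the paper applies the inner distance after approximate convex decomposition and part-wise affine normalization, which this sketch omits.

```python
import numpy as np
from collections import deque

def inner_distance(mask, src, dst):
    """Shortest 4-connected path between src and dst that stays inside the
    binary mask (True = inside the shape); np.inf if dst is unreachable."""
    h, w = mask.shape
    dist = np.full((h, w), -1, dtype=int)
    dist[src] = 0
    queue = deque([src])
    while queue:
        r, c = queue.popleft()
        if (r, c) == dst:
            return dist[r, c]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < h and 0 <= cc < w and mask[rr, cc] and dist[rr, cc] < 0:
                dist[rr, cc] = dist[r, c] + 1
                queue.append((rr, cc))
    return np.inf

# Toy L-shaped region: two rectangular "parts" joined at a corner junction.
mask = np.zeros((40, 40), dtype=bool)
mask[5:10, 5:35] = True          # horizontal part
mask[5:35, 30:35] = True         # vertical part
print(inner_distance(mask, (7, 7), (32, 32)))  # the path must follow the bend
```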


European Conference on Computer Vision | 2012

Domain adaptive dictionary learning

Qiang Qiu; Vishal M. Patel; Pavan K. Turaga; Rama Chellappa

Many recent efforts have shown the effectiveness of dictionary learning methods in solving several computer vision problems. However, when designing dictionaries, training and testing domains may differ due to different viewpoints and illumination conditions. In this paper, we present a function learning framework for the task of transforming a dictionary learned from one visual domain to another, while maintaining a domain-invariant sparse representation of a signal. Domain dictionaries are modeled by a linear or non-linear parametric function. The dictionary function parameters and domain-invariant sparse codes are then jointly learned by solving an optimization problem. Experiments on real datasets demonstrate the effectiveness of our approach for applications such as face recognition, pose alignment, and pose estimation.
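The domain-invariant sparse codes can be illustrated by the simpler sub-problem of coding the same signals against two fixed domain dictionaries with a single shared code matrix. The sketch below solves that with ISTA; it illustrates only the shared-code constraint, whereas the paper additionally parameterizes the dictionaries by a learned (possibly non-linear) function of the domain.

```python
import numpy as np

def shared_sparse_codes(Y1, Y2, D1, D2, lam=0.1, iters=200):
    """Minimize ||Y1 - D1 X||_F^2 + ||Y2 - D2 X||_F^2 + lam ||X||_1 over one
    code matrix X shared by both domains, via ISTA (proximal gradient)."""
    D = np.vstack([D1, D2])              # stack the two domain dictionaries
    Y = np.vstack([Y1, Y2])
    L = np.linalg.norm(D, 2) ** 2        # step size from the spectral norm
    X = np.zeros((D.shape[1], Y.shape[1]))
    for _ in range(iters):
        G = X - D.T @ (D @ X - Y) / L                                 # gradient step
        X = np.sign(G) * np.maximum(np.abs(G) - lam / (2 * L), 0.0)   # shrinkage
    return X

# Hypothetical data: two random dictionaries observing the same sparse codes.
rng = np.random.default_rng(2)
D1 = rng.standard_normal((30, 60)); D1 /= np.linalg.norm(D1, axis=0)
D2 = rng.standard_normal((30, 60)); D2 /= np.linalg.norm(D2, axis=0)
X_true = np.zeros((60, 5))
X_true[rng.choice(60, size=5, replace=False), :] = 1.0
X_hat = shared_sparse_codes(D1 @ X_true, D2 @ X_true, D1, D2)
print(np.count_nonzero(np.abs(X_hat) > 0.1, axis=0))   # few active atoms per signal
```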


ACM Multimedia | 2008

An ontology based approach for activity recognition from video

Umut Akdemir; Pavan K. Turaga; Rama Chellappa

Representation and recognition of human activities is an important problem for video surveillance and security applications. Considering the wide variety of settings in which surveillance systems are deployed, it is necessary to create a common knowledge base, or ontology, of human activities. Most current attempts at ontology design in computer vision for human activities have been empirical in nature. In this paper, we present a more systematic approach to the problem of designing ontologies for visual activity recognition. We draw on general ontology design principles and adapt them to the specific domain of human activity ontologies. We then discuss qualitative evaluation principles and provide several examples of existing ontologies and how they can be improved. Finally, we demonstrate quantitatively, in terms of recognition performance, the efficacy and validity of our approach for the bank and airport tarmac surveillance domains.
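A toy fragment suggests what such an ontology looks like operationally: composite activities defined in terms of simpler ones, grounded in primitive detections. The entries below are hypothetical and only gesture at the structure; the paper's design and evaluation principles govern how such hierarchies should actually be built.

```python
# Hypothetical surveillance-ontology fragment: composite activities are
# ordered sequences of sub-activities, bottoming out in primitive detections.
ONTOLOGY = {
    "unload-baggage": ["vehicle-stops", "person-exits-vehicle", "object-transfer"],
    "object-transfer": ["person-near-object", "person-carries-object"],
}

def expand(activity):
    """Flatten a composite activity into its ordered primitive events."""
    if activity not in ONTOLOGY:          # primitive: directly detectable
        return [activity]
    return [p for sub in ONTOLOGY[activity] for p in expand(sub)]

print(expand("unload-baggage"))
```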


Computer Vision and Pattern Recognition | 2010

Moving vistas: Exploiting motion for describing scenes

Nitesh Shroff; Pavan K. Turaga; Rama Chellappa

Scene recognition in an unconstrained setting is an open and challenging problem with wide applications. In this paper, we study the role of scene dynamics for improved representation of scenes. We propose dynamic attributes which can be augmented with spatial attributes of a scene for semantically meaningful categorization of dynamic scenes. We further explore accurate and generalizable computational models for characterizing the dynamics of unconstrained scenes. The large intra-class variation due to unconstrained settings and the complex underlying physics make modeling scene dynamics difficult. Motivated by these factors, we propose using the theory of chaotic systems to capture dynamics. Due to the lack of a suitable dataset, we compiled a dataset of ‘in-the-wild’ dynamic scenes. Experimental results show that the proposed framework achieves the best classification rate among several well-known dynamic modeling techniques. We also show how these dynamic features provide a means to describe dynamic scenes with motion attributes, which in turn leads to a meaningful organization of the video data.
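Two standard ingredients from chaotic-systems theory make this concrete: delay embedding of a scalar time series and a correlation-sum estimate of its correlation dimension. The numpy sketch below applies them to a synthetic stand-in signal; in a real pipeline the series would be extracted from the video (e.g., an intensity statistic of a region over time), and the paper's full feature set goes beyond this single invariant.

```python
import numpy as np

def delay_embed(x, dim, tau):
    """Takens delay embedding of a scalar series x into R^dim with lag tau."""
    N = len(x) - (dim - 1) * tau
    return np.stack([x[i : i + N] for i in range(0, dim * tau, tau)], axis=1)

def correlation_sums(E, radii):
    """C(r): fraction of pairs of embedded points closer than r. The slope of
    log C(r) against log r estimates the correlation dimension."""
    D = np.linalg.norm(E[:, None, :] - E[None, :, :], axis=-1)
    pair_d = D[np.triu_indices(len(E), k=1)]
    return np.array([np.mean(pair_d < r) for r in radii])

# Synthetic stand-in for a scene's temporal signature.
t = np.arange(1200) * 0.1
x = np.sin(t) + 0.5 * np.sin(2.2 * t)
E = delay_embed(x, dim=5, tau=4)
radii = np.logspace(-0.6, 0.3, 8)
C = correlation_sums(E, radii)
slope = np.polyfit(np.log(radii), np.log(np.maximum(C, 1e-12)), 1)[0]
print(f"correlation-dimension estimate: {slope:.2f}")
```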


Computer Vision and Image Understanding | 2009

Unsupervised view and rate invariant clustering of video sequences

Pavan K. Turaga; Ashok Veeraraghavan; Rama Chellappa

Videos play an ever-increasing role in our everyday lives, with applications ranging from news and entertainment to scientific research, security, and surveillance. Coupled with the fact that cameras and storage media are becoming less expensive, this has resulted in people producing more video content than ever before, which necessitates the development of efficient indexing and retrieval algorithms for video data. Most state-of-the-art techniques index videos according to the global content in the scene, such as color, texture, and brightness. In this paper, we discuss the problem of activity-based indexing of videos. To address the problem, we first describe activities as a cascade of dynamical systems, which significantly enhances the expressive power of the model while retaining many of the computational advantages of using dynamical models. Second, we derive methods to incorporate view and rate invariance into these models, so that similar actions are clustered together irrespective of the viewpoint or the rate of execution of the activity. We also derive algorithms to learn the model parameters from a video stream and demonstrate how a single video sequence may be clustered into different clusters, where each cluster represents an activity. Experimental results for five different databases show that the clusters found by the algorithm correspond to semantically meaningful activities.
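Each segment of such a cascade is an ordinary linear dynamical system, and fitting one is a short computation. Below is a sketch of the standard SVD-based subspace method (in the spirit of dynamic-texture learning) for a single segment, assuming numpy; the paper composes many such models into a cascade and adds view and rate invariance on top.

```python
import numpy as np

def fit_lds(Y, d):
    """Fit y_t ~ C x_t, x_{t+1} ~ A x_t to the columns of Y (n x T): C from the
    top-d left singular vectors, states from the SVD, A by least squares."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    C = U[:, :d]                                   # observation matrix
    X = np.diag(s[:d]) @ Vt[:d]                    # state sequence, d x T
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])       # one-step dynamics
    return C, A, X

# Sanity check on synthetic rank-3 "video" data.
rng = np.random.default_rng(3)
Y = rng.standard_normal((100, 3)) @ rng.standard_normal((3, 60))
C, A, X = fit_lds(Y, d=3)
print(np.linalg.norm(C @ X - Y))   # ~0: the model reproduces the observations
```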


Computer Vision and Pattern Recognition | 2007

From Videos to Verbs: Mining Videos for Activities using a Cascade of Dynamical Systems

Pavan K. Turaga; Ashok Veeraraghavan; Rama Chellappa

Clustering video sequences in order to infer and extract activities from a single video stream is an extremely important problem, with significant potential in video indexing, surveillance, activity discovery, and event recognition. Clustering a video sequence into activities requires one to simultaneously recognize activity boundaries (activity-consistent subsequences) and cluster these activity subsequences. To do this, we build a generative model for activities (in video) using a cascade of dynamical systems and show that this model is able to capture and represent a diverse class of activities. We then derive algorithms to learn the model parameters from a video stream and show how a single video sequence may be clustered into different clusters, where each cluster represents an activity. We also propose a novel technique to build affine, view, and rate invariance of the activity into the distance metric for clustering. Experiments show that the clusters found by the algorithm correspond to semantically meaningful activities.
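Clustering the segment models requires a distance between dynamical systems. One widely used choice (a sketch assuming numpy, and only one candidate for the invariant metric the paper constructs) is the Martin-style distance derived from the principal angles between the column spans of the systems' extended observability matrices:

```python
import numpy as np

def observability(C, A, k=10):
    """Finite extended observability matrix [C; CA; ...; C A^(k-1)]."""
    blocks, M = [], C.copy()
    for _ in range(k):
        blocks.append(M)
        M = M @ A
    return np.vstack(blocks)

def lds_distance(C1, A1, C2, A2, k=10):
    """Martin-style distance between two LDS models: principal angles between
    the spans of their extended observability matrices."""
    Q1, _ = np.linalg.qr(observability(C1, A1, k))
    Q2, _ = np.linalg.qr(observability(C2, A2, k))
    cos = np.clip(np.linalg.svd(Q1.T @ Q2, compute_uv=False), 1e-12, 1.0)
    return np.sqrt(max(0.0, -np.sum(np.log(cos ** 2))))

# A pairwise lds_distance matrix over fitted segment models can then feed any
# off-the-shelf clustering method (e.g., hierarchical clustering).
```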

Collaboration


Dive into Pavan K. Turaga's collaborations.

Top Co-Authors

Rushil Anirudh (Arizona State University)
Suhas Lohit (Arizona State University)
Qiao Wang (Arizona State University)