Avinash Ravichandran
University of California, Los Angeles
Publications
Featured research published by Avinash Ravichandran.
Computer Vision and Pattern Recognition | 2009
Rizwan Chaudhry; Avinash Ravichandran; Gregory D. Hager; René Vidal
System-theoretic approaches to action recognition model the dynamics of a scene with linear dynamical systems (LDSs) and perform classification using metrics on the space of LDSs, e.g., Binet-Cauchy kernels. However, such approaches are only applicable to time-series data living in a Euclidean space, e.g., joint trajectories extracted from motion capture data or feature point trajectories extracted from video. Much of the success of recent object recognition techniques relies on the use of more complex feature descriptors, such as SIFT descriptors or HOG descriptors, which are essentially histograms. Since histograms live in a non-Euclidean space, we can no longer model their temporal evolution with LDSs, nor can we classify them using a metric for LDSs. In this paper, we propose to represent each frame of a video using a histogram of oriented optical flow (HOOF) and to recognize human actions by classifying HOOF time series. For this purpose, we propose a generalization of the Binet-Cauchy kernels to nonlinear dynamical systems (NLDSs) whose output lives in a non-Euclidean space, e.g., the space of histograms. This can be achieved by using kernels defined on the original non-Euclidean space, leading to a well-defined metric for NLDSs. We use these kernels to classify actions in video sequences using HOOF as the output of the NLDS. We evaluate our approach to human action recognition in several scenarios and achieve encouraging results.
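The HOOF descriptor itself is simple to compute from a dense optical flow field: every flow vector votes into an orientation bin with weight equal to its magnitude, and the histogram is normalized to sum to one. A minimal numpy sketch (using plain uniform angular bins; the paper's descriptor additionally folds orientations symmetrically about the vertical axis so the histogram is invariant to horizontal flips):

```python
import numpy as np

def hoof(flow_u, flow_v, num_bins=30):
    """Histogram of oriented optical flow for one frame.

    flow_u, flow_v: 2D arrays with the horizontal/vertical flow components.
    Returns a magnitude-weighted, L1-normalized orientation histogram.
    Simplified: uniform bins only, without the paper's symmetric folding.
    """
    angles = np.arctan2(flow_v, flow_u).ravel()      # orientations in [-pi, pi]
    magnitudes = np.hypot(flow_u, flow_v).ravel()    # flow magnitudes as weights
    bins = np.linspace(-np.pi, np.pi, num_bins + 1)
    hist, _ = np.histogram(angles, bins=bins, weights=magnitudes)
    total = hist.sum()
    return hist / total if total > 0 else hist

# A video is then represented as the time series of per-frame HOOF histograms.
```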
Computer Vision and Pattern Recognition | 2011
Paolo Favaro; René Vidal; Avinash Ravichandran
We consider the problem of fitting one or more subspaces to a collection of data points drawn from the subspaces and corrupted by noise/outliers. We pose this problem as a rank minimization problem, where the goal is to decompose the corrupted data matrix as the sum of a clean, self-expressive, low-rank dictionary plus a matrix of noise/outliers. Our key contribution is to show that, for noisy data, this non-convex problem can be solved very efficiently and in closed form from the SVD of the noisy data matrix. Remarkably, this is true whether the data are drawn from a single subspace or from multiple subspaces. An important difference with respect to existing methods is that our framework results in a polynomial thresholding of the singular values with minimal shrinkage. Indeed, a particular case of our framework for a single subspace reduces to classical PCA, which requires no shrinkage. In the case of multiple subspaces, our framework provides an affinity matrix that can be used to cluster the data according to the subspaces. In the case of data corrupted by outliers, a closed-form solution appears elusive. We thus use an augmented Lagrangian optimization framework, which requires a combination of our proposed polynomial thresholding operator with the more traditional shrinkage-thresholding operator.
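The closed-form character of the solution can be illustrated with generic singular-value thresholding: take the SVD of the noisy data matrix and pass the singular values through a rule that leaves large values essentially untouched. The hard-threshold rule below is a simplified stand-in for illustration, not the paper's polynomial operator:

```python
import numpy as np

def low_rank_from_svd(D, tau):
    """Recover a low-rank approximation of a noisy data matrix D.

    Illustrative only: large singular values pass through unchanged
    (no shrinkage), small ones are zeroed. The paper derives a specific
    polynomial thresholding rule; this hard threshold merely mimics the
    'minimal shrinkage' behavior.
    """
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    s_thr = np.where(s > tau, s, 0.0)   # hard threshold as a stand-in
    return (U * s_thr) @ Vt

# Example: a rank-2 matrix plus noise.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 40))
D = A + 0.05 * rng.standard_normal((50, 40))
A_hat = low_rank_from_svd(D, tau=1.0)
print(np.linalg.matrix_rank(A_hat), np.linalg.norm(A - A_hat))
```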
Computer Vision and Pattern Recognition | 2005
René Vidal; Avinash Ravichandran
We consider the problem of modeling a scene containing multiple dynamic textures undergoing multiple rigid-body motions, e.g., a video sequence of water taken by a rigidly moving camera. We propose to model each moving dynamic texture with a time-varying linear dynamical system (LDS) plus a 2D translational motion model. We first consider a scene with a single moving dynamic texture and show how to simultaneously learn the parameters of the time-varying LDS as well as the optical flow of the scene using the so-called dynamic texture constancy constraint (DTCC). We then consider a scene with multiple non-moving dynamic textures and show that learning the parameters of each time-invariant LDS as well as its region of support is equivalent to clustering data living in multiple subspaces. We solve this problem with a combination of PCA and GPCA. Finally, we consider a scene with multiple moving dynamic textures and show how to simultaneously learn the parameters of multiple time-varying LDSs and multiple 2D translational models by clustering data living in multiple dynamically evolving subspaces. We test our approach on sequences of flowers, water, grass, and a beating heart.
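Fitting a single time-invariant LDS to a video can be sketched with the standard SVD-based identification (in the style of Doretto et al.'s dynamic texture model): the leading left singular vectors of the frame matrix give the output matrix C, and the state matrix A is a least-squares fit between consecutive states. A minimal sketch:

```python
import numpy as np

def fit_lds(frames, n_states=10):
    """Suboptimal SVD-based identification of a time-invariant LDS
    x_{t+1} = A x_t, y_t = C x_t.

    frames: array of shape (T, H, W). Returns (A, C, X).
    """
    T = frames.shape[0]
    Y = frames.reshape(T, -1).T                  # pixels x time data matrix
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    C = U[:, :n_states]                          # spatial appearance basis
    X = s[:n_states, None] * Vt[:n_states]       # state trajectory, n x T
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])     # least-squares dynamics
    return A, C, X
```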
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2013
Avinash Ravichandran; Rizwan Chaudhry; René Vidal
We consider the problem of categorizing video sequences of dynamic textures, i.e., nonrigid dynamical objects such as fire, water, steam, flags, etc. This problem is extremely challenging because the shape and appearance of a dynamic texture continuously change as a function of time. State-of-the-art dynamic texture categorization methods have been successful at classifying videos taken from the same viewpoint and scale by using a Linear Dynamical System (LDS) to model each video, and using distances or kernels in the space of LDSs to classify the videos. However, these methods perform poorly when the video sequences are taken under a different viewpoint or scale. In this paper, we propose a novel dynamic texture categorization framework that can handle such changes. We model each video sequence with a collection of LDSs, each one describing a small spatiotemporal patch extracted from the video. This Bag-of-Systems (BoS) representation is analogous to the Bag-of-Features (BoF) representation for object recognition, except that we use LDSs as feature descriptors. This choice poses several technical challenges in adopting the traditional BoF approach. Most notably, the space of LDSs is not Euclidean; hence, novel methods for clustering LDSs and computing codewords of LDSs need to be developed. We propose a framework that makes use of nonlinear dimensionality reduction and clustering techniques combined with the Martin distance for LDSs to tackle these issues. Our experiments compare the proposed BoS approach to existing dynamic texture categorization methods and show that it can be used to recognize dynamic textures in challenging scenarios that existing methods could not handle.
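Given a matrix of pairwise Martin distances between the patch-level LDSs, the codebook step can be sketched with off-the-shelf tools: embed the systems into a Euclidean space with multidimensional scaling, cluster the embedded points into codewords, and describe each video by the histogram of its patches' codeword assignments. A sketch under these assumptions (the specific embedding and clustering choices here are illustrative, not necessarily the paper's):

```python
import numpy as np
from sklearn.manifold import MDS
from sklearn.cluster import KMeans

def bos_histograms(dist_matrix, video_ids, n_codewords=16, n_dims=8):
    """dist_matrix: (N, N) pairwise distances between N patch LDSs
    (e.g., Martin distances). video_ids: length-N array identifying
    the source video of each patch. Returns one histogram per video.
    """
    # Embed the non-Euclidean systems into Euclidean coordinates.
    coords = MDS(n_components=n_dims, dissimilarity="precomputed",
                 random_state=0).fit_transform(dist_matrix)
    # Cluster embedded systems into a codebook of "system codewords".
    labels = KMeans(n_clusters=n_codewords, n_init=10,
                    random_state=0).fit_predict(coords)
    hists = {}
    for vid in np.unique(video_ids):
        h = np.bincount(labels[video_ids == vid], minlength=n_codewords)
        hists[vid] = h / h.sum()                 # normalized BoS histogram
    return hists
```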
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2011
Avinash Ravichandran; René Vidal
We consider the problem of spatially and temporally registering multiple video sequences of dynamical scenes which contain, but are not limited to, nonrigid objects such as fireworks, flags fluttering in the wind, etc., taken from different vantage points. This problem is extremely challenging due to the presence of complex variations in the appearance of such dynamic scenes. In this paper, we propose a simple algorithm for matching such complex scenes. Our algorithm does not require the cameras to be synchronized, and is not based on frame-by-frame or volume-by-volume registration. Instead, we model each video as the output of a linear dynamical system and transform the task of registering the video sequences to that of registering the parameters of the corresponding dynamical models. As these parameters are not uniquely defined, one cannot directly compare them to perform registration. We resolve these ambiguities by jointly identifying the parameters from multiple video sequences, and converting the identified parameters to a canonical form. This reduces the video registration problem to a multiple image registration problem, which can be efficiently solved using existing image matching techniques. We test our algorithm on a wide variety of challenging video sequences and show that it matches the performance of significantly more computationally expensive existing methods.
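After canonicalization, the columns of each output matrix C can be reshaped into images and registered with any standard image matching technique. A minimal numpy sketch that recovers an integer translation between two such appearance images by phase correlation (the simplest possible registration model, used here purely for illustration):

```python
import numpy as np

def phase_correlation_shift(img_a, img_b):
    """Estimate the integer translation aligning img_b to img_a
    via phase correlation, a basic image registration technique."""
    Fa, Fb = np.fft.fft2(img_a), np.fft.fft2(img_b)
    cross = Fa * np.conj(Fb)
    cross /= np.abs(cross) + 1e-12               # normalized cross-power spectrum
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Wrap shifts larger than half the image size to negative values.
    if dy > img_a.shape[0] // 2: dy -= img_a.shape[0]
    if dx > img_a.shape[1] // 2: dx -= img_a.shape[1]
    return dy, dx
```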
Computer Vision and Pattern Recognition | 2012
Bijan Afsari; Rizwan Chaudhry; Avinash Ravichandran; René Vidal
We introduce a framework for defining a distance on the (non-Euclidean) space of Linear Dynamical Systems (LDSs). The proposed distance is induced by the action of the group of orthogonal matrices on the space of state-space realizations of LDSs. This distance can be computed efficiently for large-scale problems, hence it is suitable for applications in the analysis of dynamic visual scenes and other high-dimensional time series. Based on this distance, we devise a simple LDS averaging algorithm, which can be used for classification and clustering of time-series data. We test the validity as well as the performance of our group-action-based distance on both synthetic and real data and provide comparisons with state-of-the-art methods.
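The induced distance amounts to minimizing a realization-alignment cost over the orthogonal group. The toy sketch below makes the definition concrete by parameterizing Q = expm(S) with S skew-symmetric and minimizing numerically; the paper develops far more efficient dedicated algorithms, and this parameterization only reaches rotations, not reflections:

```python
import numpy as np
from scipy.linalg import expm
from scipy.optimize import minimize

def alignment_cost(A1, C1, A2, C2, Q):
    """Cost of aligning realization (A2, C2) to (A1, C1) under the
    orthogonal change of state basis x -> Q x."""
    return (np.linalg.norm(A1 - Q @ A2 @ Q.T) ** 2
            + np.linalg.norm(C1 - C2 @ Q.T) ** 2)

def group_action_distance(A1, C1, A2, C2):
    """Minimum alignment cost over orthogonal Q, found numerically
    via Q = expm(S) with S skew-symmetric (brute force, toy only)."""
    n = A1.shape[0]
    idx = np.triu_indices(n, k=1)

    def cost(params):
        S = np.zeros((n, n))
        S[idx] = params
        S -= S.T                                 # skew-symmetric generator
        return alignment_cost(A1, C1, A2, C2, expm(S))

    res = minimize(cost, np.zeros(len(idx[0])), method="Nelder-Mead")
    return np.sqrt(res.fun)
```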
Asian Conference on Computer Vision | 2010
Avinash Ravichandran; Paolo Favaro; René Vidal
Dynamic textures (DT) are videos of non-rigid dynamical objects, such as fire and waves, which constantly change their shape and appearance over time. Most of the prior work on DT analysis dealt with the classification of videos of a single DT or the segmentation of videos containing multiple DTs. In this paper, we consider the problem of joint segmentation and categorization of videos of multiple DTs under varying viewpoint, scale, and illumination conditions. We formulate this problem of assigning a class label to each pixel in the video as the minimization of an energy functional composed of two terms. The first term measures the cost of assigning a DT category to each pixel. For this purpose, we introduce a bag of dynamic appearance features (BoDAF) approach, in which we fit each video with a linear dynamical system (LDS) and use features extracted from the parameters of the LDS for classification. This BoDAF approach can be applied to the whole video, thus providing a framework for classifying videos of a single DT, or to image patches (superpixels), thus providing the cost of assigning a DT category to each pixel. The second term is a spatial regularization cost that encourages nearby pixels to have the same label. The minimization of this energy functional is carried out using the random walker algorithm. Experiments on existing databases of a single DT demonstrate the superiority of our BoDAF approach with respect to state-of-the-art methods. To the best of our knowledge, the problem of joint segmentation and categorization of videos of multiple DTs has not been addressed before, hence there is no standard database to test our method. We therefore introduce a new database of videos annotated at the pixel level and evaluate our approach on this database with promising results.
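The spatial regularization step can be sketched with the off-the-shelf random walker implementation in scikit-image. Assuming per-pixel class costs from an upstream classifier (hypothetical here), pixels whose best label wins by a clear margin become seeds and the walker propagates labels to the rest:

```python
import numpy as np
from skimage.segmentation import random_walker

def segment_with_random_walker(class_costs, confidence_margin=0.5):
    """class_costs: (K, H, W) per-pixel assignment costs for K DT
    categories (assumed given by an upstream classifier).
    Confident pixels become seeds; the random walker labels the rest.
    """
    best = class_costs.argmin(axis=0)
    sorted_costs = np.sort(class_costs, axis=0)
    margin = sorted_costs[1] - sorted_costs[0]   # runner-up minus best cost
    # Label 0 marks unseeded pixels for the random walker.
    seeds = np.where(margin > confidence_margin, best + 1, 0)
    # Diffuse the seeds; here the image guiding the walk is the cost margin.
    return random_walker(margin, seeds) - 1
```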
European Conference on Computer Vision | 2012
Avinash Ravichandran; Stefano Soatto
We describe a methodology for modeling backgrounds subject to significant variability over time-scales ranging from days to years, where the events of interest exhibit subtle variability relative to the normal mode. The motivating application is fire monitoring from remote stations, where illumination changes spanning the day and the season, meteorological phenomena resembling smoke, and the absence of sufficient training data for the two classes make out-of-the-box classification algorithms ineffective. We exploit low-level descriptors, incorporate explicit modeling of nuisance variability, and learn the residual normal-model variability. Our algorithm achieves state-of-the-art performance not only compared to other anomaly detection schemes, but also compared to human performance, for both untrained and trained operators.
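The learn-the-normal-mode-and-score-the-residual idea can be illustrated with a generic one-class scheme: fit a low-dimensional model to descriptors of anomaly-free frames and score new frames by their reconstruction residual. This is a standard stand-in, not the paper's specific pipeline:

```python
import numpy as np
from sklearn.decomposition import PCA

def fit_normal_model(normal_descriptors, n_components=10):
    """normal_descriptors: (N, d) low-level descriptors computed from
    anomaly-free frames. Returns a PCA model of normal variability."""
    return PCA(n_components=n_components).fit(normal_descriptors)

def anomaly_score(model, descriptors):
    """Residual energy outside the normal subspace; high = anomalous."""
    reconstructed = model.inverse_transform(model.transform(descriptors))
    return np.linalg.norm(descriptors - reconstructed, axis=1)
```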
Energy Minimization Methods in Computer Vision and Pattern Recognition | 2013
Brian D. Taylor; Alper Ayvaci; Avinash Ravichandran; Stefano Soatto
We describe an approach to incorporating scene topology and semantics into pixel-level object detection and localization. Our method requires video, from which it determines occlusion regions and hence local depth ordering, together with any visual recognition scheme that provides a score at local image regions, for instance object detection probabilities. We set up a cost functional that incorporates occlusion cues induced by object boundaries, label consistency, and recognition priors, and solve it using a convex optimization scheme. We show that our method improves the localization accuracy of existing recognition approaches, or, equivalently, provides semantic labels for pixel-level localization and segmentation.
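One convex instance of such a functional combines a unary recognition term with a quadratic label-consistency term whose pairwise weights are attenuated across occlusion boundaries; the minimizer is then a single sparse linear solve. A toy single-score sketch under that assumption (the weighting scheme is hypothetical):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def smooth_scores(scores, occlusion_mask, lam=5.0, eps=1e-3):
    """Minimize ||x - s||^2 + lam * x^T L x over per-pixel scores x,
    where L is a 4-neighbor graph Laplacian whose edge weights drop to
    eps across occlusion boundaries (occlusion_mask: True on boundaries).

    scores: (H, W) detector scores s. Returns smoothed scores.
    """
    H, W = scores.shape
    n = H * W
    idx = np.arange(n).reshape(H, W)
    occ = occlusion_mask.ravel()
    rows, cols, wts = [], [], []
    for ia, ib in [(idx[:-1, :], idx[1:, :]),     # vertical neighbors
                   (idx[:, :-1], idx[:, 1:])]:    # horizontal neighbors
        a, b = ia.ravel(), ib.ravel()
        # Weak coupling if either endpoint lies on an occlusion boundary.
        w = np.where(occ[a] | occ[b], eps, 1.0)
        rows += [a, b]; cols += [b, a]; wts += [w, w]
    Wm = sp.coo_matrix((np.concatenate(wts),
                        (np.concatenate(rows), np.concatenate(cols))),
                       shape=(n, n)).tocsr()
    L = sp.diags(np.asarray(Wm.sum(axis=1)).ravel()) - Wm   # graph Laplacian
    x = spsolve(sp.eye(n) + lam * L, scores.ravel())        # closed-form minimizer
    return x.reshape(H, W)
```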
British Machine Vision Conference | 2012
Daniele Perrone; Avinash Ravichandran; René Vidal; Paolo Favaro
We consider the problem of non-blind deconvolution of images corrupted by a blur that is not accurately known. We propose a method that exploits dictionary-based image priors and non-Gaussian noise models to improve deblurring accuracy in the presence of an inexact blur. The proposed image priors express each image patch as a linear combination of atoms from a dictionary learned from patches extracted from the same image or from an image database. When applied to blurred images, this model imposes that patches that are similar in the blurred image retain the same similarity when deblurred. We perform image deblurring by imposing this prior model in an energy minimization scheme that also deals with outliers. Experimental results on publicly available databases show that our approach is able to remove artifacts such as oscillations, which are often introduced during the deblurring process when the correct blur is not known.
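The dictionary-based prior can be sketched with standard sparse-coding tools: learn a dictionary of atoms from patches of a reference image, then code and reconstruct patches of another image as sparse combinations of those atoms. This only illustrates the prior, not the full deblurring energy or the outlier handling:

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.feature_extraction.image import extract_patches_2d

def learn_patch_dictionary(image, patch_size=(8, 8), n_atoms=100):
    """Learn a dictionary of patch atoms from a grayscale image."""
    patches = extract_patches_2d(image, patch_size, max_patches=5000,
                                 random_state=0)
    X = patches.reshape(len(patches), -1).astype(float)
    X -= X.mean(axis=1, keepdims=True)           # remove per-patch DC component
    dico = MiniBatchDictionaryLearning(n_components=n_atoms,
                                       transform_algorithm="omp",
                                       transform_n_nonzero_coefs=5,
                                       random_state=0)
    return dico.fit(X)

def sparse_reconstruct(dico, patches_flat):
    """Code patches sparsely and reconstruct them from the dictionary."""
    codes = dico.transform(patches_flat)         # sparse coefficients via OMP
    return codes @ dico.components_
```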