P. Anandan
Microsoft
Publications
Featured research published by P. Anandan.
Proceedings of the IEEE | 1998
Michal Irani; P. Anandan
Video is a rich source of information. It provides visual information about scenes. This information is implicitly buried inside the raw video data, however, and comes at the cost of very high temporal redundancy. While the standard sequential form of video storage is adequate for viewing in a movie mode, it fails to support the rapid access to information of interest that is required in many emerging applications of video. This paper presents an approach for efficient access, use, and manipulation of video data. The video data are first transformed from their sequential and redundant frame-based representation, in which the information about the scene is distributed over many frames, to an explicit and compact scene-based representation, to which each frame can be directly related. This compact reorganization of the video data supports nonlinear browsing and efficient indexing that provide rapid access directly to information of interest. This paper describes a new set of methods for indexing into the video sequence based on the scene-based representation. These indexing methods are based on geometric and dynamic information contained in the video. They complement the more traditional content-based indexing methods, which utilize image appearance information (namely, color and texture properties), yet are considerably simpler to compute and highly efficient.
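As a rough illustration of reorganizing sequential frames into a compact scene-based representation, the sketch below (ours, not the paper's method; it uses OpenCV and assumes the scene is distant or planar enough for a single homography per frame) registers each frame to a reference frame and composites it into one mosaic, keeping the per-frame homography as the index that relates each frame back to the scene.

# Minimal mosaicking sketch (assumes a scene well approximated by one
# homography per frame, e.g. a distant or planar scene). Not the paper's method.
import cv2
import numpy as np

def frame_to_reference_homography(ref_gray, frame_gray):
    # Track sparse corners from the reference into the current frame,
    # then fit a robust homography to the correspondences.
    pts_ref = cv2.goodFeaturesToTrack(ref_gray, maxCorners=500,
                                      qualityLevel=0.01, minDistance=7)
    pts_cur, status, _ = cv2.calcOpticalFlowPyrLK(ref_gray, frame_gray,
                                                  pts_ref, None)
    good = status.ravel() == 1
    H, _ = cv2.findHomography(pts_cur[good], pts_ref[good], cv2.RANSAC, 3.0)
    return H

def build_mosaic(frames, canvas_size=(2000, 1000)):
    ref = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    mosaic = np.zeros((canvas_size[1], canvas_size[0], 3), np.uint8)
    index = []  # per-frame homographies: the "index" into the scene mosaic
    for f in frames:
        gray = cv2.cvtColor(f, cv2.COLOR_BGR2GRAY)
        H = frame_to_reference_homography(ref, gray)
        warped = cv2.warpPerspective(f, H, canvas_size)
        mosaic = np.where(warped > 0, warped, mosaic)  # naive compositing
        index.append(H)
    return mosaic, index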
international conference on computer vision | 1998
Michal Irani; P. Anandan
This paper presents a method for aligning images acquired by sensors of different modalities (e.g., EO and IR). The paper has two main contributions: (i) it identifies an appropriate image representation for multi-sensor alignment, i.e., a representation which emphasizes the common information between the two multi-sensor images, suppresses the non-common information, and is adequate for coarse-to-fine processing; and (ii) it presents a new alignment technique which applies global estimation to any choice of a local similarity measure. In particular, it is shown that when this registration technique is applied to the chosen image representation with a local normalized-correlation similarity measure, it provides a new multi-sensor alignment algorithm that is robust to outliers and applies to a wide variety of globally complex brightness transformations between the two images. Our proposed image representation does not rely on sparse image features (e.g., edge, contour, or point features). It is continuous and does not eliminate the detailed variations within local image regions. Our method naturally extends to coarse-to-fine processing and applies even in situations where the multi-sensor signals are globally characterized by low statistical correlation.
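A toy illustration of the "global estimation over a local similarity measure" idea (ours, not the paper's algorithm): both images are reduced to a gradient-magnitude representation as a stand-in for the paper's multi-sensor representation, local normalized correlations are computed on a grid of patches, and a single global translation is chosen to maximize their sum. The paper's parametric motion models and coarse-to-fine processing are omitted.

# Toy sketch: pick the global translation that maximizes the sum of local
# normalized correlations between gradient-magnitude images. Illustrative only.
import numpy as np

def gradient_magnitude(img):
    gy, gx = np.gradient(img.astype(float))
    return np.hypot(gx, gy)

def local_ncc(a, b):
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum()) + 1e-9
    return (a * b).sum() / denom

def best_global_shift(img1, img2, patch=32, max_shift=8):
    r1, r2 = gradient_magnitude(img1), gradient_magnitude(img2)
    h, w = r1.shape
    best = (-np.inf, (0, 0))
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            score = 0.0
            for y in range(max_shift, h - patch - max_shift, patch):
                for x in range(max_shift, w - patch - max_shift, patch):
                    p1 = r1[y:y + patch, x:x + patch]
                    p2 = r2[y + dy:y + dy + patch, x + dx:x + dx + patch]
                    score += local_ncc(p1, p2)
            if score > best[0]:
                best = (score, (dy, dx))
    return best[1]  # (dy, dx) maximizing the summed local correlation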
computer vision and pattern recognition | 1998
Simon Baker; Richard Szeliski; P. Anandan
We propose a framework for extracting structure from stereo which represents the scene as a collection of approximately planar layers. Each layer consists of an explicit 3D plane equation, a colored image with per-pixel opacity (a sprite), and a per-pixel depth offset relative to the plane. Initial estimates of the layers are recovered using techniques taken from parametric motion estimation. These initial estimates are then refined using a re-synthesis algorithm which takes into account both occlusions and mixed pixels. Reasoning about such effects allows the recovery of depth and color information with high accuracy even in partially occluded regions. Another important benefit of our framework is that the output consists of a collection of approximately planar regions, a representation which is far more appropriate than a dense depth map for many applications such as rendering and video parsing.
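The layer representation described above can be summarized with a small data structure; the sketch below is an assumed layout (field names are ours, not the paper's).

# Hypothetical container for one layer, following the description above:
# a 3D plane equation, a colored sprite with per-pixel opacity, and a
# per-pixel depth offset relative to the plane.
from dataclasses import dataclass
import numpy as np

@dataclass
class Layer:
    plane: np.ndarray          # (4,) plane equation  a*X + b*Y + c*Z + d = 0
    sprite_rgba: np.ndarray    # (H, W, 4) color plus per-pixel opacity (alpha)
    depth_offset: np.ndarray   # (H, W) per-pixel depth offset from the plane

scene = [Layer(plane=np.array([0., 0., 1., -5.]),
               sprite_rgba=np.zeros((480, 640, 4), np.float32),
               depth_offset=np.zeros((480, 640), np.float32))]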
international conference on pattern recognition | 1994
Rakesh Kumar; P. Anandan; Keith J. Hanna
Given two arbitrary views of a scene under central projection, if the motion of points on a parametric surface is compensated, the residual parallax displacement field on the reference image is an epipolar field. If the surface aligned is a plane, the parallax magnitude at an image point is directly proportional to the height of the point from the plane and inversely proportional to its depth from the camera. The authors exploit the above theorem to infer 3D height information from oblique aerial 2D images. The authors use direct methods to register the aerial images and develop methods to infer height information under the following three conditions: (i) focal length and image center are both known, (ii) only the focal length is known, and (iii) both are unknown.
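In standard plane-plus-parallax notation (ours, not quoted from the paper; signs and the choice of reference view vary across formulations), the residual displacement after compensating the reference plane can be written as

\mathbf{u}_{\mathrm{res}}(\mathbf{p}) = \gamma \, \frac{T_Z}{d_\pi} \, (\mathbf{p} - \mathbf{e}), \qquad \gamma = \frac{H}{Z},

where H is the perpendicular height of the scene point above the reference plane, Z its depth from the camera, T_Z the component of the camera translation along the optical axis, d_\pi the distance of the reference plane, and \mathbf{e} the epipole. The parallax magnitude is therefore proportional to the height H and inversely proportional to the depth Z, and the residual field points along epipolar lines through \mathbf{e}, as stated above.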
computer vision and pattern recognition | 2000
Richard Szeliski; Shai Avidan; P. Anandan
Many natural images contain reflections and transparency, i.e., they contain mixtures of reflected and transmitted light. When viewed from a moving camera, these appear as the superposition of component layer images moving relative to each other. The problem of multiple motion recovery has been studied previously by a number of researchers; however, no one has yet demonstrated how to accurately recover the component images themselves. In this paper we develop an optimal approach to recovering layer images and their associated motions from an arbitrary number of composite images. We develop two different techniques for estimating the component layer images given known motion estimates. The first approach uses constrained least squares to recover the layer images. The second approach iteratively refines lower and upper bounds on the layer images using two novel compositing operations, namely minimum- and maximum-composites of aligned images. We combine these layer extraction techniques with a dominant motion estimator and a subsequent motion refinement stage. The result is a completely automated system that recovers transparent images and motions from a collection of input images.
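The min- and max-composite bounds can be illustrated in a few lines (a sketch under an additive image-formation model I_t = L1 + L2_t with non-negative layers; not the full pipeline in the paper): once the input images have been warped into the coordinate frame of one layer, the pointwise minimum over time is an upper bound on that layer, and the pointwise maximum of the residuals, aligned to the other layer, is a lower bound on it.

# Sketch of the min/max composite bounds for two additive layers
# (I_t = L1 + L2_t with both layers non-negative). Illustrative only.
import numpy as np

def min_composite(images_aligned_to_layer1):
    # Each aligned image equals L1 plus the moving contribution of layer 2,
    # which is >= 0, so the pointwise minimum over time is an UPPER bound on L1.
    return np.min(np.stack(images_aligned_to_layer1), axis=0)

def max_composite(residuals_aligned_to_layer2):
    # Each residual (image minus the current layer-1 estimate) equals L2 minus
    # a non-negative error, so the pointwise maximum over time is a LOWER
    # bound on L2.
    return np.max(np.stack(residuals_aligned_to_layer2), axis=0)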
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2001
Philip H. S. Torr; Richard Szeliski; P. Anandan
This paper describes a Bayesian approach for modeling 3D scenes as a collection of approximately planar layers that are arbitrarily positioned and oriented in the scene. In contrast to much of the previous work on layer-based motion modeling, which computes layered descriptions of 2D image motion, our work leads to a 3D description of the scene. The paper makes two contributions. The first is to formulate the prior assumptions about the layers and the scene within a Bayesian decision-making framework, which is used to automatically determine the number of layers and the assignment of individual pixels to layers. The second is algorithmic: to carry out the optimization, a Bayesian version of RANSAC is developed to initialize the segmentation, and a generalized expectation-maximization method is then used to find the MAP solution.
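The E-step of such a layered EM can be sketched generically (this is a standard soft-assignment step under Gaussian residuals, not the specific model in the paper): each pixel receives a responsibility for each planar layer based on how well that layer's plane or homography predicts its observation.

# Generic EM E-step for assigning pixels to layers from per-layer residuals.
# residuals: array (K, H, W) of prediction errors for K candidate planes.
import numpy as np

def layer_responsibilities(residuals, priors, sigma=1.0):
    # Gaussian likelihood of each pixel under each layer, times the layer prior,
    # normalized so responsibilities sum to one per pixel (soft assignment).
    log_lik = -0.5 * (residuals / sigma) ** 2
    log_post = np.log(priors)[:, None, None] + log_lik
    log_post -= log_post.max(axis=0, keepdims=True)      # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=0, keepdims=True)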
Computer Vision and Image Understanding | 2005
Antonio Criminisi; Sing Bing Kang; Rahul Swaminathan; Richard Szeliski; P. Anandan
Despite progress in stereo reconstruction and structure from motion, 3D scene reconstruction from multiple images still faces many difficulties, especially in dealing with occlusions, partial visibility, textureless regions, and specular reflections. Moreover, the problem of recovering a spatially dense 3D representation from many views has not been adequately treated. This document addresses the problems of achieving a dense reconstruction from a sequence of images and analyzing and removing specular highlights. The first part describes an approach for automatically decomposing the scene into a set of spatio-temporal layers (namely EPI-tubes) by analyzing the epipolar plane image (EPI) volume. The key to our approach is to directly exploit the high degree of regularity found in the EPI volume. In contrast to past work on EPI volumes that focused on a sparse set of feature tracks, we develop a complete and dense segmentation of the EPI volume. Two different algorithms are presented to segment the input EPI volume into its component EPI tubes. The second part describes a mathematical characterization of specular reflections within the EPI framework and proposes a novel technique for decomposing a static scene into its diffuse (Lambertian) and specular components. Furthermore, a taxonomy of specularities based on their photometric properties is presented as a guide for designing further separation techniques. The validity of our approach is demonstrated on a number of sequences of complex scenes with large amounts of occlusions and specularity. In particular, we demonstrate object removal and insertion, depth map estimation, and detection and removal of specular highlights.
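The regularity being exploited can be stated compactly (our simplification, assuming a pinhole camera with focal length f translating parallel to the image plane by an amount s): a static scene point at depth Z traces the straight line

x(s) = x_0 - \frac{f \, s}{Z}

in the epipolar plane image, so the slope of an EPI trajectory is inversely proportional to depth. EPI-tubes group the pixels swept out by such trajectories, and departures from constant brightness along a trajectory indicate non-Lambertian (e.g., specular) behavior.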
computer vision and pattern recognition | 2000
Yong Rui; P. Anandan
The analysis of human action captured in video sequences has been a topic of considerable interest in computer vision. Much of the previous work has focused on the problem of action or activity recognition, but ignored the problem of detecting action boundaries in a video sequence containing unfamiliar and arbitrary visual actions. This paper presents an approach to this problem based on detecting temporal discontinuities of the spatial pattern of image motion that captures the action. We represent frame-to-frame optical flow in terms of the coefficients of the most significant principal components computed from all the flow fields within a given video sequence. We then detect the discontinuities in the temporal trajectories of these coefficients based on three different measures. We compare our segment boundaries against those detected by human observers on the same sequences in a recent independent psychological study of human perception of visual events. We show experimental results on the two sequences that were used in this study. Our experimental results are promising both from visual evaluation and when compared against the results of the psychological study.
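A compact sketch of the representation step (our own simplification in plain numpy; the paper's three discontinuity measures and the comparison with human segmentation are not reproduced): flatten each frame-to-frame flow field into a vector, compute the principal components over the whole sequence, and look for large jumps in the trajectory of the leading coefficients.

# Sketch: PCA of optical-flow fields and a simple temporal-discontinuity score.
import numpy as np

def flow_pca_coefficients(flows, k=5):
    # flows: (T, H, W, 2) frame-to-frame optical flow for T frame pairs.
    T = flows.shape[0]
    X = flows.reshape(T, -1)
    X = X - X.mean(axis=0)
    # Principal components via SVD of the (T x D) data matrix; the rows of
    # U * S are the coefficients of each flow field in the PCA basis.
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * S[:k]          # (T, k) coefficient trajectories

def boundary_scores(coeffs):
    # One simple measure: distance between consecutive coefficient vectors;
    # peaks suggest candidate action boundaries.
    return np.linalg.norm(np.diff(coeffs, axis=0), axis=1)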
european conference on computer vision | 2000
Michal Irani; P. Anandan
Factorization using Singular Value Decomposition (SVD) is often used for recovering 3D shape and motion from feature correspondences across multiple views. SVD is powerful at finding the global solution to the associated least-squares-error minimization problem. However, this is the correct error to minimize only when the x and y positional errors in the features are uncorrelated and identically distributed. This is rarely the case in real data: uncertainty in feature position depends on the underlying spatial intensity structure in the image, which has strong directionality to it. Hence, the proper measure to minimize is the covariance-weighted squared error (the Mahalanobis distance). In this paper, we describe a new approach to covariance-weighted factorization, which can factor noisy feature correspondences with a high degree of directional uncertainty into structure and motion. Our approach is based on transforming the raw data into a covariance-weighted data space, where the components of noise in the different directions are uncorrelated and identically distributed. Applying SVD to the transformed data then minimizes a meaningful objective function. We empirically show that our new algorithm gives good results for varying degrees of directional uncertainty. In particular, we show that unlike other SVD-based factorization algorithms, our method does not degrade as the directionality of the uncertainty increases, even in the extreme case when only normal-flow data is available. It thus provides a unified approach for treating corner-like points together with points along linear structures in the image.
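The core whitening idea can be illustrated in isolation (a schematic, not the paper's full factorization algorithm): if each measured point has a 2x2 positional covariance, multiplying its residual by the inverse Cholesky factor makes the noise isotropic, so an ordinary squared-error criterion (and hence SVD) on the transformed data minimizes the Mahalanobis distance of the raw data.

# Schematic of covariance whitening: after the transform, the Euclidean norm
# of the whitened residual equals the Mahalanobis distance of the raw residual.
import numpy as np

def whiten(residual, cov):
    # residual: (2,) positional error; cov: (2, 2) its covariance.
    L = np.linalg.cholesky(cov)          # cov = L @ L.T
    return np.linalg.solve(L, residual)  # L^{-1} r  ->  identity covariance

cov = np.array([[4.0, 1.2], [1.2, 0.5]])     # strongly directional uncertainty
r = np.array([0.8, -0.3])
w = whiten(r, cov)
mahalanobis_sq = r @ np.linalg.inv(cov) @ r
assert np.isclose(w @ w, mahalanobis_sq)     # same objective after whitening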
european conference on computer vision | 1998
Michal Irani; P. Anandan; Daphna Weinshall
This paper presents a new framework for analyzing the geometry of multiple 3D scene points from multiple uncalibrated images, based on decomposing the projection of these points onto the images into two stages: (i) the projection of the scene points onto a (real or virtual) physical reference planar surface in the scene, which creates a virtual "image" on the reference plane, and (ii) the re-projection of the virtual image onto the actual image plane of the camera. The positions of the virtual image points are directly related to the 3D locations of the scene points and the camera centers relative to the reference plane alone. All dependencies on the internal camera calibration parameters and on the orientation of the camera are folded into homographies relating each image plane to the reference plane.
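In homogeneous coordinates (our notation), the two-stage decomposition described above can be written as

\mathbf{p} \cong H_\pi \, \hat{\mathbf{p}},

where \hat{\mathbf{p}} is the "virtual image" point obtained by projecting the scene point onto the reference plane through the camera center, and H_\pi is the homography mapping the reference plane to the image plane. The virtual point \hat{\mathbf{p}} depends only on the 3D point and the camera center relative to the plane, while the internal calibration parameters and the camera orientation enter only through H_\pi.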