Publication


Featured research published by Atul Kanaujia.


Computer Vision and Pattern Recognition | 2005

Discriminative density propagation for 3D human motion estimation

Cristian Sminchisescu; Atul Kanaujia; Zhiguo Li; Dimitris N. Metaxas

We describe a mixture density propagation algorithm to estimate 3D human motion in monocular video sequences based on observations encoding the appearance of image silhouettes. Our approach is discriminative rather than generative, therefore it does not require the probabilistic inversion of a predictive observation model. Instead, it uses a large human motion capture database and a 3D computer graphics human model in order to synthesize training pairs of typical human configurations together with their realistically rendered 2D silhouettes. These are used to directly learn to predict the conditional state distributions required for 3D body pose tracking and thus avoid using the generative 3D model for inference (the learned discriminative predictors can also be used, complementarily, as importance samplers in order to improve mixing or to initialize generative inference algorithms). We aim for probabilistically motivated tracking algorithms and for models that can represent complex multivalued mappings common in inverse, uncertain perception inferences. Our paper has three contributions: (1) we establish the density propagation rules for discriminative inference in continuous, temporal chain models; (2) we propose flexible algorithms for learning multimodal state distributions based on compact, conditional Bayesian mixture of experts models; and (3) we demonstrate the algorithms empirically on real and motion capture-based test sequences and compare against nearest-neighbor and regression methods.
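
The core operation here is easy to state concretely: a conditional Bayesian mixture of experts maps an image descriptor to a multimodal distribution over 3D poses, with a gate assigning a probability to each expert. Below is a minimal numpy sketch of that prediction step; the linear experts, softmax gates, random parameters, and dimensions are all illustrative assumptions, not the authors' implementation.

import numpy as np

rng = np.random.default_rng(0)

# Toy setup: r = descriptor dim, d = state (pose) dim, M = number of experts.
r, d, M = 8, 3, 4
W = rng.normal(size=(M, d, r))      # per-expert linear regressors (assumed)
g = rng.normal(size=(M, r))         # gate parameters (assumed)
sigma2 = 0.1                        # per-expert isotropic output variance (assumed)

def predict_mixture(x):
    """Map a descriptor x to a multimodal state distribution:
    a set of (weight, mean, variance) components, one per expert."""
    logits = g @ x
    gates = np.exp(logits - logits.max())
    gates /= gates.sum()                      # softmax gate probabilities
    means = np.einsum('mdr,r->md', W, x)      # expert means W_m x
    return gates, means, np.full(M, sigma2)

x = rng.normal(size=r)
gates, means, vars_ = predict_mixture(x)
print("component weights:", np.round(gates, 3))
print("most likely pose estimate:", np.round(means[gates.argmax()], 3))

Multiple components carrying non-trivial weight are exactly how such a model represents the ambiguous, multivalued image-to-pose mapping the abstract describes.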


Computer Vision and Image Understanding | 2006

Conditional models for contextual human motion recognition

Cristian Sminchisescu; Atul Kanaujia; Dimitris N. Metaxas

We present algorithms for recognizing human motion in monocular video sequences, based on discriminative conditional random fields (CRFs) and maximum entropy Markov models (MEMMs). Existing approaches to this problem typically use generative (joint) structures like the hidden Markov model (HMM), and therefore have to make simplifying, often unrealistic assumptions on the conditional independence of observations given the motion class labels; they cannot accommodate overlapping features or long-term contextual dependencies in the observation sequence. In contrast, conditional models like CRFs seamlessly represent contextual dependencies, support efficient, exact inference using dynamic programming, and their parameters can be trained using convex optimization. We introduce conditional graphical models as complementary tools for human motion recognition and present an extensive set of experiments that show how these typically outperform HMMs in classifying not only diverse human activities like walking, jumping, running, picking, or dancing, but also in discriminating among subtle motion styles like normal walk and wander walk.
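
The "efficient, exact inference using dynamic programming" mentioned above is, for a linear chain, the classic Viterbi recursion. A minimal sketch follows, with random log-potentials standing in for the learned CRF feature weights; every quantity here is a toy assumption.

import numpy as np

# Toy linear-chain model: T frames, K activity labels. unary[t, k] and
# pairwise[j, k] are log-potentials; in a trained CRF these would come from
# learned feature weights, here they are random placeholders.
rng = np.random.default_rng(1)
T, K = 6, 4
unary = rng.normal(size=(T, K))
pairwise = rng.normal(size=(K, K))

def viterbi(unary, pairwise):
    """Exact MAP label sequence by dynamic programming."""
    T, K = unary.shape
    score = unary[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + pairwise         # score of each transition
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + unary[t]
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

print("MAP activity labels:", viterbi(unary, pairwise))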


Computer Vision and Pattern Recognition | 2006

Learning Joint Top-Down and Bottom-up Processes for 3D Visual Inference

Cristian Sminchisescu; Atul Kanaujia; Dimitris N. Metaxas

We present an algorithm for jointly learning a consistent bidirectional generative-recognition model that combines top-down and bottom-up processing for monocular 3D human motion reconstruction. Learning progresses in alternating stages of self-training that optimize the probability of the image evidence: the recognition model is tuned using samples from the generative model, and the generative model is optimized to produce inferences close to the ones predicted by the current recognition model. At equilibrium, the two models are consistent. During on-line inference, we scan the image at multiple locations and predict 3D human poses using the recognition model, which implicitly includes one-shot generative consistency feedback. The framework provides a uniform treatment of human detection, 3D initialization, and 3D recovery from transient failure. Our experimental results show that this procedure is promising for the automatic reconstruction of human motion in more natural scene settings with background clutter and occlusion.
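
As a rough illustration of the alternating self-training scheme, the toy below replaces the graphics renderer with a linear map from pose to features and the recognition model with ridge regression, then alternates fitting each model to the other's outputs; all of this is a simplification assumed for the sketch, not the paper's models.

import numpy as np

rng = np.random.default_rng(2)
d, r, n = 3, 10, 200                 # pose dim, feature dim, samples

# "Generative" model: a linear renderer pose -> image features (a stand-in
# for the graphics model in the paper), initialized randomly.
G = rng.normal(size=(r, d))
# "Recognition" model: linear map features -> pose, also random at first.
R = rng.normal(size=(d, r))

def ridge(A, B, lam=1e-3):
    """Solve min ||A X - B||^2 + lam ||X||^2 for X."""
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ B)

poses = rng.normal(size=(n, d))      # samples of typical configurations
for it in range(10):
    feats = poses @ G.T              # render training pairs generatively
    R = ridge(feats, poses).T        # tune recognition model on those samples
    recon = feats @ R.T              # poses the recognition model infers
    G = ridge(recon, feats).T        # pull generative model toward them
    gap = np.linalg.norm(poses @ G.T @ R.T - poses) / np.linalg.norm(poses)
    # at equilibrium the two maps become (approximately) mutual inverses
print(f"consistency gap after training: {gap:.4f}")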


Computer Vision and Pattern Recognition | 2007

Semi-supervised Hierarchical Models for 3D Human Pose Reconstruction

Atul Kanaujia; Cristian Sminchisescu; Dimitris N. Metaxas

Recent research in visual inference from monocular images has shown that discriminatively trained image-based predictors can provide fast, automatic qualitative 3D reconstructions of human body pose or scene structure in real-world environments. However, the stability of existing image representations tends to be perturbed by deformations and misalignments in the training set, which, in turn, degrade the quality of learning and generalization. In this paper we advocate the semi-supervised learning of hierarchical image descriptions in order to better tolerate variability at multiple levels of detail. We combine multilevel encodings having improved stability to geometric transformations with metric learning and semi-supervised manifold regularization methods in order to further tune them for task invariance: resistance to background clutter and to differences within the same human pose class. We quantitatively analyze the effectiveness of both the descriptors and the learning methods, and show that each one can contribute, sometimes substantially, to more reliable 3D human pose estimates in cluttered images.
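
Of the ingredients above, semi-supervised manifold regularization is the most self-contained to sketch: a graph Laplacian built over labeled and unlabeled descriptors penalizes predictors that vary sharply across the data manifold. The toy below uses random data and a plain kNN graph; the specific regularizer form and all constants are assumptions for illustration.

import numpy as np

rng = np.random.default_rng(3)
n_lab, n_unlab, p = 30, 120, 12
X = rng.normal(size=(n_lab + n_unlab, p))        # image descriptors
y = rng.normal(size=n_lab)                        # targets (labeled part only)

def knn_laplacian(X, k=5):
    """Unnormalized graph Laplacian of a symmetrized kNN graph."""
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    W = np.zeros_like(D)
    for i in range(len(X)):
        for j in np.argsort(D[i])[1:k + 1]:      # skip self at index 0
            W[i, j] = W[j, i] = 1.0
    return np.diag(W.sum(axis=1)) - W

L = knn_laplacian(X)                              # built over ALL descriptors
lam, gamma = 1e-2, 1e-2
# Manifold-regularized least squares:
#   min ||X_lab w - y||^2 + lam ||w||^2 + gamma w^T X^T L X w
Xl = X[:n_lab]
A = Xl.T @ Xl + lam * np.eye(p) + gamma * X.T @ L @ X
w = np.linalg.solve(A, Xl.T @ y)
print("learned weights (first four):", np.round(w[:4], 3))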


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2007

BM³E: Discriminative Density Propagation for Visual Tracking

Cristian Sminchisescu; Atul Kanaujia; Dimitris N. Metaxas

We introduce BM³E, a conditional Bayesian mixture of experts Markov model that achieves consistent probabilistic estimates for discriminative visual tracking. The model applies to problems of temporal and uncertain inference and represents the unexplored bottom-up counterpart of pervasive generative models estimated with Kalman filtering or particle filtering. Instead of inverting a nonlinear generative observation model at runtime, we learn to cooperatively predict complex state distributions directly from descriptors that encode image observations (typically, bag-of-feature global image histograms or descriptors computed over regular spatial grids). These are integrated in a conditional graphical model in order to enforce temporal smoothness constraints and allow a principled management of uncertainty. The algorithms combine sparsity, mixture modeling, and nonlinear dimensionality reduction for efficient computation in high-dimensional continuous state spaces. The combined system automatically self-initializes and recovers from failure. The research has three contributions: (1) we establish the density propagation rules for discriminative inference in continuous, temporal chain models; (2) we propose flexible supervised and unsupervised algorithms to learn feed-forward, multivalued contextual mappings (multimodal state distributions) based on compact, conditional Bayesian mixture of experts models; and (3) we validate the framework empirically for the reconstruction of 3D human motion in monocular video sequences. Our tests on both real and motion-capture-based sequences show significant performance gains with respect to competing nearest-neighbor, regression, and structured prediction methods.
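
A hedged sketch of one propagation step: the per-frame mixture predicted from the current descriptor is reweighted by its temporal agreement with the previous posterior, and the result is pruned back to a fixed number of modes. The Gaussian affinity, the pruning rule, and all constants are illustrative choices, not the BM³E equations.

import numpy as np

rng = np.random.default_rng(4)
M, d = 5, 3                                    # components per frame, state dim

def propagate(prev_w, prev_mu, new_w, new_mu, tau=1.0, keep=5):
    """One discriminative propagation step (toy version): reweight each
    newly predicted component by its temporal agreement with every previous
    component, then keep the `keep` strongest modes."""
    w = np.zeros(len(new_w))
    for k in range(len(new_w)):
        # temporal smoothness: Gaussian affinity to previous modes
        aff = np.exp(-tau * np.sum((prev_mu - new_mu[k]) ** 2, axis=1))
        w[k] = new_w[k] * (prev_w @ aff)
    w /= w.sum()
    top = np.argsort(w)[::-1][:keep]
    return w[top] / w[top].sum(), new_mu[top]

w, mu = np.full(M, 1 / M), rng.normal(size=(M, d))
for t in range(10):                            # simulate 10 frames of predictions
    pw, pmu = np.full(M, 1 / M), mu + 0.1 * rng.normal(size=(M, d))
    w, mu = propagate(w, mu, pw, pmu)
print("strongest posterior mode:", np.round(mu[0], 3))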


Computer Vision and Pattern Recognition | 2008

Fast algorithms for large scale conditional 3D prediction

Liefeng Bo; Cristian Sminchisescu; Atul Kanaujia; Dimitris N. Metaxas

The potential success of discriminative learning approaches to 3D reconstruction relies on the ability to efficiently train predictive algorithms using sufficiently many examples that are representative of the typical configurations encountered in the application domain. Recent research indicates that sparse conditional Bayesian mixture of experts (cMoE) models (e.g. BME (Sminchisescu et al., 2005)) are adequate modeling tools that not only provide contextual 3D predictions for problems like human pose reconstruction, but can also represent multiple interpretations that result from depth ambiguities or occlusion. However, training conditional predictors requires sophisticated double-loop algorithms that scale unfavorably with the input dimension and the training set size, thus limiting their usage to 10,000 examples or fewer so far. In this paper we present large-scale algorithms, referred to as fBME, that combine forward feature selection and bound optimization in order to train probabilistic BME models with one order of magnitude more data (100,000 examples and up) and more than one order of magnitude faster. We present several large-scale experiments, including monocular evaluation on the HumanEva dataset (Sigal and Black, 2006), demonstrating how the proposed methods overcome the scaling limitations of existing ones.
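
Forward feature selection, one of the two ingredients of fBME, can be sketched on its own: greedily add the input feature most correlated with the current residual, then refit. The toy below does this for a plain linear predictor rather than a mixture of experts, which is an assumed simplification.

import numpy as np

rng = np.random.default_rng(5)
n, p = 500, 50
X = rng.normal(size=(n, p))
true_w = np.zeros(p); true_w[[3, 17, 41]] = [2.0, -1.5, 1.0]
y = X @ true_w + 0.1 * rng.normal(size=n)

def forward_select(X, y, n_feats=5):
    """Greedy forward selection: repeatedly add the feature whose inclusion
    gives the lowest least-squares residual. Cheap relative to training a
    full model on all inputs, which is the point for large-scale data."""
    chosen, residual = [], y.copy()
    for _ in range(n_feats):
        scores = [abs(X[:, j] @ residual) if j not in chosen else -1.0
                  for j in range(X.shape[1])]
        chosen.append(int(np.argmax(scores)))
        Xs = X[:, chosen]
        w, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        residual = y - Xs @ w
    return chosen

print("selected features:", forward_select(X, y))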


International Conference on Computer Vision | 2007

The Best of Both Worlds: Combining 3D Deformable Models with Active Shape Models

Christian Vogler; Zhiguo Li; Atul Kanaujia; Siome Goldenstein; Dimitris N. Metaxas

Reliable 3D tracking is still a difficult task. Most parametrized 3D deformable models rely on the accurate extraction of image features for updating their parameters, and are prone to failures when the underlying feature distribution assumptions are invalid. Active Shape Models (ASMs), on the other hand, are based on learning, and thus require fewer reliable local image features than parametrized 3D models, but fail easily when they encounter a situation for which they were not trained. In this paper, we develop an integrated framework that combines the strengths of both 3D deformable models and ASMs. The 3D model governs the overall shape, orientation, and location, and provides the basis for statistical inference on both the image features and the parameters. The ASMs, in contrast, provide the majority of reliable 2D image features over time, and aid in recovering from drift and total occlusions. The framework dynamically selects among different ASMs to compensate for large viewpoint changes due to head rotations. This integration allows the robust tracking of faces and the estimation of both their rigid and non-rigid motions. We demonstrate the strength of the framework in experiments that include automated 3D model fitting and facial expression tracking for a variety of applications, including sign language.
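
A small numpy sketch of the "2D features constrain the 3D model" direction of this coupling: given landmarks produced by an ASM and a weak-perspective camera, the 3D model's scale and translation follow from linear least squares. A real system would also solve for rotation and deformation; this reduced setup is assumed for brevity.

import numpy as np

rng = np.random.default_rng(6)
n = 20
P3 = rng.normal(size=(n, 3))                   # 3D model landmark positions
XY = P3[:, :2]

# Toy "ASM observations": the model under a weak-perspective camera, plus noise.
s_true, t_true = 1.4, np.array([10.0, -5.0])
obs2d = s_true * XY + t_true + 0.05 * rng.normal(size=(n, 2))

# Recover scale s and translation (tx, ty) by linear least squares:
#   obs_x ~ s * X + tx,   obs_y ~ s * Y + ty
A = np.zeros((2 * n, 3))
A[:n, 0], A[:n, 1] = XY[:, 0], 1.0
A[n:, 0], A[n:, 2] = XY[:, 1], 1.0
b = np.concatenate([obs2d[:, 0], obs2d[:, 1]])
s, tx, ty = np.linalg.lstsq(A, b, rcond=None)[0]
print(f"recovered scale {s:.3f}, translation ({tx:.2f}, {ty:.2f})")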


International Conference on Computer Vision | 2007

Spectral Latent Variable Models for Perceptual Inference

Atul Kanaujia; Cristian Sminchisescu; Dimitris N. Metaxas

We propose non-linear generative models, referred to as Sparse Spectral Latent Variable Models (SLVMs), that combine the advantages of spectral embeddings with those of parametric latent variable models: (1) they provide stable latent spaces that preserve global or local geometric properties of the modeled data; (2) they offer low-dimensional generative models with probabilistic, bi-directional mappings between latent and ambient spaces; and (3) they are probabilistically consistent (i.e., reflect the data distribution, both jointly and marginally) and efficient to learn and use. We show that SLVMs compare favorably with competing methods based on PCA, GPLVM, or GTM for the reconstruction of typical human motions like walking, running, pantomime, or dancing in a benchmark dataset. Empirically, we observe that SLVMs are effective for the automatic 3D reconstruction of low-dimensional human motion in movies.
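
The two halves of an SLVM-like construction can be sketched cheaply: a Laplacian-eigenmap style embedding supplies the latent coordinates, and a pair of kernel ridge regressions stands in for the bi-directional mappings (the real models are probabilistic; these point estimates are a simplification). The kernels, bandwidths, and random "pose" data below are assumptions for illustration only.

import numpy as np

rng = np.random.default_rng(7)
n, D, q = 150, 30, 2                           # poses, ambient dim, latent dim
Y = rng.normal(size=(n, D))                    # stand-in for mocap poses

# 1) Spectral embedding: bottom non-trivial eigenvectors of a graph Laplacian.
dist = np.linalg.norm(Y[:, None] - Y[None], axis=-1)
W = np.exp(-dist ** 2 / np.median(dist) ** 2)
L = np.diag(W.sum(1)) - W
_, evecs = np.linalg.eigh(L)
Z = evecs[:, 1:q + 1]                          # latent coordinates

# 2) Bidirectional mappings via kernel ridge regression.
def krr_fit(X, T, gamma=1.0, lam=1e-3):
    K = np.exp(-gamma * np.linalg.norm(X[:, None] - X[None], axis=-1) ** 2)
    return np.linalg.solve(K + lam * np.eye(len(X)), T), X, gamma

def krr_predict(model, Xq):
    alpha, X, gamma = model
    K = np.exp(-gamma * np.linalg.norm(Xq[:, None] - X[None], axis=-1) ** 2)
    return K @ alpha

to_ambient = krr_fit(Z, Y)                     # latent -> pose
to_latent = krr_fit(Y, Z)                      # pose -> latent
err = np.linalg.norm(krr_predict(to_ambient, Z) - Y) / np.linalg.norm(Y)
print(f"reconstruction error through the latent space: {err:.3f}")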


International Conference on Image Processing | 2007

Large Scale Learning of Active Shape Models

Atul Kanaujia; Dimitris N. Metaxas

We propose a framework to learn statistical shape models for faces as piecewise linear models. Specifically, our methodology builds upon primitive active shape models (ASMs) to handle large-scale variation in the shapes and appearances of faces. Non-linearities in the shape manifold arising from large head rotations cannot be accurately modeled using an ASM. Moreover, an overly general image descriptor causes the cost function to have multiple local minima, which in turn degrades the quality of shape registration. We propose to use multiple overlapping subspaces with more discriminative local image descriptors to capture the larger variance occurring in the data set. We also apply techniques to learn a distance metric for enhancing the similarity of descriptors belonging to the same shape subspace class. Our generic algorithm can be applied to large-scale shape analysis and registration.
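
A toy version of the "multiple overlapping subspaces" idea: cluster the training shapes, fit a PCA subspace per cluster, and register a new shape by projecting onto the nearest cluster's subspace. The clustering method, subspace dimension, and synthetic data are illustrative assumptions, and the sketch omits the metric learning step.

import numpy as np

rng = np.random.default_rng(8)
n, d, k, q = 300, 40, 3, 5                     # shapes, landmark dim, clusters, PCs

# Synthetic "shapes": k well-separated groups of landmark vectors.
shapes = np.concatenate([c + 0.3 * rng.normal(size=(n // k, d))
                         for c in rng.normal(scale=3.0, size=(k, d))])

# Simple k-means (a stand-in for whatever partitioning a real system uses);
# deterministic init with one seed per group, since groups are contiguous here.
centers = shapes[:: n // k][:k].copy()
for _ in range(20):
    labels = np.argmin(np.linalg.norm(shapes[:, None] - centers, axis=-1), axis=1)
    centers = np.stack([shapes[labels == j].mean(0) for j in range(k)])

# One PCA subspace per cluster: the "piecewise linear" shape model.
subspaces = []
for j in range(k):
    Xc = shapes[labels == j] - centers[j]
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    subspaces.append(Vt[:q])                   # top-q principal directions

def register(x):
    """Project a shape onto the nearest cluster's subspace."""
    j = int(np.argmin(np.linalg.norm(centers - x, axis=1)))
    B = subspaces[j]
    return centers[j] + (x - centers[j]) @ B.T @ B

x = shapes[0] + 0.5 * rng.normal(size=d)
print("residual after projection:", np.round(np.linalg.norm(register(x) - x), 3))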


Computer Vision and Pattern Recognition | 2006

Emblem Detections by Tracking Facial Features

Atul Kanaujia; Yuchi Huang; Dimitris N. Metaxas

Tracking facial features across large head rotations is a challenging research problem. Both 2D and 3D model-based approaches have been proposed for feature analysis from multiple views. Accurate feature tracking enables useful video processing applications like emblem detection (an emblem is an event or movement that symbolizes an idea), facial expression recognition, morphing, and synthesis. A crucial requirement is generalizability of the tracking framework across appearance variations, presence of facial hair, and illumination changes. We propose a framework to detect emblems that combines an active shape model with a predictive face aspect model to track features across large head movements, and it runs close to real time. The Active Shape Model (ASM) is a deformable model for shape registration that detects facial features by combining prior shape information with the observed image data. Our view-based framework represents various head poses by multiple 2D shape models and accounts for large head rotations by dynamically switching between them. The switching variable (the current model to use) is discriminatively predicted from SIFT descriptors computed over the bounding box of a low-resolution face image. We demonstrate the use of the tracking framework to recognize high-level events like head nodding, shaking, and eye blinking.
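
Two pieces of this pipeline are simple enough to sketch: selecting the view-specific shape model from a face descriptor (a nearest-centroid classifier below stands in for the SIFT-based discriminative switch) and recognizing a nod from oscillation of the tracked landmarks' vertical coordinate. Thresholds, window sizes, and data are all toy assumptions.

import numpy as np

rng = np.random.default_rng(9)

# Stand-in for "predict the view model from a face descriptor":
# nearest-centroid over descriptor space picks which 2D shape model to use.
view_centroids = rng.normal(size=(3, 128))     # frontal / left / right (toy)
def select_view_model(descriptor):
    return int(np.argmin(np.linalg.norm(view_centroids - descriptor, axis=1)))

# Stand-in for "recognize head nodding": count sign changes of the mean
# vertical landmark velocity over a window of frames.
def is_nod(y_track, min_swings=3, thresh=0.5):
    v = np.diff(y_track)
    signs = np.sign(v[np.abs(v) > thresh])     # keep only decisive motion
    swings = np.sum(signs[1:] != signs[:-1])
    return swings >= min_swings

t = np.arange(30)
nod_track = 2.0 * np.sin(t)                    # oscillating vertical motion
still_track = 0.1 * rng.normal(size=30)        # jitter only
print(select_view_model(rng.normal(size=128)),
      is_nod(nod_track), is_nod(still_track))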

Collaboration


Dive into Atul Kanaujia's collaborations.

Top Co-Authors

Asaad Hakeem

University of Central Florida

Liefeng Bo

University of Washington

Mun Wai Lee

University of Southern California
