Network


Latest external collaborations at the country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where P. Kakumanu is active.

Publication


Featured research published by P. Kakumanu.


Pattern Recognition | 2007

A survey of skin-color modeling and detection methods

P. Kakumanu; Sokratis Makrogiannis; Nikolaos G. Bourbakis

Skin detection plays an important role in a wide range of image processing applications, from face detection, face tracking, and gesture analysis to content-based image retrieval systems and various human-computer interaction domains. Recently, skin detection methodologies based on skin-color information as a cue have gained much attention, as skin color provides computationally effective yet robust information against rotations, scaling, and partial occlusions. Skin detection using color information can be a challenging task, since the skin appearance in images is affected by factors such as illumination, background, camera characteristics, and ethnicity. Numerous techniques for skin detection using color have been presented in the literature. In this paper, we provide a critical, up-to-date review of the various skin modeling and classification strategies based on color information in the visual spectrum. The review is divided into three categories: first, we present the various color spaces used for skin modeling and detection. Second, we present different skin modeling and classification approaches; many of these are limited in performance by real-world illumination and viewing conditions, so illumination adaptation techniques are applied alongside skin-color detection to cope with rapidly changing illumination. Third, we present various approaches that use skin-color constancy and dynamic adaptation techniques to improve skin detection performance under dynamically changing illumination and environmental conditions. Wherever available, we also indicate the conditions under which the skin detection techniques perform well.
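For illustration, here is a minimal example of the simplest kind of rule-based color-space classifier the survey covers: a fixed-threshold skin detector in the YCbCr color space. The Cb/Cr bounds are commonly cited defaults, not values taken from the paper.

```python
import numpy as np
import cv2  # OpenCV, for the color-space conversion

def skin_mask_ycbcr(image_bgr, cb_range=(77, 127), cr_range=(133, 173)):
    """Return a binary skin mask using fixed YCbCr thresholds.

    A toy rule-based classifier; as the survey notes, such fixed
    thresholds degrade under changing illumination.
    """
    ycbcr = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2YCrCb)
    _, cr, cb = cv2.split(ycbcr)  # OpenCV channel order is Y, Cr, Cb
    mask = ((cb >= cb_range[0]) & (cb <= cb_range[1]) &
            (cr >= cr_range[0]) & (cr <= cr_range[1]))
    return mask.astype(np.uint8) * 255
```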


IEEE Transactions on Multimedia | 2005

Speech-driven facial animation with realistic dynamics

Ricardo Gutierrez-Osuna; P. Kakumanu; Anna Esposito; Oscar N. Garcia; Adriana Bojórquez; José Luis Castillo; Isaac Rudomin

This work presents an integral system capable of generating animations with realistic dynamics, including the individualized nuances, of three-dimensional (3-D) human faces driven by speech acoustics. The system is capable of capturing short phenomena in the orofacial dynamics of a given speaker by tracking the 3-D location of various MPEG-4 facial points through stereovision. A perceptual transformation of the speech spectral envelope and prosodic cues are combined into an acoustic feature vector to predict 3-D orofacial dynamics by means of a nearest-neighbor algorithm. The Karhunen-Loève transformation is used to identify the principal components of orofacial motion, decoupling perceptually natural components from experimental noise. We also present a highly optimized MPEG-4 compliant player capable of generating audio-synchronized animations at 60 frames/s. The player is based on a pseudo-muscle model augmented with a nonpenetrable ellipsoidal structure to approximate the skull and the jaw. This structure adds a sense of volume that provides more realistic dynamics than existing simplified pseudo-muscle-based approaches, yet it is simple enough to work at the desired frame rate. Experimental results on an audiovisual database of compact TIMIT sentences are presented to illustrate the performance of the complete system.
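As a sketch of the Karhunen-Loève step described above, the following assumes the tracked facial points are stacked per frame into a matrix and uses an SVD to separate the principal motion components from noise. Array shapes and the component count are illustrative, not the paper's settings.

```python
import numpy as np

def principal_motion_components(trajectories, n_components=5):
    """Karhunen-Loève (PCA) decomposition of facial-point trajectories.

    trajectories: array of shape (n_frames, 3 * n_points), each row the
    stacked 3-D coordinates of the tracked facial points in one frame.
    Returns the mean pose, the leading eigen-motions, and per-frame
    coefficients; trailing components can be discarded as noise.
    """
    mean = trajectories.mean(axis=0)
    centered = trajectories - mean
    # SVD of the centered data yields the principal components directly.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]     # eigen-motions
    coeffs = centered @ components.T   # projection of each frame
    return mean, components, coeffs
```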


IEEE Transactions on Multimedia | 2005

Audio/visual mapping with cross-modal hidden Markov models

Shengli Fu; Ricardo Gutierrez-Osuna; Anna Esposito; P. Kakumanu; Oscar N. Garcia

The audio/visual mapping problem of speech-driven facial animation has intrigued researchers for years. Recent research efforts have demonstrated that hidden Markov model (HMM) techniques, which have been applied successfully to the problem of speech recognition, could achieve a similar level of success in audio/visual mapping problems. A number of HMM-based methods have been proposed and shown to be effective by their respective designers, but it has remained unclear how these techniques compare to each other on a common test bed. In this paper, we quantitatively compare three recently proposed cross-modal HMM methods, namely the remapping HMM (R-HMM), the least-mean-squared HMM (LMS-HMM), and HMM inversion (HMMI). The objective of our comparison is not only to highlight the merits and demerits of different mapping designs, but also to study the optimality of the acoustic representation and HMM structure for the purpose of speech-driven facial animation. This paper presents a brief overview of these models, followed by an analysis of their mapping capabilities on a synthetic dataset. Finally, an empirical comparison on an experimental audio-visual dataset consisting of 75 TIMIT sentences is presented. Our results show that HMMI provides the best performance, both on synthetic and experimental audio-visual data.
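A comparison like this requires a common figure of merit. Below is a minimal sketch of one plausible choice, per-frame RMSE between predicted and measured facial parameters; the paper's exact evaluation protocol is not reproduced here, and the model names in the usage comment are placeholders.

```python
import numpy as np

def trajectory_rmse(predicted, ground_truth):
    """Root-mean-squared error between predicted and measured
    facial-parameter trajectories, shape (n_frames, n_params)."""
    err = predicted - ground_truth
    return float(np.sqrt(np.mean(np.sum(err ** 2, axis=1))))

# Hypothetical usage: rank the three mappings on held-out sentences.
# scores = {name: trajectory_rmse(model.predict(audio), visual)
#           for name, model in [("R-HMM", r_hmm), ("LMS-HMM", lms_hmm),
#                               ("HMMI", hmmi)]}
```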


International Journal of Neural Systems | 2007

Neural network approach for image chromatic adaptation for skin color detection

Nikolaos G. Bourbakis; P. Kakumanu; Sokratis Makrogiannis; Robert K. Bryll; Sethuraman Panchanathan

The goal of image chromatic adaptation is to remove the effect of illumination and to obtain color data that precisely reflects the physical contents of the scene. We present in this paper an approach to image chromatic adaptation using Neural Networks (NN), with application to detecting and adapting human skin color. The NN is trained on randomly chosen color images containing human subjects under various illumination conditions, thereby enabling the model to dynamically adapt to changing illumination. The proposed network directly predicts the illuminant estimate in the image so as to adapt to human skin color. A comparison of our method with the Gray World, White Patch, and NN-on-White-Patch methods for skin color stabilization is presented. The skin regions in the NN-stabilized images are successfully detected using a computationally inexpensive thresholding operation. We also present results on detecting skin regions on a data set of test images. The results are promising and suggest a new approach for adapting human skin color using neural networks.
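Of the baselines mentioned above, Gray World is simple enough to sketch in a few lines. This minimal version assumes a float RGB image in [0, 1]; the paper's NN predicts the illuminant instead of assuming a gray scene average.

```python
import numpy as np

def gray_world_correction(image):
    """Gray World chromatic adaptation (one of the baselines above).

    image: float RGB array of shape (H, W, 3) in [0, 1]. Assumes the
    average scene color is gray and rescales each channel so the
    channel means become equal.
    """
    channel_means = image.reshape(-1, 3).mean(axis=0)
    gains = channel_means.mean() / np.maximum(channel_means, 1e-6)
    return np.clip(image * gains, 0.0, 1.0)
```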


workshop on perceptive user interfaces | 2001

Speech driven facial animation

P. Kakumanu; Ricardo Gutierrez-Osuna; Anna Esposito; Robert K. Bryll; A. Ardeshir Goshtasby; Oscar N. Garcia

The results reported in this article are an integral part of a larger project aimed at achieving perceptually realistic animations, including the individualized nuances, of three-dimensional human faces driven by speech. The audiovisual system that has been developed for learning the spatio-temporal relationship between speech acoustics and facial animation is described, including video and speech processing, pattern analysis, and MPEG-4 compliant facial animation for a given speaker. In particular, we propose a perceptual transformation of the speech spectral envelope, which is shown to capture the dynamics of articulatory movements. An efficient nearest-neighbor algorithm is used to predict novel articulatory trajectories from the speech dynamics. The results are very promising and suggest a new way to approach the modeling of synthetic lip motion of a given speaker driven by his/her speech. This would also provide clues toward a more general cross-speaker realistic animation.
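The nearest-neighbor prediction step described above can be sketched as follows, assuming paired acoustic feature vectors and facial-point coordinates for training; the authors' exact features and distance metric are not reproduced here.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def predict_trajectories(train_audio, train_visual, test_audio, k=5):
    """Nearest-neighbor audio-to-visual mapping, as a minimal sketch.

    train_audio: (n_train, n_acoustic) perceptual feature vectors.
    train_visual: (n_train, n_visual) facial-point coordinates.
    test_audio: (n_test, n_acoustic) features for novel speech.
    Predicts each test frame as the average of its k nearest training
    frames in acoustic space.
    """
    nn = NearestNeighbors(n_neighbors=k).fit(train_audio)
    _, idx = nn.kneighbors(test_audio)
    return train_visual[idx].mean(axis=1)
```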


Speech Communication | 2006

A comparison of acoustic coding models for speech-driven facial animation

P. Kakumanu; Anna Esposito; Oscar N. Garcia; Ricardo Gutierrez-Osuna

This article presents a thorough experimental comparison of several acoustic modeling techniques by their ability to capture information related to orofacial motion. These models include (1) Linear Predictive Coding and Linear Spectral Frequencies, which model the dynamics of the speech production system, (2) Mel Frequency Cepstral Coefficients and Perceptual Critical Feature Bands, which encode perceptual cues of speech, (3) spectral energy and fundamental frequency, which capture prosodic aspects, and (4) two hybrid methods that combine information from the previous models. We also consider a novel supervised procedure based on Fisher's Linear Discriminants to project acoustic information onto a low-dimensional subspace that best discriminates different orofacial configurations. Prediction of orofacial motion from speech acoustics is performed using a non-parametric k-nearest-neighbors procedure. The sensitivity of this audio-visual mapping to coarticulation effects and spatial locality is thoroughly investigated. Our results indicate that the hybrid use of articulatory, perceptual and prosodic features of speech, combined with a supervised dimensionality-reduction procedure, is able to outperform any individual acoustic model for speech-driven facial animation. These results are validated on the 450 sentences of the TIMIT compact dataset.
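A minimal sketch of the supervised projection idea: since Fisher's discriminant needs class labels, one plausible recipe (an assumption, not the paper's exact procedure) is to label frames by clustering the orofacial configurations, fit an LDA projection on the acoustic features, and regress motion from the projected audio with k-NN.

```python
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsRegressor

def fit_supervised_projection(audio_feats, visual_feats,
                              n_classes=16, n_neighbors=5):
    """Fisher-discriminant projection for audio-to-visual mapping.

    audio_feats: (n_frames, n_acoustic); visual_feats: (n_frames, n_visual).
    Cluster count and k are illustrative, not the paper's settings.
    """
    # Discretize orofacial configurations to obtain LDA class labels.
    labels = KMeans(n_clusters=n_classes, n_init=10).fit_predict(visual_feats)
    lda = LinearDiscriminantAnalysis().fit(audio_feats, labels)
    # Regress visual frames from the projected (discriminative) audio.
    knn = KNeighborsRegressor(n_neighbors=n_neighbors)
    knn.fit(lda.transform(audio_feats), visual_feats)
    return lda, knn
```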


international conference on tools with artificial intelligence | 2006

A Local-Global Graph Approach for Facial Expression Recognition

P. Kakumanu; Nikolaos G. Bourbakis

In this article, we present a local-global graph (LGG) method for recognizing facial expressions from static images irrespective of illumination conditions, shadows, and cluttered backgrounds. First, a neural color-constancy-based skin detection procedure for detecting skin in complex real-world images is presented. Second, the LGG method for detecting faces and facial expressions with maximum confidence from skin-segmented images is presented. The LGG approach emulates human visual perception for face and expression detection: in general, humans first extract the most important facial features, such as eyes, nose, and mouth, and then inter-relate them to represent the face and its expression. The LG Graph embeds both local information (the shape of each facial feature is stored within the local graph at its node) and global information (the topology of the face). Facial expression recognition from the detected face images is obtained by comparing the LG Expression Graphs with the LG expression models present in the LGG database. Experimental results on the AR database and real-world images suggest the robustness of the proposed approach for facial expression recognition.
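A toy sketch of such a structure follows: each node stores a facial feature's local shape descriptor and position, and the graph dissimilarity combines local shape differences with differences in the normalized pairwise-distance topology. Field names and the weighting are illustrative, not the paper's definitions.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class LGNode:
    """One facial feature: name, 2-D position, local shape descriptor."""
    name: str
    position: np.ndarray   # (x, y) in the face image
    shape: np.ndarray      # local shape descriptor vector

def graph_distance(nodes_a, nodes_b, w_topology=0.5):
    """Naive LG-graph dissimilarity; assumes both graphs list the same
    features in the same order."""
    shape_cost = np.mean([np.linalg.norm(a.shape - b.shape)
                          for a, b in zip(nodes_a, nodes_b)])
    def pairwise(nodes):
        pos = np.stack([n.position for n in nodes])
        d = np.linalg.norm(pos[:, None] - pos[None, :], axis=-1)
        return d / (d.max() + 1e-9)   # normalize for scale invariance
    topo_cost = np.abs(pairwise(nodes_a) - pairwise(nodes_b)).mean()
    return (1 - w_topology) * shape_cost + w_topology * topo_cost
```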


Applied Pattern Recognition | 2008

Skin-based Face Detection-Extraction and Recognition of Facial Expressions

Nikolaos G. Bourbakis; P. Kakumanu

Face detection is the foremost task in building vision-based human-computer interaction systems, in particular in applications such as face recognition, face identification, face tracking, expression recognition, and content-based image retrieval. A robust face detection system must be able to detect faces irrespective of illumination, shadows, cluttered backgrounds, facial pose, orientation, and facial expression. Many approaches for face detection have been proposed. However, as revealed by the FRVT 2002 tests, face detection in outdoor images with uncontrolled illumination and in images with varied pose (non-frontal profile views) is still a serious problem. In this chapter, we describe a Local-Global Graph (LGG) based method for detecting faces and recognizing facial expressions accurately under real-world image capturing conditions, both indoor and outdoor, with a variety of illuminations (shadows, highlights, non-white lights) and in cluttered backgrounds. The LG Graph embeds both local information (the shape of each facial feature is stored within the local graph at its node) and global information (the topology of the face). The LGG approach for detecting faces with maximum confidence from skin-segmented images is described. The approach emulates human visual perception for face detection: in general, humans first extract the most important facial features, such as eyes, nose, and mouth, and then inter-relate them to represent the face and its expression. Facial expression recognition from the detected face images is obtained by comparing the LG Expression Graphs with the Expression models present in the LGG database. The methodology is accurate for the expression models present in the database.
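Continuing the toy LG-graph sketch from the previous entry, recognition against a database of stored expression models can be phrased as a nearest-graph search; graph_distance is the illustrative function defined above, not the paper's matching procedure.

```python
def classify_expression(face_nodes, expression_db):
    """Pick the expression whose stored LG expression graph is closest.

    expression_db: dict mapping expression name -> list of LGNode
    (the stored models). A toy stand-in for the graph matching step.
    """
    return min(expression_db, key=lambda name:
               graph_distance(face_nodes, expression_db[name]))
```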


international conference on tools with artificial intelligence | 2006

Document Image Dewarping Based on Line Estimation for Visually Impaired

P. Kakumanu; Nikolaos G. Bourbakis; John A. Black; Sethuraman Panchanathan

In this paper, we present a document-image dewarping methodology based on robust estimation of text lines. When a text page is captured by a camera, it suffers both from perspective distortion and from page curl; the non-linear distortion due to page curl is inherently present, given the curved surface of book pages. State-of-the-art OCR systems perform very poorly on such distorted text. To remove both distortions and produce a flattened view of the text, we use a cue present in the printed text itself: the text lines on the surface of the page are straight. The methodology requires only a single camera-captured image and does not require calibration or the expensive hardware setups used in other methods. Experimental results on a set of documents show that the methodology produces visually pleasing output and also improves OCR accuracy.
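A simplified sketch of the line-based dewarping idea: given baselines already estimated for each text line (their detection is outside this snippet), fit a smooth curve per line, interpolate a vertical shift field over the page, and remap so every baseline becomes horizontal. This is a stand-in for the paper's method, not a reproduction of it.

```python
import numpy as np
import cv2

def straighten_text_lines(gray, baselines, degree=3):
    """Dewarp a page by forcing estimated baselines to be straight.

    gray: grayscale page image. baselines: list of (n_points, 2) arrays
    of (x, y) samples, one per text line, ordered top to bottom.
    """
    h, w = gray.shape
    xs = np.arange(w, dtype=np.float32)
    # Fit a polynomial curve to each baseline across the full width.
    fits = [np.polyval(np.polyfit(b[:, 0], b[:, 1], degree), xs)
            for b in baselines]
    anchors = np.array([f.mean() for f in fits])   # target straight rows
    shifts = np.stack([f - a for f, a in zip(fits, anchors)])  # (lines, w)
    # Interpolate the vertical shift over all rows, column by column.
    rows = np.arange(h, dtype=np.float32)
    shift_field = np.stack([np.interp(rows, anchors, shifts[:, x])
                            for x in range(w)], axis=1)
    map_x = np.tile(xs, (h, 1)).astype(np.float32)
    map_y = (rows[:, None] + shift_field).astype(np.float32)
    return cv2.remap(gray, map_x, map_y, interpolation=cv2.INTER_LINEAR)
```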


International Journal on Artificial Intelligence Tools | 2009

A WEARABLE DOCUMENT READER FOR THE VISUALLY IMPAIRED: DEWARPING AND SEGMENTATION

Robert Keefer; P. Kakumanu; Nikolaos G. Bourbakis

While reading devices for the visually impaired have been available for many years, they are often expensive and difficult to use. The image processing required to enable the reading task is a composition of several important sub-tasks, such as image capture, image stabilization, image enhancement, page-curl dewarping, region segmentation, region grouping, and word recognition. In this paper, we deal with some of these sub-tasks in an effort to prototype a device (Tyflos-reader) that will read a document to a person with a visual impairment and respond to voice commands for control. Initial experimental results on a set of textbook and newspaper pages are also presented.
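The sub-tasks above compose naturally as a sequential pipeline. A minimal sketch of that wiring follows; the stage names in the usage comment (capture, stabilize, enhance, and so on) are assumptions for illustration, not functions from the paper.

```python
from typing import Callable, List
import numpy as np

# Each stage maps an image to a processed image.
Stage = Callable[[np.ndarray], np.ndarray]

def run_reader_pipeline(frame: np.ndarray, stages: List[Stage]) -> np.ndarray:
    """Pass a captured page image through the reading sub-tasks in order."""
    for stage in stages:
        frame = stage(frame)
    return frame

# Hypothetical wiring, assuming each stage exists with this signature:
# page = run_reader_pipeline(capture(), [stabilize, enhance,
#                                        dewarp_page_curl, segment_regions])
```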

Collaboration


Dive into P. Kakumanu's collaborations.

Top Co-Authors

Anna Esposito

Seconda Università degli Studi di Napoli


Shengli Fu

University of North Texas

John A. Black

Arizona State University
