Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Utpala Musti is active.

Publication


Featured research published by Utpala Musti.


EURASIP Journal on Audio, Speech, and Music Processing | 2013

Acoustic-visual synthesis technique using bimodal unit-selection

Slim Ouni; Vincent Colotte; Utpala Musti; Asterios Toutios; Brigitte Wrobel-Dautcourt; Marie-Odile Berger; Caroline Lavecchia

This paper presents a bimodal acoustic-visual synthesis technique that concurrently generates the acoustic speech signal and a 3D animation of the speaker’s outer face. This is done by concatenating bimodal diphone units that consist of both acoustic and visual information. In the visual domain, we mainly focus on the dynamics of the face rather than on rendering. The proposed technique overcomes the problems of asynchrony and incoherence inherent in classic approaches to audiovisual synthesis. The different synthesis steps are similar to typical concatenative speech synthesis but are generalized to the acoustic-visual domain. The bimodal synthesis was assessed through perceptual and subjective evaluations. The overall outcome indicates that the proposed bimodal acoustic-visual synthesis technique provides intelligible speech in both acoustic and visual channels.
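As a rough illustration of the unit-selection machinery the abstract describes, the sketch below runs a Viterbi search over candidate bimodal units, combining target and join costs in both the acoustic and visual domains. All names, the dictionary-based feature layout, and the plain Euclidean distances are illustrative assumptions, not the paper's exact costs or features.

```python
import numpy as np

def target_cost(unit, target, w_ac=1.0, w_vis=1.0):
    # Weighted mismatch between a candidate unit and the target
    # specification, measured jointly in the acoustic and visual domains.
    return (w_ac * np.linalg.norm(unit["ac"] - target["ac"])
            + w_vis * np.linalg.norm(unit["vis"] - target["vis"]))

def join_cost(prev, cur):
    # Discontinuity at the concatenation point: compare the trailing
    # features of the previous unit with the leading features of the next.
    return (np.linalg.norm(prev["ac_end"] - cur["ac_start"])
            + np.linalg.norm(prev["vis_end"] - cur["vis_start"]))

def select_units(targets, candidates):
    # candidates[t] is the list of database units matching target t.
    # Standard Viterbi search over the lattice of candidate units.
    cost = np.array([target_cost(u, targets[0]) for u in candidates[0]])
    back = [None]
    for t in range(1, len(targets)):
        tc = np.array([target_cost(u, targets[t]) for u in candidates[t]])
        new_cost = np.empty(len(candidates[t]))
        ptr = np.empty(len(candidates[t]), dtype=int)
        for j, u in enumerate(candidates[t]):
            totals = cost + np.array([join_cost(p, u) for p in candidates[t - 1]])
            ptr[j] = int(np.argmin(totals))
            new_cost[j] = totals[ptr[j]] + tc[j]
        back.append(ptr)
        cost = new_cost
    # Trace back the lowest-cost path of unit indices, one per target.
    path = [int(np.argmin(cost))]
    for t in range(len(targets) - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```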


International Conference on Pattern Recognition | 2014

Facial 3D Shape Estimation from Images for Visual Speech Animation

Utpala Musti; Ziheng Zhou; Matti Pietikäinen

In this paper we describe the first version of our system for estimating 3D shape sequences from images of the frontal face. This approach is developed with 3D Visual Speech Animation (VSA) as the target application. In particular, the focus is on reusing an existing state-of-the-art image-based VSA system and estimating, on-line, the corresponding 3D facial shape sequence from its output. This yields the main added advantage of 3D visual speech: the ability to render the face in different poses and illumination conditions. The idea is based on the detection of landmarks in the facial image, which are then used to determine the pose and shape. The method belongs to the category of approaches that use a prior 3D Morphable Model (3D-MM) trained on 3D facial data. For the time being it is developed for a person-specific domain, i.e., the 3D-MM and the 2D facial landmark detector are trained using the data of a single person and tested with data of the same person.
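A minimal sketch of the landmark-driven shape estimation step, assuming a PCA-based 3D-MM restricted to the landmark vertices and a scaled-orthographic camera whose pose is already known; the regularized least-squares formulation and all names are illustrative assumptions, not necessarily the paper's exact method.

```python
import numpy as np

def fit_3dmm(landmarks_2d, mean_shape, basis, P, lam=1e-3):
    # landmarks_2d: (L, 2) detected 2D landmarks.
    # mean_shape:   (3L,) 3D-MM mean at the landmark vertices.
    # basis:        (3L, K) PCA basis at the landmark vertices.
    # P:            (2, 3) scaled-orthographic camera (pose assumed known).
    # Returns K shape coefficients minimizing 2D reprojection error,
    # with Tikhonov regularization to keep the shape plausible.
    L = landmarks_2d.shape[0]

    def project(vec3):
        # Apply the camera to each of the L points packed in vec3.
        return (P @ vec3.reshape(L, 3).T).T.reshape(-1)  # (2L,)

    A = np.column_stack([project(basis[:, k]) for k in range(basis.shape[1])])
    b = landmarks_2d.reshape(-1) - project(mean_shape)
    coeffs = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ b)
    return coeffs
```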


International Conference on Robotics and Automation | 2016

Geometry based exhaustive line correspondence determination

Bhat K K Srikrishna; Utpala Musti; Janne Heikkilä

In this paper we propose a purely geometric approach to establish correspondences between 3D line segments in a given model and 2D line segments detected in an image. Contrary to existing methods, which make strong assumptions about the camera pose, we perform an exhaustive search to compute the maximum number of geometrically permitted correspondences between a 3D model and 2D lines. We present a novel theoretical framework in which we sample the space of camera axis directions (which is bounded and hence can be densely sampled, unlike the unbounded space of camera positions) and show that the resulting geometric constraints reduce the rest of the computation to the simple operation of finding the camera position as the intersection of three planes. These geometric constraints can be represented using indexed arrays, which accelerate the computation further. The algorithm returns all sets of correspondences, and the associated camera poses, that have high geometric consensus. The experimental results show that our method has better asymptotic behavior than the conventional approach. We also show that, with the inclusion of additional sensor information, our method can initialize the pose in just a few seconds in many practical situations.
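The core geometric step, recovering the camera position as the intersection of three planes, reduces to a 3-by-3 linear solve. The sketch below assumes each hypothesized 3D-2D line correspondence (with the camera axis direction fixed by sampling) has already been turned into one plane constraint n . c = d on the camera center c; that reduction is the paper's contribution and is not reproduced here.

```python
import numpy as np

def camera_center(planes, eps=1e-9):
    # planes: three (n, d) pairs, each constraining the camera center c
    # to the plane n . c = d. With the camera axis direction fixed,
    # three line correspondences pin down the position as the
    # intersection point of their three planes.
    N = np.array([n for n, _ in planes], dtype=float)  # stacked normals, (3, 3)
    d = np.array([d for _, d in planes], dtype=float)  # offsets, (3,)
    if abs(np.linalg.det(N)) < eps:
        return None  # degenerate: (nearly) parallel planes, no unique point
    return np.linalg.solve(N, d)

# Example: the planes x = 1, y = 2, z = 3 intersect at (1, 2, 3).
print(camera_center([(np.array([1, 0, 0]), 1),
                     (np.array([0, 1, 0]), 2),
                     (np.array([0, 0, 1]), 3)]))
```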


Indian Conference on Computer Vision, Graphics and Image Processing | 2014

3D Visual Speech Animation from Image Sequences

Utpala Musti; Slim Ouni; Ziheng Zhou; Matti Pietikäinen

In this paper we describe an early version of our system, which synthesizes 3D visual speech, including tongue and teeth, from frontal facial image sequences. This system is developed for 3D Visual Speech Animation (VSA) using images generated by an existing state-of-the-art image-based VSA system. The prime motivation is to obtain a 3D VSA system from a limited amount of training data, compared to what is required to develop a conventional corpus-based 3D VSA system. It consists of two modules. The first module iteratively estimates the 3D shape of the external facial surface for each image in the input sequence. The second module complements the external face with a 3D tongue and teeth to complete the perceptually crucial visual speech information. This offers the added advantages of 3D visual speech: renderability of the face in different poses and illumination conditions, and enhanced visual information from the tongue and teeth. The first module, for 3D shape estimation, is based on the detection of facial landmarks in images. It uses a prior 3D Morphable Model (3D-MM) trained using 3D facial data. For the time being it is developed for a person-specific domain, i.e., the 3D-MM and the 2D facial landmark detector are trained using the data of a single person and tested with data of the same person. The estimated 3D shape sequences are provided as input to the second module along with the phonetic segmentation. For any particular 3D shape, tongue and teeth information is generated by rotating the lower jaw based on a few skin points on the jaw and animating a rigid 3D tongue through keyframe interpolation.
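As an illustration of the second module's final step, the sketch below performs linear keyframe interpolation of a rigid tongue mesh, with keyframe times assumed to come from the phonetic segmentation; the array layout and the linear blend are assumptions for illustration, not the paper's exact animation scheme.

```python
import numpy as np

def tongue_shape_at(t, key_times, key_shapes):
    # key_times:  sorted 1D array of keyframe times, e.g. phone boundaries
    #             taken from the phonetic segmentation.
    # key_shapes: array of shape (K, V, 3), one tongue vertex set per key.
    # Returns the tongue shape at time t by linear keyframe interpolation,
    # clamping to the first/last keyframe outside the keyed range.
    i = np.searchsorted(key_times, t)
    if i == 0:
        return key_shapes[0]
    if i == len(key_times):
        return key_shapes[-1]
    a = (t - key_times[i - 1]) / (key_times[i] - key_times[i - 1])
    return (1.0 - a) * key_shapes[i - 1] + a * key_shapes[i]
```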


International Conference on Auditory-Visual Speech Processing - AVSP2011 | 2011

Introducing Visual Target Cost within an Acoustic-Visual Unit-Selection Speech Synthesizer

Utpala Musti; Vincent Colotte; Asterios Toutios; Slim Ouni


Conference of the International Speech Communication Association | 2010

Setup for Acoustic-Visual Speech Synthesis by Concatenating Bimodal Units

Asterios Toutios; Utpala Musti; Slim Ouni; Vincent Colotte; Brigitte Wrobel-Dautcourt; Marie-Odile Berger


Conference of the International Speech Communication Association | 2010

HMM-based Automatic Visual Speech Segmentation Using Facial Data

Utpala Musti; Asterios Toutios; Slim Ouni; Vincent Colotte; Brigitte Wrobel-Dautcourt; Marie-Odile Berger


9th International Conference on Auditory-Visual Speech Processing - AVSP2010 | 2010

Towards a True Acoustic-Visual Speech Synthesis

Asterios Toutios; Utpala Musti; Slim Ouni; Vincent Colotte; Brigitte Wrobel-Dautcourt; Marie-Odile Berger


Conference of the International Speech Communication Association | 2011

Weight Optimization for Bimodal Unit-Selection Talking Head Synthesis

Asterios Toutios; Utpala Musti; Slim Ouni; Vincent Colotte


AVSP - Auditory-Visual Speech Processing | 2013

Automatic Feature Selection for Acoustic-Visual Concatenative Speech Synthesis: Towards a Perceptual Objective Measure

Utpala Musti; Vincent Colotte; Slim Ouni; Caroline Lavecchia; Brigitte Wrobel-Dautcourt; Marie-Odile Berger

Collaboration


Dive into Utpala Musti's collaborations.

Top Co-Authors

Slim Ouni (University of Lorraine)
Asterios Toutios (University of Southern California)
Anne Bonneau (Centre national de la recherche scientifique)