Is this you? Create Your Porfile

Kishore Prahallad

International Institute of Information Technology, Hyderabad

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Kishore Prahallad is active.

Explore More

Publication

Featured researches published by Kishore Prahallad.

IEEE Transactions on Audio, Speech, and Language Processing | 2010

Spectral Mapping Using Artificial Neural Networks for Voice Conversion

Srinivas Desai; Alan W. Black; B. Yegnanarayana; Kishore Prahallad

In this paper, we use artificial neural networks (ANNs) for voice conversion and exploit the mapping abilities of an ANN model to perform mapping of spectral features of a source speaker to that of a target speaker. A comparative study of voice conversion using an ANN model and the state-of-the-art Gaussian mixture model (GMM) is conducted. The results of voice conversion, evaluated using subjective and objective measures, confirm that an ANN-based VC system performs as good as that of a GMM-based VC system, and the quality of the transformed speech is intelligible and possesses the characteristics of a target speaker. In this paper, we also address the issue of dependency of voice conversion techniques on parallel data between the source and the target speakers. While there have been efforts to use nonparallel data and speaker adaptation techniques, it is important to investigate techniques which capture speaker-specific characteristics of a target speaker, and avoid any need for source speakers data either for training or for adaptation. In this paper, we propose a voice conversion approach using an ANN model to capture speaker-specific characteristics of a target speaker and demonstrate that such a voice conversion approach can perform monolingual as well as cross-lingual voice conversion of an arbitrary source speaker.

international conference on acoustics, speech, and signal processing | 2009

Voice conversion using Artificial Neural Networks

Srinivas Desai; E. Veera Raghavendra; B. Yegnanarayana; Alan W. Black; Kishore Prahallad

In this paper, we propose to use Artificial Neural Networks (ANN) for voice conversion. We have exploited the mapping abilities of ANN to perform mapping of spectral features of a source speaker to that of a target speaker. A comparative study of voice conversion using ANN and the state-of-the-art Gaussian Mixture Model (GMM) is conducted. The results of voice conversion evaluated using subjective and objective measures confirm that ANNs perform better transformation than GMMs and the quality of the transformed speech is intelligible and has the characteristics of the target speaker.

international conference on acoustics, speech, and signal processing | 2012

The Spoken Web Search Task at MediaEval 2011

Florian Metze; Nitendra Rajput; Xavier Anguera; Marelie H. Davel; Guillaume Gravier; Charl Johannes van Heerden; Gautam Varma Mantena; Armando Muscariello; Kishore Prahallad; Igor Szöke; Javier Tejedor

In this paper, we describe the “Spoken Web Search” Task, which was held as part of the 2011 MediaEval benchmark campaign. The purpose of this task was to perform audio search with audio input in four languages, with very few resources being available in each language. The data was taken from “spoken web” material collected over mobile phone connections by IBM India. We present results from several independent systems, developed by five teams and using different approaches, compare them, and provide analysis and directions for future research.

international conference on acoustics, speech, and signal processing | 2006

Sub-Phonetic Modeling For Capturing Pronunciation Variations For Conversational Speech Synthesis

Kishore Prahallad; Alan W. Black; Ravishankhar Mosur

In this paper we address the issue of pronunciation modeling for conversational speech synthesis. We experiment with two different HMM topologies (fully connected state model and forward connected state model) for sub-phonetic modeling to capture the deletion and insertion of sub-phonetic states during speech production process. We show that the experimented HMM topologies have higher log likelihood than the traditional 5-state sequential model. We also study the first and second mentions of content words and their influence on the pronunciation variation. Finally we report phone recognition experiments using the modified HMM topologies

IEEE Transactions on Audio, Speech, and Language Processing | 2011

Segmentation of Monologues in Audio Books for Building Synthetic Voices

Kishore Prahallad; Alan W. Black

One of the issues in using audio books for building a synthetic voice is the segmentation of large speech files. The use of the Viterbi algorithm to obtain phone boundaries on large audio files fails primarily because of huge memory requirements. Earlier works have attempted to resolve this problem by using large vocabulary speech recognition system employing restricted dictionary and language model. In this paper, we propose suitable modifications to the Viterbi algorithm and demonstrate its usefulness for segmentation of large speech files in audio books. The utterances obtained from large speech files in audio books are used to build synthetic voices. We show that synthetic voices built from audio books in the public domain have Mel-cepstral distortion scores in the range of 4-7, which is similar to voices built from studio quality recordings such as CMU ARCTIC.

IEEE Transactions on Audio, Speech, and Language Processing | 2014

Query-by-example spoken term detection using frequency domain linear prediction and non-segmental dynamic time warping

Gautam Varma Mantena; Sivanand Achanta; Kishore Prahallad

The task of query-by-example spoken term detection (QbE-STD) is to find a spoken query within spoken audio data. Current state-of-the-art techniques assume zero prior knowledge about the language of the audio data, and thus explore dynamic time warping (DTW) based techniques for the QbE-STD task. In this paper, we use a variant of DTW based algorithm referred to as non-segmental DTW (NS-DTW), with a computational upper bound of O (mn) and analyze the performance of QbE-STD with Gaussian posteriorgrams obtained from spectral and temporal features of the speech signal. The results show that frequency domain linear prediction cepstral coefficients, which capture the temporal dynamics of the speech signal, can be used as an alternative to traditional spectral parameters such as linear prediction cepstral coefficients, perceptual linear prediction cepstral coefficients and Mel-frequency cepstral coefficients. We also introduce another variant of NS-DTW called fast NS-DTW (FNS-DTW) which uses reduced feature vectors for search. With a reduction factor of α ∈ ℕ, we show that the computational upper bound for FNS-DTW is O(mn/(α2)) which is faster than NS-DTW.

international conference on acoustics, speech, and signal processing | 2012

Articulatory features for expressive speech synthesis

Alan W. Black; H. Timothy Bunnell; Ying Dou; Prasanna Kumar Muthukumar; Florian Metze; Daniel J. Perry; Tim Polzehl; Kishore Prahallad; Stefan Steidl; Callie Vaughn

This paper describes some of the results from the project entitled “New Parameterization for Emotional Speech Synthesis” held at the Summer 2011 JHU CLSP workshop. We describe experiments on how to use articulatory features as a meaningful intermediate representation for speech synthesis. This parameterization not only allows us to reproduce natural sounding speech but also allows us to generate stylistically varying speech.

spoken language technology workshop | 2008

Global syllable set for building speech synthesis in Indian languages

E.V. Raghavendra; Srinivas Desai; B. Yegnanarayana; Alan W. Black; Kishore Prahallad

Indian languages are syllabic in nature where many syllables are found common across its languages. This motivates us to build a global syllable set by combining multiple language syllables to build a synthesizer which can borrow units from a different language when the required syllable is not found. Such synthesizer make use of speech database in different languages spoken by different speakers, whose output is likely to pick units from multiple languages and hence the synthesized utterance contains units spoken by multiple speakers which would annoy the user. We intend to use a cross lingual voice conversion framework using Artificial Neural Networks (ANN) to transform such an utterance to a single target speaker.

spoken language technology workshop | 2008

Speech synthesis using approximate matching of syllables

E.V. Raghavendra; B. Yegnanarayana; Kishore Prahallad

In this paper we propose a technique for a syllable based speech synthesis system. While syllable based synthesizers produce better sounding speech than diphone and phone, the coverage of all syllables is a non-trivial issue. We address the issue of coverage of syllables through approximating the syllable when the required syllable is not found. To verify our hypothesis, we conducted perceptual studies on manually modified sentences and found that our assumption is valid. Similar approaches have been used in speech synthesis and it shows that such approximation produces intelligible and better quality speech than diphone units.

2011 Joint Workshop on Hands-free Speech Communication and Microphone Arrays | 2011

A speech-based conversation system for accessing agriculture commodity prices in Indian languages

Gautam Varma Mantena; S. Rajendran; B. Rambabu; Suryakanth V. Gangashetty; B. Yegnanarayana; Kishore Prahallad

We demonstrate a speech based conversation system under development for information access by farmers in rural and semi-urban areas of India. The challenges are that the system should take care of the significant variations in the pronunciation and also the highly natural and hence unstructured dialog in the usage of the system. The focus of this study is to develop a conversational system which is adaptable to the users over a period of time, in the sense that fewer interactions with the system to get the required information. Some other novel features of the system include multiple decoding schemes and accountability of the wide variations in dialog, pronunciation and environment. A video demonstrating the Mandi information system is available at http://speech.iiit.ac.in/index.php/demos.html

Explore More