Gautam Varma Mantena
International Institute of Information Technology, Hyderabad
Publications
Featured research published by Gautam Varma Mantena.
International Conference on Acoustics, Speech, and Signal Processing | 2012
Florian Metze; Nitendra Rajput; Xavier Anguera; Marelie H. Davel; Guillaume Gravier; Charl Johannes van Heerden; Gautam Varma Mantena; Armando Muscariello; Kishore Prahallad; Igor Szöke; Javier Tejedor
In this paper, we describe the “Spoken Web Search” Task, which was held as part of the 2011 MediaEval benchmark campaign. The purpose of this task was to perform audio search with audio input in four languages, with very few resources being available in each language. The data was taken from “spoken web” material collected over mobile phone connections by IBM India. We present results from several independent systems, developed by five teams and using different approaches, compare them, and provide analysis and directions for future research.
IEEE Transactions on Audio, Speech, and Language Processing | 2014
Gautam Varma Mantena; Sivanand Achanta; Kishore Prahallad
The task of query-by-example spoken term detection (QbE-STD) is to find a spoken query within spoken audio data. Current state-of-the-art techniques assume zero prior knowledge of the language of the audio data and thus explore dynamic time warping (DTW) based techniques for the QbE-STD task. In this paper, we use a DTW variant referred to as non-segmental DTW (NS-DTW), with a computational upper bound of O(mn), and analyze the performance of QbE-STD with Gaussian posteriorgrams obtained from spectral and temporal features of the speech signal. The results show that frequency domain linear prediction cepstral coefficients, which capture the temporal dynamics of the speech signal, can be used as an alternative to traditional spectral parameters such as linear prediction cepstral coefficients, perceptual linear prediction cepstral coefficients, and Mel-frequency cepstral coefficients. We also introduce another variant of NS-DTW called fast NS-DTW (FNS-DTW), which uses reduced feature vectors for the search. With a reduction factor α ∈ ℕ, we show that the computational upper bound for FNS-DTW is O(mn/α²), which is faster than NS-DTW.
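To make the complexity claims concrete, here is a minimal Python sketch of a subsequence-style DTW search in the spirit of NS-DTW, together with the frame-averaging idea behind FNS-DTW. The frame distance, local path constraints, and normalization are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def ns_dtw_sketch(query, reference):
    """Illustrative subsequence DTW (not the paper's exact NS-DTW).

    query:     (m, d) array of frame features, e.g. Gaussian posteriorgrams
    reference: (n, d) array of frame features for the searched audio
    Returns the best length-normalized alignment cost of the query
    against any region of the reference.
    """
    m, n = len(query), len(reference)
    # Assumed frame distance: -log of the posteriorgram dot product.
    dist = -np.log(np.maximum(query @ reference.T, 1e-10))

    acc = np.full((m, n), np.inf)
    acc[0, :] = dist[0, :]                # query may start anywhere
    for i in range(1, m):                 # each of the m*n cells is
        for j in range(n):                # filled once -> O(mn)
            prev = acc[i - 1, j]
            if j > 0:
                prev = min(prev, acc[i - 1, j - 1], acc[i, j - 1])
            acc[i, j] = dist[i, j] + prev
    return acc[m - 1].min() / m           # query may end anywhere

def reduce_frames(feats, alpha):
    """Average every alpha consecutive frames (the FNS-DTW idea): both
    sequences shrink by alpha, so the accumulated-cost table has
    (m/alpha) * (n/alpha) = mn/alpha^2 cells."""
    t = (len(feats) // alpha) * alpha
    return feats[:t].reshape(-1, alpha, feats.shape[1]).mean(axis=1)
```

Pairing the two functions, ns_dtw_sketch(reduce_frames(q, a), reduce_frames(r, a)) fills roughly mn/α² cells instead of mn, which is the source of the FNS-DTW speedup, traded against temporal resolution.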
2011 Joint Workshop on Hands-free Speech Communication and Microphone Arrays | 2011
Gautam Varma Mantena; S. Rajendran; B. Rambabu; Suryakanth V. Gangashetty; B. Yegnanarayana; Kishore Prahallad
We demonstrate a speech-based conversation system under development for information access by farmers in rural and semi-urban areas of India. The challenges are that the system should handle significant variations in pronunciation as well as the highly natural, and hence unstructured, dialog that arises in its use. The focus of this study is to develop a conversational system that adapts to its users over time, in the sense that fewer interactions with the system are needed to get the required information. Other novel features of the system include multiple decoding schemes and accounting for wide variations in dialog, pronunciation, and environment. A video demonstrating the Mandi information system is available at http://speech.iiit.ac.in/index.php/demos.html
International Conference on Acoustics, Speech, and Signal Processing | 2014
Gautam Varma Mantena; Kishore Prahallad
For query-by-example spoken term detection (QbE-STD), generating phone posteriorgrams requires labelled data, which is difficult to obtain for low-resource languages. One solution is to build models from resource-rich languages and use them in the low-resource scenario. However, phone classes are not universal across languages, so alternative representations such as articulatory classes are explored. In this paper, we use articulatory information and its derivatives, such as bottle-neck (BN) features (also referred to as articulatory BN features), for QbE-STD. We obtain Gaussian posteriorgrams of articulatory BN features in tandem with acoustic parameters such as frequency domain linear prediction cepstral coefficients to perform the search. We compare the search performance of articulatory and phone BN features and show that articulatory BN features are the better representation. We also provide experimental results showing that a small amount (30 minutes) of training data can be used to derive articulatory BN features.
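As a rough sketch of the Gaussian posteriorgram step, the snippet below fits an unsupervised GMM to frame-level features and maps each frame to its vector of component posteriors. The GMM size and the scikit-learn based fitting are assumptions for illustration, not the paper's configuration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gaussian_posteriorgrams(train_frames, utterance_frames, n_components=128):
    """Fit a GMM on unlabelled training frames, then represent each frame
    of an utterance by its vector of component posteriors.

    train_frames, utterance_frames: (num_frames, dim) arrays of frame-level
    features, e.g. articulatory BN features stacked with FDLP coefficients.
    """
    # Unsupervised fit: no phone labels are needed at this stage.
    gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
    gmm.fit(train_frames)
    # Each returned row sums to 1: one Gaussian posteriorgram per frame.
    return gmm.predict_proba(utterance_frames)
```

Because the GMM is trained without transcriptions, the posteriorgram step itself adds no labelling requirement beyond what the BN feature extractor already needs.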
Archive | 2011
Gautam Varma Mantena; S. Rajendran; Suryakanth V. Gangashetty; B. Yegnanarayana; Kishore Prahallad
Archive | 2013
Kishore Prahallad; Anandaswarup Vadapalli; Naresh Kumar Elluru; Gautam Varma Mantena; Bhargav Pulugundla; Peri Bhaskararao; Hema A. Murthy; Simon King; Vasilis Karaiskos; Alan W. Black
MediaEval | 2011
Gautam Varma Mantena; Bajibabu Bollepalli; Kishore Prahallad
Conference of the International Speech Communication Association | 2014
Basil George; Abhijeet Saxena; Gautam Varma Mantena; Kishore Prahallad; B. Yegnanarayana
MediaEval | 2014
Santosh Kesiraju; Gautam Varma Mantena; Kishore Prahallad
MediaEval | 2013
Gautam Varma Mantena; Kishore Prahallad