Gautam Varma Mantena

International Institute of Information Technology, Hyderabad

Publication

Featured research published by Gautam Varma Mantena.

International Conference on Acoustics, Speech, and Signal Processing | 2012

The Spoken Web Search Task at MediaEval 2011

Florian Metze; Nitendra Rajput; Xavier Anguera; Marelie H. Davel; Guillaume Gravier; Charl Johannes van Heerden; Gautam Varma Mantena; Armando Muscariello; Kishore Prahallad; Igor Szöke; Javier Tejedor

In this paper, we describe the “Spoken Web Search” Task, which was held as part of the 2011 MediaEval benchmark campaign. The purpose of this task was to perform audio search with audio input in four languages, with very few resources being available in each language. The data was taken from “spoken web” material collected over mobile phone connections by IBM India. We present results from several independent systems, developed by five teams and using different approaches, compare them, and provide analysis and directions for future research.

IEEE Transactions on Audio, Speech, and Language Processing | 2014

Query-by-example spoken term detection using frequency domain linear prediction and non-segmental dynamic time warping

Gautam Varma Mantena; Sivanand Achanta; Kishore Prahallad

The task of query-by-example spoken term detection (QbE-STD) is to find a spoken query within spoken audio data. Current state-of-the-art techniques assume zero prior knowledge about the language of the audio data, and thus explore dynamic time warping (DTW) based techniques for the QbE-STD task. In this paper, we use a variant of the DTW-based algorithm, referred to as non-segmental DTW (NS-DTW), with a computational upper bound of O(mn), and analyze the performance of QbE-STD with Gaussian posteriorgrams obtained from spectral and temporal features of the speech signal. The results show that frequency domain linear prediction cepstral coefficients, which capture the temporal dynamics of the speech signal, can be used as an alternative to traditional spectral parameters such as linear prediction cepstral coefficients, perceptual linear prediction cepstral coefficients and Mel-frequency cepstral coefficients. We also introduce another variant of NS-DTW called fast NS-DTW (FNS-DTW), which uses reduced feature vectors for search. With a reduction factor of α ∈ ℕ, we show that the computational upper bound for FNS-DTW is O(mn/α²), which is faster than NS-DTW.
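The complexity figures in the abstract above can be illustrated with a minimal sketch. It is not the authors' NS-DTW or FNS-DTW: it is a plain subsequence DTW over an m-by-n cost table, and the reduction step (averaging every α consecutive frames) is an assumption about what "reduced feature vectors" might mean. The function names `frame_distance`, `subsequence_dtw`, and `reduce_frames` are hypothetical.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def subsequence_dtw(query, reference):
    """Return (cost, end_index) of the best match of query inside reference.

    Generic subsequence DTW, not the paper's NS-DTW. It fills an
    m-by-n table one row at a time, so the work is O(mn).
    """
    m, n = len(query), len(reference)
    # The first query frame may align with any reference frame,
    # so a match can start anywhere in the reference.
    prev = [frame_distance(query[0], reference[j]) for j in range(n)]
    for i in range(1, m):
        curr = [prev[0] + frame_distance(query[i], reference[0])]
        for j in range(1, n):
            best = min(prev[j], prev[j - 1], curr[j - 1])
            curr.append(best + frame_distance(query[i], reference[j]))
        prev = curr
    # The match may also end anywhere: pick the cheapest end frame.
    end = min(range(n), key=prev.__getitem__)
    return prev[end], end


def reduce_frames(frames, alpha):
    """Average each run of alpha consecutive frames (assumed reduction)."""
    reduced = []
    for start in range(0, len(frames), alpha):
        block = frames[start:start + alpha]
        reduced.append([sum(col) / len(block) for col in zip(*block)])
    return reduced
```

If both the query and the reference are passed through `reduce_frames` with the same α, the table shrinks to roughly (m/α) by (n/α) cells, which is where a bound of the form O(mn/α²) would come from under this assumption.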

2011 Joint Workshop on Hands-free Speech Communication and Microphone Arrays | 2011

A speech-based conversation system for accessing agriculture commodity prices in Indian languages

Gautam Varma Mantena; S. Rajendran; B. Rambabu; Suryakanth V. Gangashetty; B. Yegnanarayana; Kishore Prahallad

We demonstrate a speech-based conversation system under development for information access by farmers in rural and semi-urban areas of India. The challenges are that the system must handle significant variations in pronunciation as well as the highly natural, and hence unstructured, dialog that arises in its usage. The focus of this study is to develop a conversational system that adapts to its users over time, in the sense that fewer interactions with the system are needed to get the required information. Other novel features of the system include multiple decoding schemes and accounting for the wide variations in dialog, pronunciation and environment. A video demonstrating the Mandi information system is available at http://speech.iiit.ac.in/index.php/demos.html

International Conference on Acoustics, Speech, and Signal Processing | 2014

Use of articulatory bottle-neck features for query-by-example spoken term detection in low resource scenarios

Gautam Varma Mantena; Kishore Prahallad

For query-by-example spoken term detection (QbE-STD), generation of phone posteriorgrams requires labelled data, which is difficult to obtain for low-resource languages. One solution is to build models from resource-rich languages and use them in the low-resource scenario. However, phone classes are not language universal, so alternative representations such as articulatory classes are explored. In this paper, we use articulatory information and its derivatives, such as bottle-neck (BN) features (also referred to as articulatory BN features), for QbE-STD. We obtain Gaussian posteriorgrams of articulatory BN features in tandem with acoustic parameters such as frequency domain linear prediction cepstral coefficients to perform the search. We compare the search performance of articulatory and phone BN features and show that articulatory BN features are a better representation. We also provide experimental results to show that small amounts (30 minutes) of training data can be used to derive articulatory BN features.

Archive | 2011

Development of a Spoken Dialogue System for accessing Agricultural Information in Telugu

Gautam Varma Mantena; S. Rajendran; Suryakanth V. Gangashetty; B. Yegnanarayana; Kishore Prahallad

Archive | 2013

The Blizzard Challenge 2013 - Indian Language Tasks

Kishore Prahallad; Anandaswarup Vadapalli; Naresh Kumar Elluru; Gautam Varma Mantena; Bhargav Pulugundla; Peri Bhaskararao; Hema A. Murthy; Simon King; Vasilis Karaiskos; Alan W. Black

MediaEval | 2011