Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Ankur Gandhe is active.

Publication


Featured research published by Ankur Gandhe.


IEEE Automatic Speech Recognition and Understanding Workshop | 2013

Using web text to improve keyword spotting in speech

Ankur Gandhe; Long Qin; Florian Metze; Alexander I. Rudnicky; Ian R. Lane; Matthias Eck

For low-resource languages, collecting sufficient training data to build acoustic and language models is time-consuming and often expensive. But large amounts of text data, such as online newspapers, web forums, or online encyclopedias, usually exist for languages that have a large population of native speakers. This text data can be easily collected from the web and then used both to expand the recognizer's vocabulary and to improve the language model. One challenge, however, is normalizing and filtering the web data for a specific task. In this paper, we investigate the use of online text resources to improve the performance of speech recognition specifically for the task of keyword spotting. For the five languages provided in the base period of the IARPA BABEL project, we automatically collected text data from the web using only LimitedLP resources. We then compared two methods for filtering the web data, one based on perplexity ranking and the other based on out-of-vocabulary (OOV) word detection. By integrating the web text into our systems, we observed significant improvements in keyword spotting accuracy for four of the five languages. The best approach obtained an improvement in actual term weighted value (ATWV) of 0.0424 compared to a baseline system trained only on LimitedLP resources. On average, ATWV improved by 0.0243 across the five languages.
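A rough sense of the perplexity-ranking filter can be given in a few lines: train a language model on the available in-domain text and keep only the web sentences it scores as least surprising. The sketch below is illustrative, not the paper's system; it uses a toy add-one-smoothed unigram LM (the real systems use higher-order LMs), and the function names and the keep_fraction cutoff are invented for the example.

```python
# Hypothetical sketch of perplexity-based web-text filtering.
import math
from collections import Counter

def train_unigram_lm(sentences):
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 for the unknown word
    # Return a log-probability function with add-one smoothing.
    return lambda w: math.log((counts[w] + 1) / (total + vocab))

def perplexity(logprob, sentence):
    words = sentence.split()
    return math.exp(-sum(logprob(w) for w in words) / max(len(words), 1))

def filter_web_text(in_domain, web_sentences, keep_fraction=0.5):
    lm = train_unigram_lm(in_domain)
    # Rank web sentences by in-domain perplexity; keep the best fraction.
    ranked = sorted(web_sentences, key=lambda s: perplexity(lm, s))
    return ranked[: int(len(ranked) * keep_fraction)]

in_domain = ["chào bạn", "bạn khỏe không"]          # e.g. LimitedLP transcripts
web = ["chào bạn khỏe không", "lorem ipsum dolor"]  # e.g. crawled text
print(filter_web_text(in_domain, web, keep_fraction=0.5))
```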


Conference of the International Speech Communication Association | 2016

LatticeRnn: Recurrent Neural Networks Over Lattices.

Faisal Ladhak; Ankur Gandhe; Markus Dreyer; Lambert Mathias; Ariya Rastrow; Björn Hoffmeister

We present a new model called LatticeRNN, which generalizes recurrent neural networks (RNNs) to process weighted lattices as input instead of sequences. A LatticeRNN can encode the complete structure of a lattice into a dense representation, which makes it suitable for a variety of problems, including rescoring, classifying, parsing, or translating lattices using deep neural networks (DNNs). In this paper, we use LatticeRNNs for a classification task: each lattice represents the output from an automatic speech recognition (ASR) component of a spoken language understanding (SLU) system, and we classify the intent of the spoken utterance based on the lattice embedding computed by a LatticeRNN. We show that making decisions based on the full ASR output lattice, as opposed to 1-best or n-best hypotheses, makes SLU systems more robust to ASR errors. Our experiments yield improvements of 13% over a baseline RNN system trained on transcriptions and 10% over an n-best list rescoring system for intent classification.
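As a rough illustration of the idea (not the authors' implementation), the sketch below propagates hidden states through a lattice in topological order, pooling the states arriving over each node's incoming arcs weighted by the arc posteriors. The vanilla-RNN cell, the assumption that node ids are already topologically ordered, and all parameter names are simplifications for the example.

```python
# Minimal, illustrative LatticeRNN-style forward pass over an ASR lattice.
import numpy as np

def lattice_rnn(arcs, num_nodes, embed, Wx, Wh):
    # arcs: (src, dst, word, posterior); node ids assumed topologically ordered
    hidden_dim = Wh.shape[0]
    h = [np.zeros(hidden_dim) for _ in range(num_nodes)]
    incoming = [[] for _ in range(num_nodes)]
    for src, dst, word, post in arcs:
        incoming[dst].append((src, word, post))
    for node in range(1, num_nodes):          # node 0 is the lattice start
        if not incoming[node]:
            continue
        states, weights = [], []
        for src, word, post in incoming[node]:
            x = embed[word]                                # arc word embedding
            states.append(np.tanh(Wx @ x + Wh @ h[src]))   # simple RNN cell
            weights.append(post)
        total = sum(weights)                  # normalize arc posteriors
        h[node] = sum((w / total) * s for w, s in zip(weights, states))
    return h[-1]                              # dense embedding at final node

# Toy usage: a two-node lattice with two competing arcs.
embed = {"hello": np.full(4, 0.1), "hallo": np.full(4, 0.2)}
Wx, Wh = np.random.randn(8, 4), np.random.randn(8, 8)
print(lattice_rnn([(0, 1, "hello", 0.7), (0, 1, "hallo", 0.3)], 2, embed, Wx, Wh))
```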


International Conference on Acoustics, Speech, and Signal Processing | 2015

Semi-supervised training in low-resource ASR and KWS

Florian Metze; Ankur Gandhe; Yajie Miao; Zaid A. W. Sheikh; Yun Wang; Di Xu; Hao Zhang; Jungsuk Kim; Ian R. Lane; Wonkyum Lee; Sebastian Stüker; Markus Müller

In particular for “low resource” Keyword Search (KWS) and Speech-to-Text (STT) tasks, more untranscribed test data may be available than training data. Several approaches have been proposed to make this data useful during system development, even when initial systems have Word Error Rates (WER) above 70%. In this paper, we present a set of experiments on low-resource telephony-quality speech in Assamese, Bengali, Lao, Haitian, Zulu, and Tamil, demonstrating the impact that such techniques can have, in particular learning robust bottleneck features on the test data. In the case of Tamil, where significantly more test data than training data is available, we integrated semi-supervised training and speaker adaptation on the test data and achieved significant additional improvements in STT and KWS.
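The semi-supervised part can be summarized as a self-training loop: decode the untranscribed data with a seed system, keep only confidently decoded utterances, and retrain. The skeleton below is a schematic of that loop, not the paper's pipeline; decode and train are caller-supplied stand-ins, and the confidence threshold and number of rounds are arbitrary.

```python
# Schematic self-training loop for untranscribed data in semi-supervised ASR.
# `train` maps a list of (utterance, transcript) pairs to a model;
# `decode` maps (model, utterance) to a (hypothesis, confidence) pair.
def self_training(train_set, untranscribed, decode, train,
                  confidence_min=0.8, rounds=2):
    model = train(train_set)                    # seed system on real labels
    for _ in range(rounds):
        auto_labeled = []
        for utt in untranscribed:
            hyp, conf = decode(model, utt)      # 1-best hypothesis + confidence
            if conf >= confidence_min:          # only trust confident decodes
                auto_labeled.append((utt, hyp))
        model = train(train_set + auto_labeled) # retrain on real + auto labels
    return model
```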


International Conference on Acoustics, Speech, and Signal Processing | 2014

Optimization of Neural Network Language Models for keyword search

Ankur Gandhe; Florian Metze; Alex Waibel; Ian R. Lane

Recent work has shown Neural Network-based Language Models (NNLMs) to be an effective modeling technique for Automatic Speech Recognition. Prior work has shown that these models obtain lower perplexity and word error rate (WER) compared to both standard n-gram language models (LMs) and more advanced language models, including maximum entropy and random forest LMs. While these results are compelling, prior work was limited to evaluating NNLMs on perplexity and word error rate. Our initial results showed that while NNLMs improved speech recognition accuracy, the improvement in keyword search was negligible. In this paper we propose alternate optimizations of NNLMs for the task of keyword search. We evaluate the performance of the proposed methods for keyword search on the Vietnamese dataset provided in phase one of the BABEL project and demonstrate that by penalizing low-frequency words during NNLM training, keyword search metrics such as actual term weighted value (ATWV) can be improved by up to 9.3% compared to standard training methods.
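One way to realize "penalizing low-frequency words during NNLM training" is to weight each token's cross-entropy term by its training-corpus frequency, so rare words exert less pull on the model. The sketch below is an assumption about the mechanism, not the paper's exact criterion; the count-based weight and the alpha exponent are invented for illustration.

```python
# Hedged sketch of a frequency-weighted NNLM training loss (assumed scheme).
import numpy as np

def weighted_nll(logits, targets, counts, alpha=0.5):
    # logits: (batch, vocab) unnormalized scores; targets: (batch,) word ids
    # counts: (vocab,) unigram counts from the training corpus
    m = logits.max(axis=1, keepdims=True)       # numerically stable log-softmax
    logp = logits - m - np.log(np.exp(logits - m).sum(axis=1, keepdims=True))
    weights = (counts[targets] + 1.0) ** alpha  # rarer word -> smaller weight
    weights = weights / weights.mean()          # keep the loss scale stable
    nll = -logp[np.arange(len(targets)), targets]
    return (weights * nll).mean()
```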


International World Wide Web Conference | 2012

Domain adaptive answer extraction for discussion boards

Ankur Gandhe; Dinesh Raghu; Rose Catherine

Answer extraction from discussion boards is an extensively studied problem. Most existing work focuses on supervised methods for extracting answers using similarity features and forum-specific features. Although such models work well on the domain or forum data they were trained on, it is difficult to reuse them for a domain where the vocabulary is different and some forum-specific features may not be available. In this poster, we report initial results of a domain-adaptive answer extractor that performs the extraction in two steps: (a) an answer recognizer identifies the sentences in a post that are likely to be answers, and (b) a domain relevance module determines the domain significance of the identified answer. We use a domain-independent methodology that can be easily adapted to any given domain with minimal effort.
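A toy version of the two-step pipeline makes the division of labor concrete: a domain-independent similarity score plays the answer recognizer, and vocabulary overlap with a domain word list plays the relevance module. Both scoring functions and all thresholds below are stand-ins, not the poster's actual models.

```python
# Toy two-step answer extractor mirroring the recognizer/relevance split.
def tokens(text):
    return set(text.lower().split())

def answer_likeness(question, sentence):
    # Step (a): domain-independent Jaccard similarity to the question.
    q, s = tokens(question), tokens(sentence)
    return len(q & s) / max(len(q | s), 1)

def domain_relevance(sentence, domain_vocab):
    # Step (b): fraction of the sentence drawn from the domain vocabulary.
    s = tokens(sentence)
    return len(s & domain_vocab) / max(len(s), 1)

def extract_answer(question, post_sentences, domain_vocab,
                   min_sim=0.2, min_rel=0.3):
    candidates = [s for s in post_sentences
                  if answer_likeness(question, s) >= min_sim]
    relevant = [s for s in candidates
                if domain_relevance(s, domain_vocab) >= min_rel]
    return max(relevant, key=lambda s: answer_likeness(question, s),
               default=None)
```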


Empirical Methods in Natural Language Processing | 2011

A Word Reordering Model for Improved Machine Translation

Karthik Visweswariah; Rajakrishnan Rajkumar; Ankur Gandhe; Ananthakrishnan Ramanathan; Jiri Navratil


Archive | 2012

Enhancing posted content in discussion forums

Amit Singh; Rose Catherine Kanjirathinkal; Sachindra Joshi; Ankur Gandhe; Karthik Visweswariah


International Joint Conference on Natural Language Processing | 2011

Clause-Based Reordering Constraints to Improve Statistical Machine Translation

Ananthakrishnan Ramanathan; Pushpak Bhattacharyya; Karthik Visweswariah; Kushal Ladha; Ankur Gandhe


Conference of the International Speech Communication Association | 2014

Neural network language models for low resource languages.

Ankur Gandhe; Florian Metze; Ian R. Lane


Archive | 2012

Conveying context-sensitive information for a conversation

Ankur Gandhe; Nandakishore Kambhatla; Amit Singh; Karthik Visweswariah

Collaboration


Dive into Ankur Gandhe's collaborations.

Top Co-Authors

Florian Metze, Carnegie Mellon University
Ian R. Lane, Carnegie Mellon University
Ariya Rastrow, Johns Hopkins University
Angeliki Metallinou, University of Southern California
Anirudh Raju, University of California