Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Anoop Deoras is active.

Publication


Featured research published by Anoop Deoras.


IEEE Automatic Speech Recognition and Understanding Workshop | 2011

Strategies for training large scale neural network language models

Tomas Mikolov; Anoop Deoras; Daniel Povey; Lukas Burget; Jan Cernocky

We describe how to effectively train neural network based language models on large data sets. Fast convergence during training and better overall performance are observed when the training data are sorted by their relevance. We introduce a hash-based implementation of a maximum entropy model that can be trained as part of the neural network model, which leads to a significant reduction in computational complexity. We achieved around a 10% relative reduction in word error rate on an English Broadcast News speech recognition task, compared against a large 4-gram model trained on 400M tokens.
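
The hash-based maximum-entropy component can be illustrated with a short sketch. The snippet below (Python, with illustrative table and vocabulary sizes, not the authors' implementation) hashes n-gram context features into a fixed-size weight table so memory stays bounded, and shows the corresponding SGD update that would run alongside the neural network's own gradient step.

```python
import numpy as np

HASH_SIZE = 2 ** 20        # size of the shared weight table (assumption)
VOCAB = 10_000             # vocabulary size (assumption)
weights = np.zeros(HASH_SIZE, dtype=np.float32)

def hashed_feature(history, word, order):
    """Hash an (n-gram context, word) pair into one slot of the weight table."""
    key = (tuple(history[-order:]), word)
    return hash(key) % HASH_SIZE

def maxent_logits(history, max_order=3):
    """Un-normalized maximum-entropy scores for every word given the history."""
    logits = np.zeros(VOCAB, dtype=np.float32)
    for w in range(VOCAB):
        for order in range(1, max_order + 1):
            logits[w] += weights[hashed_feature(history, w, order)]
    return logits

def sgd_update(history, target_word, probs, lr=0.1, max_order=3):
    """Gradient step for the maxent weights; in the joint model this runs
    alongside the neural network's own update on the same example."""
    for w in range(VOCAB):
        grad = probs[w] - (1.0 if w == target_word else 0.0)
        for order in range(1, max_order + 1):
            weights[hashed_feature(history, w, order)] -= lr * grad
```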


IEEE Transactions on Audio, Speech, and Language Processing | 2014

Application of Deep Belief Networks for natural language understanding

Ruhi Sarikaya; Geoffrey E. Hinton; Anoop Deoras

Applications of Deep Belief Nets (DBN) to various problems have been the subject of a number of recent studies ranging from image classification and speech recognition to audio classification. In this study we apply DBNs to a natural language understanding problem. The recent surge of activity in this area was largely spurred by the development of a greedy layer-wise pretraining method that uses an efficient learning algorithm called Contrastive Divergence (CD). CD allows DBNs to learn a multi-layer generative model from unlabeled data, and the features discovered by this model are then used to initialize a feed-forward neural network which is fine-tuned with backpropagation. We compare a DBN-initialized neural network to three widely used text classification algorithms: Support Vector Machines (SVM), Boosting, and Maximum Entropy (MaxEnt). The plain DBN-based model gives a call-routing classification accuracy that is equal to the best of the other models. However, using additional unlabeled data for DBN pre-training and combining DBN-based learned features with the original features provides significant gains over SVMs, which, in turn, performed better than both MaxEnt and Boosting.
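
A minimal sketch of the pretraining step may help: the snippet below implements one Contrastive Divergence (CD-1) update for a single RBM layer in NumPy, with toy layer sizes chosen purely for illustration. Stacking such layers and then fine-tuning with backpropagation gives the DBN-initialized network described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b_vis, b_hid, lr=0.05):
    """One CD-1 update on a batch of binary visible vectors v0."""
    # Positive phase: sample hidden units from the data.
    h_prob0 = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(h_prob0.shape) < h_prob0).astype(float)
    # Negative phase: one Gibbs step back to visible and hidden units.
    v_prob1 = sigmoid(h0 @ W.T + b_vis)
    h_prob1 = sigmoid(v_prob1 @ W + b_hid)
    # Approximate gradient of the log-likelihood.
    W += lr * (v0.T @ h_prob0 - v_prob1.T @ h_prob1) / len(v0)
    b_vis += lr * (v0 - v_prob1).mean(axis=0)
    b_hid += lr * (h_prob0 - h_prob1).mean(axis=0)

# Toy usage: 784-dim binary inputs, 256 hidden units (sizes are assumptions).
W = 0.01 * rng.standard_normal((784, 256))
b_vis, b_hid = np.zeros(784), np.zeros(256)
batch = (rng.random((32, 784)) < 0.1).astype(float)
cd1_step(batch, W, b_vis, b_hid)
```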


International Conference on Acoustics, Speech, and Signal Processing | 2011

Variational approximation of long-span language models for LVCSR

Anoop Deoras; Tomas Mikolov; Stefan Kombrink; Martin Karafiát; Sanjeev Khudanpur

Long-span language models that capture syntax and semantics are seldom used in the first pass of large vocabulary continuous speech recognition systems due to the prohibitive search-space of sentence-hypotheses. Instead, an N-best list of hypotheses is created using tractable n-gram models, and rescored using the long-span models. It is shown in this paper that computationally tractable variational approximations of the long-span models are a better choice than standard n-gram models for first pass decoding. They not only result in a better first pass output, but also produce a lattice with a lower oracle word error rate, and rescoring the N-best list from such lattices with the long-span models requires a smaller N to attain the same accuracy. Empirical results on the WSJ, MIT Lectures, NIST 2007 Meeting Recognition and NIST 2001 Conversational Telephone Recognition data sets are presented to support these claims.
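
The variational-approximation idea can be sketched as follows: sample a large corpus from the long-span model and estimate a tractable n-gram model from the samples for first-pass decoding. In the sketch below, `long_span_lm.sample_sentence()` is a hypothetical interface and the add-one smoothing is a deliberate simplification.

```python
from collections import Counter, defaultdict

def approximate_bigram(long_span_lm, num_samples=100_000):
    """Estimate a bigram model from sentences sampled from the long-span LM."""
    bigram_counts = defaultdict(Counter)
    for _ in range(num_samples):
        sent = ["<s>"] + long_span_lm.sample_sentence() + ["</s>"]
        for prev, word in zip(sent, sent[1:]):
            bigram_counts[prev][word] += 1
    vocab = {w for c in bigram_counts.values() for w in c} | set(bigram_counts)

    def prob(word, prev):
        c = bigram_counts[prev]
        return (c[word] + 1) / (sum(c.values()) + len(vocab))  # add-one smoothing
    return prob
```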


Speech Communication | 2013

Approximate inference: A sampling based modeling technique to capture complex dependencies in a language model

Anoop Deoras; Tomas Mikolov; Stefan Kombrink; Kenneth Church

In this paper, we present strategies to incorporate long context information directly during the first pass decoding and also for the second pass lattice re-scoring in speech recognition systems. Long-span language models that capture complex syntactic and/or semantic information are seldom used in the first pass of large vocabulary continuous speech recognition systems due to the prohibitive increase in the size of the sentence-hypotheses search space. Typically, n-gram language models are used in the first pass to produce N-best lists, which are then re-scored using long-span models. Such a pipeline produces biased first pass output, resulting in sub-optimal performance during re-scoring. In this paper we show that computationally tractable variational approximations of the long-span and complex language models are a better choice than the standard n-gram model for the first pass decoding and also for lattice re-scoring.
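
A sketch of the second-pass step, under assumed interfaces: hypotheses produced with the tractable first-pass model are re-scored with the full long-span model, and the two language-model scores are log-linearly interpolated before re-ranking.

```python
def rescore_nbest(nbest, long_span_lm, lm_weight=0.7):
    """nbest: list of (words, acoustic_score, first_pass_lm_score) tuples."""
    best = None
    for words, ac_score, fp_lm_score in nbest:
        ls_lm_score = long_span_lm.log_prob(words)   # hypothetical interface
        lm_score = lm_weight * ls_lm_score + (1 - lm_weight) * fp_lm_score
        total = ac_score + lm_score
        if best is None or total > best[0]:
            best = (total, words)
    return best[1]
```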


IEEE Transactions on Audio, Speech, and Language Processing | 2013

Joint Discriminative Decoding of Words and Semantic Tags for Spoken Language Understanding

Anoop Deoras; Gokhan Tur; Ruhi Sarikaya; Dilek Hakkani-Tür

Most Spoken Language Understanding (SLU) systems today employ a cascade approach, where the best hypothesis from an Automatic Speech Recognizer (ASR) is fed into understanding modules such as slot sequence classifiers and intent detectors. The output of these modules is then further fed into downstream components such as an interpreter and/or knowledge broker. These statistical models are usually trained individually to optimize the error rate of their respective output. In such approaches, errors from one module irreversibly propagate into other modules, causing a serious degradation in the overall performance of the SLU system. Thus, it is desirable to jointly optimize all the statistical models together. As a first step towards this, in this paper we propose a joint decoding framework in which we predict the optimal word sequence as well as the slot sequence (semantic tag sequence) jointly, given the input acoustic stream. The improved recognition output is then used for an utterance classification task; specifically, we focus on intent detection. On an SLU task, we show a 1.5% absolute reduction (7.6% relative reduction) in word error rate (WER) and a 1.2% absolute improvement in F measure for slot prediction when compared to a very strong cascade baseline comprising a state-of-the-art large vocabulary ASR followed by a conditional random field (CRF) based slot sequence tagger. Similarly, for intent detection, we show a 1.2% absolute reduction (12% relative reduction) in classification error rate.
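
As a much-simplified illustration of choosing words and tags together, the sketch below re-ranks an ASR N-best list by combining each hypothesis's recognition score with the score of its best slot-tag sequence under a CRF-style tagger. `slot_tagger.best_sequence` is a hypothetical interface, and the paper itself performs joint decoding rather than N-best re-ranking.

```python
def joint_rerank(nbest, slot_tagger, slot_weight=1.0):
    """nbest: list of (words, asr_log_score); returns (words, tags)."""
    best = None
    for words, asr_score in nbest:
        tags, tag_score = slot_tagger.best_sequence(words)  # Viterbi over tags
        total = asr_score + slot_weight * tag_score
        if best is None or total > best[0]:
            best = (total, words, tags)
    return best[1], best[2]
```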


International Conference on Acoustics, Speech, and Signal Processing | 2014

Task specific continuous word representations for mono and multi-lingual spoken language understanding

Tasos Anastasakos; Young-Bum Kim; Anoop Deoras

Models for statistical spoken language understanding (SLU) systems are conventionally trained using supervised discriminative training methods. In many cases, however, labeled data necessary for these supervised techniques is not readily available, necessitating a laborious data collection and annotation effort. This often results in data sets that are not expansive enough to adequately cover all the patterns of natural language phrases that occur in the target applications. Word embedding features alleviate data and feature sparsity issues by learning mathematical representations of words and word associations in a continuous space. In this work, we present techniques to obtain task and domain specific word embeddings and show their usefulness over those obtained from generic unsupervised data. We also show how we transfer these embeddings from one language to another, enabling the training of a multilingual spoken language understanding system.
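
A minimal sketch of obtaining task-specific embeddings, assuming gensim's Word2Vec purely for illustration (the paper does not prescribe a particular toolkit): train skip-gram vectors on in-domain SLU queries and feed the resulting vectors to the tagger as dense features.

```python
from gensim.models import Word2Vec

# Toy in-domain SLU queries; in practice these would be the task's own
# utterances (and their translations, for the multilingual case).
in_domain_sentences = [
    ["book", "a", "table", "for", "two", "at", "seven"],
    ["find", "italian", "restaurants", "near", "me"],
]

model = Word2Vec(sentences=in_domain_sentences, vector_size=100,
                 window=5, min_count=1, sg=1, epochs=20)

# The vectors are then used as dense features for the slot tagger,
# e.g. concatenated with one-hot lexical features for each token.
vec = model.wv["restaurants"]
```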


IEEE Automatic Speech Recognition and Understanding Workshop | 2009

Iterative decoding: A novel re-scoring framework for confusion networks

Anoop Deoras; Frederick Jelinek

Recently there has been a lot of interest in confusion network re-scoring using sophisticated and complex knowledge sources. Traditionally, re-scoring has been carried out by the N-best list method or by dynamic programming over lattices or confusion networks. Although the dynamic programming method is optimal, it allows for the incorporation of only Markov knowledge sources. N-best lists, on the other hand, can incorporate sentence-level knowledge sources, but with increasing N, the re-scoring becomes computationally very intensive. In this paper, we present an elegant framework for confusion network re-scoring called ‘Iterative Decoding’. In it, integration of multiple and complex knowledge sources is not only easier, but it also allows for much faster re-scoring compared to the N-best list method. Experiments with language model re-scoring show that, for comparable performance (in terms of word error rate (WER)) between Iterative Decoding and N-best list re-scoring, the search effort required by our method is 22 times less than that of the N-best list method.
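
The Iterative Decoding procedure can be sketched as coordinate ascent over the confusion-network bins: visit one bin at a time, keep the other bins fixed at their current choices, pick the candidate that maximizes the sentence-level score, and repeat until nothing changes. In the sketch below, `score_sentence` stands in for any (possibly non-Markov) knowledge source and is an assumed interface.

```python
def iterative_decode(confusion_net, score_sentence, max_passes=10):
    """confusion_net: list of bins, each bin a list of candidate words."""
    current = [bin_[0] for bin_ in confusion_net]   # start from 1-best per bin
    for _ in range(max_passes):
        changed = False
        for i, bin_ in enumerate(confusion_net):
            best_word, best_score = current[i], score_sentence(current)
            for cand in bin_:
                trial = current[:i] + [cand] + current[i + 1:]
                s = score_sentence(trial)
                if s > best_score:
                    best_word, best_score = cand, s
            if best_word != current[i]:
                current[i] = best_word
                changed = True
        if not changed:
            break
    return current
```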


Spoken Language Technology Workshop | 2010

Model combination for Speech Recognition using Empirical Bayes Risk minimization

Anoop Deoras; Denis Filimonov; Mary P. Harper; Frederick Jelinek

In this paper, we explore the model combination problem for rescoring Automatic Speech Recognition (ASR) hypotheses. We use minimum Empirical Bayes Risk as the optimization criterion and Deterministic Annealing techniques to search through the non-convex parameter space. Our experiments on the DARPA WSJ task, using several different language models, show that our approach consistently outperforms the standard methods of model combination that optimize the 1-best hypothesis error.
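
A rough sketch of the optimization criterion, with assumed data structures and a deliberately crude random search: interpolation weights for the combined models are chosen to minimize the expected word error under the N-best posterior, and the posterior temperature is annealed so the smoothed objective gradually approaches 1-best error.

```python
import numpy as np

def expected_risk(lambdas, utterances, temperature):
    """utterances: list of (per-model score matrix [N x M], per-hyp errors [N])."""
    risk = 0.0
    for scores, errors in utterances:
        combined = scores @ lambdas                     # log-linear combination
        post = np.exp((combined - combined.max()) / temperature)
        post /= post.sum()
        risk += float(post @ errors)                    # expected word errors
    return risk

def anneal(utterances, num_models, temps=(10.0, 3.0, 1.0, 0.3)):
    rng = np.random.default_rng(0)
    best = np.ones(num_models) / num_models
    for T in temps:                                     # cool the posterior
        best_risk = expected_risk(best, utterances, T)
        for _ in range(200):                            # crude random search
            cand = np.abs(best + 0.05 * rng.standard_normal(num_models))
            risk = expected_risk(cand, utterances, T)
            if risk < best_risk:
                best, best_risk = cand, risk
    return best
```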


International Conference on Acoustics, Speech, and Signal Processing | 2010

Language model adaptation using Random Forests

Anoop Deoras; Frederick Jelinek; Yi Su

In this paper we investigate random forest based language model adaptation. Large amounts of out-of-domain data are used to grow the decision trees while very small amounts of in-domain data are used to prune them back, so that the structure of the trees is suitable for the desired domain while the probabilities in the tree nodes are reliably estimated. Extensive experiments are carried out and results are reported on the particular task of adapting a Broadcast News language model to the MIT computer science lecture domain. We show 0.80% and 0.60% absolute WER improvements over language model interpolation and count merging techniques, respectively.
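
A highly simplified sketch of the grow-then-prune recipe: given a decision tree over histories built from out-of-domain counts, each internal node is collapsed whenever merging its children improves the log-likelihood of the small in-domain set. The Node structure and the add-one leaf estimate below are illustrative assumptions, not the paper's exact formulation.

```python
import math

class Node:
    def __init__(self, counts, left=None, right=None, split=None):
        self.counts = counts          # out-of-domain word counts under this node
        self.left, self.right, self.split = left, right, split

def leaf_logprob(node, word, vocab_size):
    total = sum(node.counts.values())
    return math.log((node.counts.get(word, 0) + 1) / (total + vocab_size))

def in_domain_ll(node, events, vocab_size):
    """events: list of (history, word) pairs routed to this node or below."""
    if node.left is None:
        return sum(leaf_logprob(node, w, vocab_size) for _, w in events)
    left_ev = [e for e in events if node.split(e[0])]
    right_ev = [e for e in events if not node.split(e[0])]
    return (in_domain_ll(node.left, left_ev, vocab_size)
            + in_domain_ll(node.right, right_ev, vocab_size))

def prune(node, events, vocab_size):
    """Collapse a split whenever merging it helps the in-domain likelihood."""
    if node.left is None:
        return node
    left_ev = [e for e in events if node.split(e[0])]
    right_ev = [e for e in events if not node.split(e[0])]
    node.left = prune(node.left, left_ev, vocab_size)
    node.right = prune(node.right, right_ev, vocab_size)
    merged_ll = sum(leaf_logprob(node, w, vocab_size) for _, w in events)
    if merged_ll >= in_domain_ll(node, events, vocab_size):
        node.left = node.right = node.split = None    # collapse to a leaf
    return node
```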


IEEE Transactions on Audio, Speech, and Language Processing | 2016

An empirical investigation of word class-based features for natural language understanding

Asli Celikyilmaz; Ruhi Sarikaya; Minwoo Jeong; Anoop Deoras

There are many studies showing that using class-based features improves the performance of natural language processing (NLP) tasks such as syntactic part-of-speech tagging, dependency parsing, sentiment analysis, and slot filling in natural language understanding (NLU), but not much has been reported on the underlying reasons for the performance improvements. In this paper, we investigate the effects of word class-based features on the exponential family of models, focusing specifically on NLU tasks, and demonstrate that the performance improvements can be attributed to the regularization effect of the class-based features on the underlying model. Our hypothesis is based on the empirical observation that shrinking the sum of parameter magnitudes in an exponential model tends to improve performance. We show on several semantic tagging tasks that there is a positive correlation between the model size reduction achieved by the addition of the class-based features and the model performance on a held-out dataset. We also demonstrate that class-based features extracted from different data sources using alternate word clustering methods can individually contribute to the performance gain. Since the proposed features are generated in an unsupervised manner without significant computational overhead, the improvements in performance largely come for free, and we show that such features provide gains for a wide range of tasks, from semantic classification and slot tagging in NLU to named entity recognition (NER).
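
A small sketch of what a class-based feature looks like in practice: alongside its lexical feature, each token contributes the ID of the word cluster it belongs to (for example, from Brown clustering run on unlabeled text), so rare words share parameters with frequent words of the same class. The cluster map and feature names below are illustrative assumptions.

```python
def token_features(tokens, i, word2cluster):
    """Feature dictionary for token i, mixing lexical and class-based features."""
    word = tokens[i]
    cluster = word2cluster.get(word, "<unk>")
    feats = {
        f"w={word}": 1.0,        # lexical feature
        f"c={cluster}": 1.0,     # class-based feature
    }
    if i > 0:
        feats[f"c[-1]={word2cluster.get(tokens[i - 1], '<unk>')}"] = 1.0
    return feats

# These feature dictionaries would feed an exponential model such as a CRF or
# MaxEnt tagger; the abstract's point is that the class features shrink the
# total parameter mass while improving held-out accuracy.
```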

Collaboration


Dive into Anoop Deoras's collaborations.

Top Co-Authors

Tomas Mikolov
Brno University of Technology

Stefan Kombrink
Brno University of Technology