Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Sanjeev Khudanpur is active.

Publication


Featured research published by Sanjeev Khudanpur.


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2011

Extensions of recurrent neural network language model

Tomas Mikolov; Stefan Kombrink; Lukas Burget; Jan Cernocky; Sanjeev Khudanpur

We present several modifications of the original recurrent neural network language model (RNN LM). While this model has been shown to significantly outperform many competitive language modeling techniques in terms of accuracy, the remaining problem is its computational complexity. In this work, we show approaches that lead to more than a 15-times speedup for both the training and testing phases. Next, we show the importance of using a backpropagation-through-time algorithm. An empirical comparison with feedforward networks is also provided. Finally, we discuss possibilities for reducing the number of parameters in the model. The resulting RNN model can thus be smaller, faster during both training and testing, and more accurate than the basic one.
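
The largest speedup in this line of work comes from factorizing the output layer over word classes, so that P(w | h) = P(c(w) | h) · P(w | c(w), h) and only one class's words need scoring. A minimal NumPy sketch of that factorization follows; the dimensions and the round-robin class assignment are illustrative (the paper bins words by frequency), not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim, vocab_size, num_classes = 64, 10_000, 100

# Assign each word to a class (round-robin here for simplicity; frequency
# binning gives classes with roughly equal probability mass).
word_class = np.arange(vocab_size) % num_classes
words_in_class = [np.flatnonzero(word_class == c) for c in range(num_classes)]

W_class = rng.normal(size=(num_classes, hidden_dim))  # class-score weights
W_word = rng.normal(size=(vocab_size, hidden_dim))    # word-score weights

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def word_prob(h, w):
    """P(w | h) = P(class | h) * P(w | class, h), scored over one class only."""
    c = word_class[w]
    p_class = softmax(W_class @ h)[c]
    members = words_in_class[c]
    p_word = softmax(W_word[members] @ h)[np.searchsorted(members, w)]
    return p_class * p_word

h = rng.normal(size=hidden_dim)
print(word_prob(h, 1234))  # O(num_classes + |class|) work instead of O(vocab)
```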


Workshop on Statistical Machine Translation (WMT) | 2009

Joshua: An Open Source Toolkit for Parsing-Based Machine Translation

Zhifei Li; Chris Callison-Burch; Chris Dyer; Sanjeev Khudanpur; Lane Schwartz; Wren N. G. Thornton; Jonathan Weese; Omar F. Zaidan

We describe Joshua, an open source toolkit for statistical machine translation. Joshua implements all of the algorithms required for synchronous context-free grammars (SCFGs): chart parsing, n-gram language model integration, beam- and cube-pruning, and k-best extraction. The toolkit also implements suffix-array grammar extraction and minimum error rate training. It uses parallel and distributed computing techniques for scalability. We demonstrate that the toolkit achieves state-of-the-art translation performance on the WMT09 French-English translation task.
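
Cube pruning, one of the algorithms listed above, can be illustrated in miniature: given two score-sorted antecedent lists, a heap lazily enumerates the best combinations without scoring every pair. The costs below are hypothetical, and the sketch omits the language-model interaction cost that makes real combination scores non-monotonic.

```python
import heapq

def k_best_combinations(a, b, k):
    """Yield up to k (cost, i, j) pairs in nondecreasing combined cost."""
    heap = [(a[0] + b[0], 0, 0)]
    seen = {(0, 0)}
    while heap and k > 0:
        cost, i, j = heapq.heappop(heap)
        yield cost, i, j
        k -= 1
        # Expand only the two neighbors of the popped cell in the grid.
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(a) and nj < len(b) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (a[ni] + b[nj], ni, nj))

rule_costs = [0.5, 1.1, 2.3]        # sorted SCFG rule costs (hypothetical)
antecedent_costs = [0.2, 0.9, 1.4]  # sorted sub-derivation costs (hypothetical)
for cost, i, j in k_best_combinations(rule_costs, antecedent_costs, 4):
    print(f"rule {i} + antecedent {j}: cost {cost:.1f}")
```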


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2014

Improving deep neural network acoustic models using generalized maxout networks

Xiaohui Zhang; Jan Trmal; Daniel Povey; Sanjeev Khudanpur

Recently, maxout networks have brought significant improvements to various speech recognition and computer vision tasks. In this paper we introduce two new types of generalized maxout units, which we call p-norm and soft-maxout. We investigate their performance in Large Vocabulary Continuous Speech Recognition (LVCSR) tasks in various languages with 10 hours and 60 hours of data, and find that the p-norm generalization of maxout consistently performs well. Because unbounded-output nonlinearities such as these sometimes cause instability during training in our setup, we also present a method to control that instability: the “normalization layer”, a nonlinearity that scales down all dimensions of its input in order to stop the average squared output from exceeding one. The performance of our proposed nonlinearities is compared with maxout, rectified linear units (ReLU), tanh units, and also with a discriminatively trained SGMM/HMM system, and our p-norm units with p equal to 2 are found to perform best.
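
A minimal NumPy sketch of the two pieces just described, with illustrative group size and dimensions: the p-norm unit pools each group of activations into one output, and the normalization layer rescales the result so the mean squared output cannot exceed one.

```python
import numpy as np

def pnorm(x, group_size, p=2):
    """Group p-norm nonlinearity: y_i = (sum_{j in group i} |x_j|^p)^(1/p)."""
    x = x.reshape(-1, group_size)
    return (np.abs(x) ** p).sum(axis=1) ** (1.0 / p)

def normalization_layer(y):
    """Scale y down so the average squared element does not exceed one."""
    rms = np.sqrt(np.mean(y ** 2))
    return y / rms if rms > 1.0 else y

h = np.random.default_rng(0).normal(size=400) * 3.0  # fake hidden activations
y = normalization_layer(pnorm(h, group_size=4, p=2))
print(y.shape, np.mean(y ** 2))  # (100,), mean square capped at 1.0
```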


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2015

Librispeech: An ASR corpus based on public domain audio books

Vassil Panayotov; Guoguo Chen; Daniel Povey; Sanjeev Khudanpur

This paper introduces a new corpus of read English speech, suitable for training and evaluating speech recognition systems. The LibriSpeech corpus is derived from audiobooks that are part of the LibriVox project, and contains 1000 hours of speech sampled at 16 kHz. We have made the corpus freely available for download, along with separately prepared language-model training data and pre-built language models. We show that acoustic models trained on LibriSpeech give lower error rates on the Wall Street Journal (WSJ) test sets than models trained on WSJ itself. We are also releasing Kaldi scripts that make it easy to build these systems.
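
The paper's own recipes are Kaldi scripts; as a lighter-weight illustration of accessing the corpus, the sketch below uses torchaudio's built-in LibriSpeech loader (an assumption of this note, not something the paper ships). The subset names are the official splits ("train-clean-100", "dev-clean", "test-clean", and so on).

```python
import torchaudio

# Downloads and indexes the "dev-clean" subset under ./data.
dataset = torchaudio.datasets.LIBRISPEECH(
    root="./data", url="dev-clean", download=True
)

waveform, sample_rate, transcript, speaker_id, chapter_id, utt_id = dataset[0]
print(sample_rate)  # 16000, matching the 16 kHz sampling described above
print(transcript)   # normalized, upper-case transcript of the utterance
```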


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2000

Towards language independent acoustic modeling

William Byrne; Peter Beyerlein; Juan M. Huerta; Sanjeev Khudanpur; B. Marthi; John Morgan; Nino Peterek; Joseph Picone; Dimitra Vergyri; T. Wang

We describe procedures and experimental results using speech from diverse source languages to build an ASR system for a single target language. This work is intended to improve ASR in languages for which large amounts of training data are not available. We have developed both knowledge-based and automatic methods to map phonetic units from the source languages to the target language. We employed HMM adaptation techniques and discriminative model combination to combine acoustic models from the individual source languages for recognition of speech in the target language. Experiments are described in which Czech Broadcast News is transcribed using acoustic models trained from small amounts of Czech read speech augmented by English, Spanish, Russian, and Mandarin acoustic models.
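
A toy sketch of the knowledge-based phone mapping idea: source-language phone symbols are rewritten to their closest target-language counterparts so that source-language acoustic models can be reused for the target language. The symbols and mapping entries below are hypothetical illustrations, not the paper's actual tables.

```python
# Hypothetical English-to-Czech phone mapping (illustrative entries only).
english_to_czech = {
    "ae": "e",  # no /ae/ in the target inventory; fold into the closest vowel
    "th": "t",  # dental fricative -> unvoiced stop
    "r":  "r",  # shared (or near-identical) phones map to themselves
    "s":  "s",
}

def map_pronunciation(phones, table):
    """Rewrite a source-language phone string in target-language units."""
    return [table.get(p, p) for p in phones]  # pass unmapped phones through

print(map_pronunciation(["th", "ae", "r", "s"], english_to_czech))
# ['t', 'e', 'r', 's']
```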


Computer Speech & Language | 2000

Pronunciation modeling by sharing Gaussian densities across phonetic models

Murat Saraclar; Harriet J. Nock; Sanjeev Khudanpur

Conversational speech exhibits considerable pronunciation variability, which has been shown to have a detrimental effect on the accuracy of automatic speech recognition. There have been many attempts to model pronunciation variation, including the use of decision trees to generate alternate word pronunciations from phonemic baseforms. Use of pronunciation models during recognition is known to improve accuracy. This paper describes the incorporation of pronunciation models into acoustic model training in addition to recognition. Subtle difficulties in the straightforward use of alternatives to canonical pronunciations are first illustrated: it is shown that simply improving the accuracy of the phonetic transcription used for acoustic model training is of little benefit. Acoustic models trained on the most accurate phonetic transcriptions result in worse recognition than acoustic models trained on canonical baseforms. Analysis of this counterintuitive result leads to a new method of accommodating nonstandard pronunciations: rather than allowing a phoneme in the canonical pronunciation to be realized as one of a few distinct alternate phones, the hidden Markov model (HMM) states of the phoneme's model are instead allowed to share Gaussian mixture components with the HMM states of the model(s) of the alternate realization(s). Qualitatively, this amounts to making a soft decision about which surface form is realized. Quantitatively, experiments show that this method is particularly well suited for acoustic model training for spontaneous speech: a 1.7% (absolute) improvement in recognition accuracy on the Switchboard corpus is presented.
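
The soft-decision idea can be sketched with a shared pool of Gaussians: the HMM states for the canonical and alternate realizations draw on the same components but with different mixture weights. All parameters below are illustrative (and one-dimensional for brevity), and the sketch assumes SciPy for the Gaussian density.

```python
import numpy as np
from scipy.stats import norm

# Shared pool of Gaussian components (means and standard deviations).
pool_means = np.array([-2.0, 0.0, 2.0])
pool_stds = np.array([1.0, 1.0, 1.0])

# The canonical state leans on components 0-1; the alternate realization's
# state leans on components 1-2. Sharing component 1 is the "soft decision".
weights = {
    "canonical": np.array([0.6, 0.3, 0.1]),
    "alternate": np.array([0.1, 0.3, 0.6]),
}

def emission_likelihood(x, state):
    """Mixture likelihood of observation x under a state's shared components."""
    return np.sum(weights[state] * norm.pdf(x, pool_means, pool_stds))

x = 0.4  # one acoustic observation (e.g., a cepstral coefficient)
print(emission_likelihood(x, "canonical"), emission_likelihood(x, "alternate"))
```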


IEEE Signal Processing Magazine | 2009

Developments and directions in speech recognition and understanding, Part 1 [DSP Education]

Janet M. Baker; Li Deng; James Glass; Sanjeev Khudanpur; Chin-hui Lee; Nelson Morgan; Douglas D. O'Shaughnessy

To advance research, it is important to identify promising future research directions, especially those that have not been adequately pursued or funded in the past. The working group producing this article was charged to elicit from the human language technology (HLT) community a set of well-considered directions or rich areas for future research that could lead to major paradigm shifts in the field of automatic speech recognition (ASR) and understanding. ASR has been an area of great interest and activity to the signal processing and HLT communities over the past several decades. As a first step, this group reviewed major developments in the field and the circumstances that led to their success and then focused on areas it deemed especially fertile for future research. Part 1 of this article will focus on historically significant developments in the ASR area, including several major research efforts that were guided by different funding agencies, and suggest general areas in which to focus research.


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2014

A pitch extraction algorithm tuned for automatic speech recognition

Pegah Ghahremani; Bagher BabaAli; Daniel Povey; Korbinian Riedhammer; Jan Trmal; Sanjeev Khudanpur

In this paper we present an algorithm that produces pitch and probability-of-voicing estimates for use as features in automatic speech recognition systems. These features give large performance improvements on tonal languages for ASR systems, and even substantial improvements for non-tonal languages. Our method, which we call the Kaldi pitch tracker (because we are adding it to the Kaldi ASR toolkit), is a highly modified version of the getf0 (RAPT) algorithm. Unlike the original getf0, we do not make a hard decision about whether any given frame is voiced or unvoiced; instead, we assign a pitch even to unvoiced frames while constraining the pitch trajectory to be continuous. Our algorithm also produces a quantity that can be used as a probability-of-voicing measure; it is based on the normalized autocorrelation measure that our pitch extractor uses. We present results on data from various languages in the BABEL project, and show a large improvement over systems without tonal features and systems where pitch and POV information was obtained from SAcC or getf0.
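
At the heart of this family of methods is a normalized autocorrelation: each frame is compared with lagged copies of itself, giving a score in roughly [-1, 1] per candidate lag. A minimal sketch on a synthetic frame follows; a real tracker additionally enforces continuity across frames and penalizes subharmonic lags, which this sketch sidesteps by restricting the search range. Frame length and lag range are illustrative.

```python
import numpy as np

def nccf(frame, min_lag, max_lag):
    """Normalized autocorrelation of `frame` for each candidate lag."""
    n = len(frame) - max_lag
    x = frame[:n]
    scores = []
    for lag in range(min_lag, max_lag + 1):
        y = frame[lag:lag + n]
        scores.append(x @ y / (np.linalg.norm(x) * np.linalg.norm(y) + 1e-10))
    return np.array(scores)

sr, f0 = 16000, 200.0                    # 16 kHz audio, 200 Hz synthetic pitch
t = np.arange(int(0.04 * sr)) / sr       # one 40 ms frame
frame = np.sin(2 * np.pi * f0 * t)
min_lag, max_lag = sr // 400, sr // 120  # search roughly 120-400 Hz
scores = nccf(frame, min_lag, max_lag)
best_lag = min_lag + int(np.argmax(scores))
print(sr / best_lag)  # ~200.0: the strongest lag recovers the pitch
```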


Computer Speech & Language | 2000

Maximum entropy techniques for exploiting syntactic, semantic and collocational dependencies in language modeling

Sanjeev Khudanpur; Jun Wu

A new statistical language model is presented which combines collocational dependencies with two important sources of long-range statistical dependence: the syntactic structure and the topic of a sentence. These dependencies or constraints are integrated using the maximum entropy technique. Substantial improvements are demonstrated over a trigram model in both perplexity and speech recognition accuracy on the Switchboard task. A detailed analysis of the performance of this language model is provided in order to characterize the manner in which it performs better than a standard N-gram model. It is shown that topic dependencies are most useful in predicting words which are semantically related by the subject matter of the conversation. Syntactic dependencies, on the other hand, are found to be most helpful in positions where the best predictors of the following word are not within N-gram range due to an intervening phrase or clause. It is also shown that these two methods individually enhance an N-gram model in complementary ways, and the overall improvement from their combination is nearly additive.
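
A conditional maximum entropy model has the form P(w | h) proportional to exp(sum_i lambda_i f_i(h, w)), where each constraint is a feature function over the history and the predicted word. A toy sketch with one N-gram-style feature and two topic features follows; the vocabulary, features, and weights are all hypothetical.

```python
import numpy as np

vocab = ["stocks", "fell", "guitar", "played"]

def features(history, topic, word):
    """Binary feature functions over (history, topic, word)."""
    return np.array([
        history[-1] == "the" and word == "stocks",          # bigram-style
        topic == "finance" and word in {"stocks", "fell"},  # topic feature
        topic == "music" and word in {"guitar", "played"},  # topic feature
    ], dtype=float)

lam = np.array([1.5, 1.0, 1.0])  # weights; training would fit these to data

def p_word(history, topic):
    """P(w | h) via the maxent form: normalize exp(lambda . f) over the vocab."""
    scores = np.array([lam @ features(history, topic, w) for w in vocab])
    e = np.exp(scores - scores.max())
    return e / e.sum()

for topic in ("finance", "music"):
    print(topic, dict(zip(vocab, p_word(["the"], topic).round(3))))
```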


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2005

Hidden Markov models for automatic annotation and content-based retrieval of images and video

Arnab Ghoshal; Pavel Ircing; Sanjeev Khudanpur

This paper introduces a novel method for automatic annotation of images with keywords from a generic vocabulary of concepts or objects for the purpose of content-based image retrieval. An image, represented as a sequence of feature vectors characterizing low-level visual features such as color, texture or oriented edges, is modeled as having been stochastically generated by a hidden Markov model whose states represent concepts. The parameters of the model are estimated from a set of manually annotated (training) images. Each image in a large test collection is then automatically annotated with the a posteriori probability of concepts present in it. This annotation supports content-based search of the image collection via keywords. Various aspects of model parameterization, parameter estimation, and image annotation are discussed. Empirical retrieval results are presented on two image collections, COREL and key-frames from TRECVID. Comparisons are made with two other recently developed techniques on the same datasets.
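
A toy version of this generative story: concepts are HMM states with Gaussian emissions over the feature vectors, and the forward-backward recursion yields the a posteriori concept probabilities used for annotation. All parameters below are illustrative, and the sketch assumes SciPy for the Gaussian density.

```python
import numpy as np
from scipy.stats import multivariate_normal

concepts = ["sky", "water"]
means = np.array([[0.0, 2.0], [1.5, -1.0]])  # per-concept feature means
trans = np.array([[0.9, 0.1], [0.1, 0.9]])   # concept transition probabilities
prior = np.array([0.5, 0.5])

def concept_posteriors(feats):
    """Forward-backward posteriors of each concept at each feature vector."""
    T, S = len(feats), len(concepts)
    emis = np.array([[multivariate_normal.pdf(f, means[s], np.eye(2))
                      for s in range(S)] for f in feats])
    alpha, beta = np.zeros((T, S)), np.ones((T, S))
    alpha[0] = prior * emis[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ trans) * emis[t]
    for t in range(T - 2, -1, -1):
        beta[t] = trans @ (emis[t + 1] * beta[t + 1])
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)

feats = np.array([[0.1, 1.8], [0.2, 2.1], [1.4, -0.8]])  # fake image regions
post = concept_posteriors(feats)
for c, score in zip(concepts, post.max(axis=0).round(3)):
    print(c, score)  # annotate with concepts whose posterior is high
```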

Collaboration


Dive into Sanjeev Khudanpur's collaborations.

Top Co-Authors

Daniel Povey (Johns Hopkins University)

Vimal Manohar (Johns Hopkins University)

Jan Trmal (University of West Bohemia)

Guoguo Chen (Johns Hopkins University)