Publication


Featured research published by Satya Dharanipragada.


IEEE Transactions on Audio, Speech, and Language Processing | 2007

Robust Feature Extraction for Continuous Speech Recognition Using the MVDR Spectrum Estimation Method

Satya Dharanipragada; Umit H. Yapanel; Bhaskar D. Rao

This paper describes a robust feature extraction technique for continuous speech recognition. Central to the technique is the minimum variance distortionless response (MVDR) method of spectrum estimation. We consider incorporating perceptual information in two ways: 1) after the MVDR power spectrum is computed and 2) directly during the MVDR spectrum estimation. We show that incorporating perceptual information directly into the spectrum estimation improves both robustness and computational efficiency significantly. We analyze the class separability and speaker variability properties of the features using a Fisher linear discriminant measure and show that these features provide better class separability and better suppression of speaker-dependent information than the widely used mel-frequency cepstral coefficient (MFCC) features. We evaluate the technique on four different tasks: an in-car speech recognition task, the Aurora-2 matched task, the Wall Street Journal (WSJ) task, and the Switchboard task. The new feature extraction technique gives lower word error rates than the MFCC and perceptual linear prediction (PLP) feature extraction techniques in most cases. Statistical significance tests reveal that the improvement is most significant in high-noise conditions. The technique thus provides improved robustness to noise without sacrificing performance in clean conditions.
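
For concreteness, here is a minimal sketch of the MVDR estimator at the core of the technique, following the standard route of computing linear-prediction coefficients from the frame autocorrelation and then forming the MVDR spectrum in closed form. The function name, parameter defaults, and numpy/scipy usage are our own illustration, not the paper's implementation.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def mvdr_spectrum(frame, order=12, nfft=512):
    """Sketch of an order-M MVDR (Capon) spectrum estimate for one
    windowed speech frame. Parameter values are illustrative."""
    n = len(frame)
    # Biased autocorrelation estimates r[0..M]
    r = np.array([frame[:n - k] @ frame[k:] for k in range(order + 1)]) / n

    # Linear-prediction coefficients and prediction-error power
    lpc = solve_toeplitz((r[:-1], r[:-1]), r[1:])
    a = np.concatenate(([1.0], -lpc))   # A(z) polynomial with a[0] = 1
    perr = r[0] - lpc @ r[1:]

    # mu[k] coefficients of the MVDR spectrum, computed from the LPC
    # polynomial via Musicus' closed-form expression
    M = order
    mu = np.zeros(M + 1)
    for k in range(M + 1):
        i = np.arange(M - k + 1)
        mu[k] = np.sum((M + 1 - k - 2 * i) * a[i] * a[i + k]) / perr

    # S_MVDR(w) = 1 / (mu[0] + 2 * sum_k mu[k] * cos(k * w))
    w = 2 * np.pi * np.arange(nfft // 2 + 1) / nfft
    denom = mu[0] + 2 * (mu[1:] @ np.cos(np.outer(np.arange(1, M + 1), w)))
    return 1.0 / np.maximum(denom, 1e-10)
```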


International Conference on Acoustics, Speech, and Signal Processing | 2001

MVDR based feature extraction for robust speech recognition

Satya Dharanipragada; Bhaskar D. Rao

This paper describes a robust feature extraction method for continuous speech recognition. Central to the method are the minimum variance distortionless response (MVDR) method of spectrum estimation and a feature trajectory smoothing technique for reducing the variance in the feature vectors. When evaluated on continuous speech recognition tasks in a stationary and a moving car, the method gave an average relative improvement in WER of greater than 30%.
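
The abstract does not spell out the smoothing filter, so the sketch below shows one plausible reading: a short moving average applied independently to each feature dimension over time to reduce frame-to-frame variance. The names and window width are illustrative assumptions, not the paper's specification.

```python
import numpy as np

def smooth_trajectories(feats, width=3):
    """Moving-average smoothing of feature trajectories (illustrative;
    the paper's actual filter may differ). feats has shape
    (num_frames, num_dims); each column is smoothed over time."""
    kernel = np.ones(width) / width
    return np.apply_along_axis(
        lambda traj: np.convolve(traj, kernel, mode="same"), 0, feats)
```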


IEEE Transactions on Speech and Audio Processing | 2002

A multistage algorithm for spotting new words in speech

Satya Dharanipragada; Salim Roukos

In this paper, we present a fast, vocabulary-independent algorithm for spotting words in speech. The algorithm consists of a phone-ngram representation (indexing) stage and a coarse-to-detailed search stage for spotting a word/phone sequence in speech. The phone-ngram representation stage provides a phoneme-level representation of the speech that can be searched efficiently. We present a novel method for phoneme recognition using a vocabulary prefix tree to guide the creation of the phone-ngram index. The coarse search, consisting of phone-ngram matching, identifies regions of speech as putative word hits. The detailed acoustic match is then conducted only at the putative hits identified in the coarse match. This gives us vocabulary independence and the desired accuracy and speed in word spotting. Current lattice-based phoneme-matching algorithms are similar to the coarse-match step of our algorithm. We show that our combined algorithm gives a factor-of-two improvement over the coarse match. The algorithm has wide-ranging use in distributed and pervasive speech recognition applications such as audio indexing, spoken message retrieval, and video browsing.
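
As a sketch of the two stages, the indexing step can be viewed as an inverted index from phone n-grams to positions, and the coarse search as a lookup of the query's n-grams in that index. This is a simplified rendering under our own assumptions: it indexes a single decoded phone sequence, whereas the paper builds the phone-ngram index under the guidance of a vocabulary prefix tree.

```python
from collections import defaultdict

def build_phone_ngram_index(phones, n=3):
    """Indexing stage (simplified): map each phone n-gram to the
    positions where it occurs in the decoded phone sequence."""
    index = defaultdict(list)
    for i in range(len(phones) - n + 1):
        index[tuple(phones[i:i + n])].append(i)
    return index

def coarse_match(index, query_phones, n=3):
    """Coarse search: every indexed position sharing an n-gram with
    the query's phone sequence becomes a putative hit."""
    hits = set()
    for i in range(len(query_phones) - n + 1):
        hits.update(index.get(tuple(query_phones[i:i + n]), []))
    return sorted(hits)
```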


International Conference on Acoustics, Speech, and Signal Processing | 2004

Feature space Gaussianization

George Saon; Satya Dharanipragada; Daniel Povey

We propose a non-linear feature space transformation for speaker/environment adaptation that forces the individual dimensions of the acoustic data for every speaker to be Gaussian distributed. The transformation is given by the preimage under the Gaussian cumulative distribution function (CDF) of the empirical CDF on a per-dimension basis. We show that, for a given dimension, this transformation achieves minimum divergence between the density function of the transformed adaptation data and the normal density with zero mean and unit variance. Experimental results on both small and large vocabulary tasks show consistent improvements over the application of linear adaptation transforms only.
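
The transformation itself is simple to state in a few lines: per dimension, map each value through the empirical CDF and then through the inverse standard-normal CDF. A minimal sketch with illustrative names:

```python
import numpy as np
from scipy.stats import norm

def gaussianize(x):
    """Per-dimension feature-space Gaussianization: replace each value
    with the standard-normal preimage of its empirical CDF value.
    x: (num_frames, num_dims) adaptation data for one speaker."""
    n = x.shape[0]
    ranks = np.argsort(np.argsort(x, axis=0), axis=0) + 1   # 1..n per column
    u = (ranks - 0.5) / n        # empirical CDF, kept strictly inside (0, 1)
    return norm.ppf(u)           # inverse Gaussian CDF, per dimension
```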


IEEE Automatic Speech Recognition and Understanding Workshop | 1997

Towards a universal speech recognizer for multiple languages

Paul S. Cohen; Satya Dharanipragada; J. Gros; M. Monkowski; Chalapathy Neti; Salim Roukos; Todd Ward

We describe our initial efforts in building a universal recognizer for multiple languages that permits a user to switch languages seamlessly in a single session without requiring any switch in the speech recognition system. Toward this end, we have begun building a universal speech recognizer for English and French. We experiment with a universal phonology covering both French and English and describe speech recognition results for the ATIS task using the combined phonology. Our best results so far show about a 5% relative performance degradation for English relative to a purely English system with about twice the vocabulary size, and a 9% relative degradation for French relative to a purely French system.
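
A combined phonology amounts to mapping language-specific phone labels onto shared universal units where the sounds coincide and keeping them distinct where they do not. The toy fragment below illustrates the idea only; the actual English/French inventory and mappings used in the paper are not reproduced here.

```python
# Hypothetical fragment of a combined English/French phone inventory.
# Phones that share a sound map to one universal unit; phones with no
# counterpart in the other language stay language-tagged.
UNIVERSAL_PHONES = {
    ("en", "AA"): "a",   ("fr", "a"): "a",    # shared open vowel
    ("en", "SH"): "S",   ("fr", "ch"): "S",   # shared fricative
    ("fr", "u"): "fr_y",                      # French /y/: no English match
}

def to_universal(lang, phone):
    """Map a language-tagged phone into the universal phone set."""
    return UNIVERSAL_PHONES.get((lang, phone), f"{lang}_{phone}")
```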


International Conference on Acoustics, Speech, and Signal Processing | 1998

A fast vocabulary independent algorithm for spotting words in speech

Satya Dharanipragada; Salim Roukos

In applications such as audio indexing, spoken message retrieval, and video browsing, it is necessary to detect spoken words that lie outside the vocabulary of the speech recognizer used in these systems, in large amounts of speech, at speeds many times faster than real time. We present a fast, vocabulary-independent algorithm for spotting words in speech. The algorithm consists of a preprocessing stage and a coarse-to-detailed search strategy for spotting a word/phone sequence in speech. The preprocessing stage provides a phone-level representation of the speech that can be searched efficiently. The coarse search, consisting of phone-ngram matching, identifies regions of speech as putative word hits. The detailed acoustic match is then conducted only at the putative hits identified in the coarse match. This gives the desired accuracy and speed in word spotting. Overall, the algorithm executes 2400 times faster than real time.
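
Reusing the coarse_match sketch shown earlier, the coarse-to-detailed strategy reduces to: look up putative hits cheaply in the index, then run the expensive detailed acoustic match only at those positions. Here score_fn is a hypothetical stand-in for the detailed acoustic match, which this sketch does not implement.

```python
def spot_word(index, query_phones, score_fn, threshold, n=3):
    """Coarse-to-detailed word spotting (sketch). `score_fn(pos)` is a
    placeholder for the detailed acoustic match; only putative hits
    from the coarse phone-ngram match are scored."""
    results = []
    for pos in coarse_match(index, query_phones, n):  # cheap index lookup
        score = score_fn(pos)       # expensive match, run only at hits
        if score > threshold:
            results.append((pos, score))
    return results
```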


IEEE Transactions on Speech and Audio Processing | 2005

Maximizing information content in feature extraction

Mukund Padmanabhan; Satya Dharanipragada

In this paper, we consider the problem of quantifying the amount of information contained in a set of features for discriminating between various classes. We explore these ideas in the context of a speech recognition system, where an important classification sub-problem is to predict the phonetic class given an observed acoustic feature vector. The connection between information content and speech recognition system performance is first explored in the context of various feature extraction schemes used in speech recognition applications. Subsequently, the idea of optimizing the information content to improve recognition accuracy is generalized to a linear projection of the underlying features. We show that several prior methods for computing linear transformations (such as linear/heteroscedastic discriminant analysis) can be interpreted in this general framework of maximizing the information content. We then extend this reasoning and propose a new objective function that maximizes a penalized mutual information (pMI) measure. This objective function is very well correlated with the word error rate of the final system. Finally, experimental results show that the proposed pMI projection consistently outperforms other methods in a variety of cases, leading to relative improvements in word error rate of 5%-16% over earlier methods.
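
Under per-class Gaussian assumptions, the mutual information between a projected feature y = theta * x and the class label c takes the familiar entropy-difference form I(y; c) ~ 0.5 log|Sigma_y| - 0.5 sum_c p_c log|Sigma_y|c|, which is the quantity that linear/heteroscedastic discriminant analysis can be seen as optimizing. The sketch below evaluates that unpenalized quantity for a given projection; the paper's pMI objective adds a penalty term not shown here, and all names are our own.

```python
import numpy as np

def _logdet_cov(Y):
    """Log-determinant of the sample covariance (handles 1-D projections)."""
    return np.linalg.slogdet(np.atleast_2d(np.cov(Y, rowvar=False)))[1]

def gaussian_mi(X, labels, theta):
    """Mutual information between projected features Y = X @ theta.T and
    class labels under Gaussian class-conditional assumptions. This is
    the unpenalized objective; the paper's pMI adds a penalty term."""
    Y = X @ theta.T
    mi = 0.5 * _logdet_cov(Y)                 # total entropy term
    for c in np.unique(labels):
        Yc = Y[labels == c]
        mi -= 0.5 * (len(Yc) / len(Y)) * _logdet_cov(Yc)
    return mi
```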


International Conference on Document Analysis and Recognition | 1999

Retrieval from spoken documents using content and speaker information

Mahesh Viswanathan; Homayoon S. M. Beigi; Satya Dharanipragada; Alain Tritschler

Speech and speaker recognition are emerging technologies that are reaching maturity and seeing increasingly wide deployment. We discuss the components required to build a system for indexing and retrieving spoken documents using content-based and speaker-based information obtained through speech and speaker recognition. The real power of spoken document analysis lies in using content and speaker information together and combining their retrieval results. The experiments described here are in the broadcast news domain, but the underlying techniques extend readily to other speech-centric applications.
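
One simple way to combine the two information sources at retrieval time is a weighted fusion of per-document content and speaker scores. This is purely illustrative; the paper's actual combination scheme is not reproduced here, and the names and weight are assumptions.

```python
def fused_score(content_score, speaker_score, alpha=0.7):
    """Illustrative linear fusion of content-based and speaker-based
    retrieval scores for one spoken document (both assumed to be
    normalized to comparable ranges)."""
    return alpha * content_score + (1.0 - alpha) * speaker_score
```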


IEEE Transactions on Audio, Speech, and Language Processing | 2006

Gaussian mixture models with covariances or precisions in shared multiple subspaces

Satya Dharanipragada; Karthik Visweswariah

We introduce a class of Gaussian mixture models (GMMs) in which the covariances or the precisions (inverse covariances) are restricted to lie in subspaces spanned by rank-one symmetric matrices. The rank-one bases are shared among the Gaussians according to a sharing structure. We describe an algorithm for estimating the parameters of the GMM in a maximum likelihood framework given a sharing structure. We employ these models for modeling the observations in the hidden states of a hidden Markov model based speech recognition system. We show that this class of models provides improvements in accuracy and computational efficiency over well-known covariance modeling techniques such as classical factor analysis, shared factor analysis, and maximum likelihood linear transformation based models, all of which are special instances of this class. We also investigate different sharing mechanisms. We show that, for the same number of parameters, modeling precisions leads to better performance than modeling covariances. Modeling precisions also gives a distinct advantage in computational and memory requirements.
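
In this model, each Gaussian's precision matrix is a weighted combination of shared rank-one symmetric basis matrices, P_g = sum_k lambda_gk * v_k v_k^T, and the quadratic form in the log density needs only the K inner products v_k^T (x - mu). A minimal sketch of the evaluation side under our own naming (maximum likelihood training is not shown):

```python
import numpy as np

def precision_from_subspace(lmbda, V):
    """P = sum_k lmbda[k] * v_k v_k^T, with V of shape (K, d) holding the
    shared basis vectors and lmbda the Gaussian-specific weights (which
    must keep P positive definite)."""
    return (V.T * lmbda) @ V

def log_gauss(x, mu, lmbda, V):
    """Log density of N(x; mu, P^{-1}) using the subspace precision. The
    projections V @ x can be shared across Gaussians, which is the source
    of the computational savings the paper describes."""
    d = len(x)
    proj = V @ (x - mu)                  # inner products v_k^T (x - mu)
    quad = np.sum(lmbda * proj ** 2)     # (x - mu)^T P (x - mu)
    logdet = np.linalg.slogdet(precision_from_subspace(lmbda, V))[1]
    return 0.5 * (logdet - quad - d * np.log(2.0 * np.pi))
```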


International Conference on Acoustics, Speech, and Signal Processing | 2003

Perceptual MVDR-based cepstral coefficients (PMCCs) for robust speech recognition

Umit H. Yapanel; Satya Dharanipragada

This paper describes a robust feature extraction technique for continuous speech recognition. Central to the technique is the minimum variance distortionless response (MVDR) method of spectrum estimation. We incorporate perceptual information directly into the spectrum estimation, which provides improved robustness and computational efficiency compared with the previously proposed MVDR-MFCC technique. On an in-car speech recognition task, the resulting method, which we refer to as PMCC, is 15% more accurate in WER and requires roughly a factor of four less computation than the MVDR-MFCC technique. On the same task, PMCC yields a 20% relative improvement over MFCC and an 11% relative improvement over PLP front ends. Similar improvements are observed on the Aurora-2 database.
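
One way to read "perceptual information directly in the spectrum estimation" is to warp the frame's power spectrum onto a perceptual frequency axis before computing the autocorrelation that drives the MVDR recursion (the mvdr_spectrum sketch shown earlier). The sketch below uses a simple mel-axis resampling as a stand-in; the paper's exact warping is not reproduced, and all names are our own.

```python
import numpy as np

def warped_autocorr(frame, order=12, nfft=512, sr=8000):
    """Perceptually warped autocorrelation (illustrative): resample the
    power spectrum on a mel-spaced axis, then invert it (Wiener-Khinchin)
    to get autocorrelation coefficients for the MVDR recursion."""
    pow_spec = np.abs(np.fft.rfft(frame, nfft)) ** 2
    f = np.linspace(0.0, sr / 2.0, len(pow_spec))
    mel = 2595.0 * np.log10(1.0 + f / 700.0)          # Hz -> mel
    mel_grid = np.linspace(mel[0], mel[-1], len(pow_spec))
    warped = np.interp(mel_grid, mel, pow_spec)       # uniform on mel axis
    return np.fft.irfft(warped)[: order + 1]          # r[0..order]
```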
