Publication


Featured research published by Ananth Sankar.


IEEE Transactions on Speech and Audio Processing | 1996

A maximum-likelihood approach to stochastic matching for robust speech recognition

Ananth Sankar; Chin-Hui Lee

Presents a maximum-likelihood (ML) stochastic matching approach to decrease the acoustic mismatch between a test utterance and a given set of speech models, so as to reduce the recognition performance degradation caused by distortions in the test utterance and/or the model set. We assume that the speech signal is modeled by a set of subword hidden Markov models (HMMs) Λ_X. The mismatch between the observed test utterance Y and the models Λ_X can be reduced in two ways: 1) by an inverse distortion function F_ν(·) that maps Y into an utterance X that matches better with the models Λ_X, and 2) by a model transformation function G_η(·) that maps Λ_X to the transformed model Λ_Y that matches better with the utterance Y. We assume the functional form of the transformations F_ν(·) or G_η(·) and estimate the parameters ν or η in an ML manner using the expectation-maximization (EM) algorithm. The choice of the form of F_ν(·) or G_η(·) is based on prior knowledge of the nature of the acoustic mismatch. The stochastic matching algorithm operates only on the given test utterance and the given set of speech models; no additional training data is required to estimate the mismatch prior to actual testing. Experimental results are presented to study the properties of the proposed algorithm and to verify the efficacy of the approach in improving the performance of an HMM-based continuous speech recognition system in the presence of mismatch due to different transducers and transmission channels.
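
As a worked restatement of the feature-space criterion described above (a sketch, not the paper's exact equations): the distortion parameters and the recognized word sequence are estimated jointly, and EM handles the hidden state and mixture sequence.

```latex
% Feature-space ML stochastic matching as described in the abstract: jointly estimate
% the distortion parameters \nu and the recognized word sequence W given the models \Lambda_X.
\begin{align}
  (\hat{\nu}, \hat{W}) &= \arg\max_{\nu,\, W} \; p(Y \mid \nu, W, \Lambda_X) \\
  \nu^{(k+1)} &= \arg\max_{\nu} \; Q\bigl(\nu \mid \nu^{(k)}\bigr), \qquad
  Q\bigl(\nu \mid \nu^{(k)}\bigr) = E\bigl[\log p(Y, S \mid \nu, W, \Lambda_X) \,\bigm|\, Y, \nu^{(k)}, \Lambda_X\bigr]
\end{align}
% S denotes the hidden HMM state/mixture sequence that EM marginalizes over; Q, S, and the
% iteration index k are notational choices for this sketch, not symbols taken from the abstract.
```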


International Conference on Acoustics, Speech, and Signal Processing | 1995

Robust speech recognition based on stochastic matching

Ananth Sankar; Chin-Hui Lee

We present a maximum-likelihood (ML) stochastic matching approach to decrease the acoustic mismatch between a test utterance Y and a given set of speech hidden Markov models Λ_X, so as to reduce the recognition performance degradation caused by possible distortions in the test utterance. This mismatch may be reduced in two ways: (1) by an inverse distortion function F_ν(·) that maps Y into an utterance X which matches better with the models Λ_X, and (2) by a model transformation function G_η(·) that maps Λ_X to the transformed model Λ_Y which matches better with the utterance Y. The functional form of the transformations depends upon our prior knowledge about the mismatch, and the parameters are estimated along with the recognized string in a maximum-likelihood manner using the EM algorithm. Experimental results verify the efficacy of the approach in improving the performance of a continuous speech recognition system in the presence of mismatch due to different transducers and transmission channels.
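
For the feature-space case, a common concrete choice of F_ν(·) is a simple additive cepstral bias, X = Y − b. Below is a minimal sketch of one EM re-estimation step for such a bias, assuming diagonal-covariance Gaussians and precomputed state/mixture posteriors; the function and variable names are illustrative, not taken from the paper.

```python
# Illustrative sketch only: EM re-estimation of an additive cepstral bias b, a common
# special case of feature-space stochastic matching (F_nu(Y) = Y - b). Assumes diagonal
# Gaussian state models and that posteriors (gammas) come from a forward-backward pass.
import numpy as np

def reestimate_bias(Y, gammas, means, variances):
    """One EM iteration for the bias b.

    Y:         (T, D) observed cepstral features
    gammas:    (T, S) state/mixture posteriors gamma_t(s) under the current alignment
    means:     (S, D) Gaussian means
    variances: (S, D) diagonal Gaussian variances
    """
    # Maximizing the EM auxiliary function in b gives a variance-weighted average of the
    # residuals (y_t - mu_s), accumulated over all frames and states.
    num = np.zeros(Y.shape[1])
    den = np.zeros(Y.shape[1])
    for t in range(Y.shape[0]):
        num += (gammas[t][:, None] * (Y[t] - means) / variances).sum(axis=0)
        den += (gammas[t][:, None] / variances).sum(axis=0)
    return num / den

# Typical use: alternate recognition/alignment on (Y - b) with bias re-estimation until
# the likelihood stops improving, then keep the final recognition result.
```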


International Conference on Acoustics, Speech, and Signal Processing | 1996

Acoustic adaptation using nonlinear transformations of HMM parameters

Victor Abrash; Ananth Sankar; Horacio Franco; Michael Cohen

Speech recognition performance degrades significantly when there is a mismatch between testing and training conditions. Linear transformation-based maximum-likelihood (ML) techniques have been proposed recently to tackle this problem. We extend this approach to use nonlinear transformations. These are implemented by multilayer perceptrons (MLPs) which transform the Gaussian means. We derive a generalized expectation-maximization (GEM) training algorithm to estimate the MLP weights. Some preliminary experimental results on nonnative speaker adaptation are presented.
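
A minimal sketch of the idea, assuming a single hidden layer with a tanh nonlinearity (the layer sizes and nonlinearity are assumptions for illustration): the MLP maps each existing Gaussian mean to an adapted mean, and its weights would be trained with a generalized EM step rather than the random initialization shown here.

```python
# Illustrative sketch only (not the authors' implementation): a small MLP that maps each
# Gaussian mean of an HMM system to an adapted mean.
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(dim, hidden=64):
    """Random MLP weights; in the approach described above these would be estimated with a
    generalized EM (GEM) procedure to maximize the adaptation-data likelihood."""
    return {
        "W1": rng.normal(scale=0.1, size=(hidden, dim)), "b1": np.zeros(hidden),
        "W2": rng.normal(scale=0.1, size=(dim, hidden)), "b2": np.zeros(dim),
    }

def transform_means(means, p):
    """Apply the nonlinear transform mu -> MLP(mu) to every Gaussian mean (rows of `means`)."""
    h = np.tanh(means @ p["W1"].T + p["b1"])   # hidden layer
    return h @ p["W2"].T + p["b2"]             # adapted means

# Usage: adapted = transform_means(gaussian_means, init_mlp(gaussian_means.shape[1]))
# Training would ascend the EM auxiliary function (the GEM step); that step is omitted here.
```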


IEEE Transactions on Speech and Audio Processing | 2004

Mixtures of inverse covariances

Vincent Vanhoucke; Ananth Sankar

We describe a model which approximates full covariances in a Gaussian mixture while reducing significantly both the number of parameters to estimate and the computations required to evaluate the Gaussian likelihoods. In this model, the inverse covariance of each Gaussian in the mixture is expressed as a linear combination of a small set of prototype matrices that are shared across components. In addition, we demonstrate the benefits of a subspace-factored extension of this model when representing independent or near-independent product densities. We present a maximum likelihood estimation algorithm for these models, as well as a practical method for implementing it. We show through experiments performed on a variety of speech recognition tasks that this model significantly outperforms a diagonal covariance model, while using far fewer Gaussian-specific parameters. Experiments also demonstrate that a better speed/accuracy tradeoff can be achieved on a real-time speech recognition system.
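
A minimal sketch of the likelihood evaluation under this parameterization, assuming the precision (inverse covariance) of each component is a weighted combination of shared prototype matrices; the names and data layout are assumptions for illustration.

```python
# Illustrative sketch only: Gaussian log-likelihood when the inverse covariance of each
# mixture component is a linear combination of a small set of shared prototype matrices.
import numpy as np

def log_normalizer(weights, prototypes):
    """Precompute the log-normalizer for one component: 0.5*log|P| - (D/2)*log(2*pi),
    with P = sum_k weights[k] * prototypes[k]."""
    P = np.einsum("k,kij->ij", weights, prototypes)
    _, logdet = np.linalg.slogdet(P)
    D = P.shape[0]
    return 0.5 * logdet - 0.5 * D * np.log(2.0 * np.pi)

def loglik(x, mean, weights, prototypes, log_norm):
    """x: (D,) feature vector, mean: (D,), weights: (K,) component-specific combination
    weights, prototypes: (K, D, D) shared symmetric prototypes, log_norm: from above."""
    d = x - mean
    # Quadratic form d^T P d = sum_k weights[k] * (d^T Psi_k d). Because the prototypes are
    # shared, the per-prototype terms can be cached per frame and reused across components,
    # which is where the computational savings described above come from.
    quad = np.einsum("k,kij,i,j->", weights, prototypes, d, d)
    return log_norm - 0.5 * quad
```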


Speech Communication | 2002

Improved modeling and efficiency for automatic transcription of Broadcast News

Ananth Sankar; Venkata Ramana Rao Gadde; Andreas Stolcke; Fuliang Weng

Over the last few years, the DARPA-sponsored Hub-4 continuous speech recognition evaluations have advanced speech recognition technology for automatic transcription of broadcast news. In this paper, we report on our research and progress in this domain, with an emphasis on efficient modeling with significantly fewer parameters for faster and more accurate recognition. In the acoustic modeling area, this was achieved through new parameter tying, Gaussian clustering, and mixture weight thresholding schemes. The effectiveness of acoustic adaptation is greatly increased through unsupervised clustering of test data. In language modeling, we explored the use of non-broadcast-news training data as well as adaptation to topic and speaking styles. We developed an effective and efficient parameter pruning technique for backoff language models that allowed us to cope with ever-increasing amounts of training data and expanded N-gram scopes. Finally, we improved our progressive search architecture with more efficient algorithms for lattice generation, compaction, and incorporation of higher-order language models.
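
Of the acoustic modeling techniques listed above, mixture weight thresholding is simple enough to sketch: components whose mixture weights fall below a threshold are dropped and the remaining weights are renormalized. The threshold value and data layout below are assumptions for illustration, not details from the paper.

```python
# Illustrative sketch only: mixture-weight thresholding for one tied HMM state.
import numpy as np

def threshold_mixture(weights, means, variances, min_weight=1e-3):
    """weights: (M,), means: (M, D), variances: (M, D) for one state's Gaussian mixture."""
    keep = weights >= min_weight
    if not keep.any():                      # always keep at least the largest component
        keep[np.argmax(weights)] = True
    w = weights[keep]
    return w / w.sum(), means[keep], variances[keep]
```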


Journal of the Acoustical Society of America | 1992

Visual focus of attention in adaptive language acquisition

Ananth Sankar; Allen L. Gorin

In this research on adaptive language acquisition, connectionist systems have been investigated that learn the mapping from a message to a meaningful machine action through interaction with a complex environment. Thus far, the only input to these systems has been the message. However, in many cases the action also depends on the state of the world, motivating the study of systems with multisensory input. In this work, a task is considered where the machine receives both message and visual input. In particular, the machine action is to focus its attention on one of many blocks of different colors and shapes, in response to a message such as "Look at the red square." This is done by minimizing a time-varying potential function that correlates the message and visual input. The visual input is factored through color and shape sensory primitive nodes in an information-theoretic connectionist network, allowing generalization between different objects having the same color or shape. The system runs in a conversational mode where the user can provide clarifying messages and error feedback until the system responds correctly. During the course of performing its task, the system acquired a vocabulary of 389 words from approximately 1300 unconstrained natural-language inputs collected from ten users. The average number of inputs needed for the machine to respond correctly was only 1.4.


Conference of the International Speech Communication Association | 1995

A comparative study of speaker adaptation techniques

Leonardo Neumeyer; Ananth Sankar; Vassilios Digalakis


Conference of the International Speech Communication Association | 1995

Connectionist speaker normalization and adaptation

Victor Abrash; Horacio Franco; Ananth Sankar; Michael Cohen


Conference of the International Speech Communication Association | 1998

Efficient lattice representation and generation

Fuliang Weng; Andreas Stolcke; Ananth Sankar


IEEE Transactions on Speech and Audio Processing | 1994

An experiment in spoken language acquisition

Allen L. Gorin; Stephen E. Levinson; Ananth Sankar

Collaboration


Dive into Ananth Sankar's collaborations.

Top Co-Authors

Vassilios Digalakis

Technical University of Crete

Chin-Hui Lee

Georgia Institute of Technology
