Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Ashvin Kannan is active.

Publication


Featured research published by Ashvin Kannan.


Human Language Technology | 1991

Integration of diverse recognition methodologies through reevaluation of N-best sentence hypotheses

Mari Ostendorf; Ashvin Kannan; Steve Austin; Owen Kimball; Richard M. Schwartz; Jan Robin Rohlicek

This paper describes a general formalism for integrating two or more speech recognition technologies, which could be developed at different research sites using different recognition strategies. In this formalism, one system uses the N-best search strategy to generate a list of candidate sentences; the list is rescored by other systems; and the different scores are combined to optimize performance. Specifically, we report on combining the BU system based on stochastic segment models and the BBN system based on hidden Markov models. In addition to facilitating integration of different systems, the N-best approach results in a large reduction in computation for word recognition using the stochastic segment model.
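The core of the N-best formalism can be sketched in a few lines: each system assigns a log-score to every candidate sentence, and a weighted sum of the scores reorders the list. This is an illustrative sketch only; the function name, the example scores, and the unit weights are assumptions, not values from the paper.

```python
# Hypothetical sketch of N-best rescoring: each recognition system assigns
# a log-score to every candidate sentence, and a weighted combination of
# the scores picks the final ordering.
def rescore_nbest(hypotheses, system_scores, weights):
    """hypotheses: list of candidate sentences.
    system_scores: dict mapping system name -> list of log-scores,
    index-aligned with hypotheses.
    weights: dict mapping system name -> combination weight."""
    combined = []
    for i, hyp in enumerate(hypotheses):
        total = sum(weights[s] * scores[i] for s, scores in system_scores.items())
        combined.append((total, hyp))
    # Return the hypotheses reordered by combined score, best first.
    return [hyp for _, hyp in sorted(combined, reverse=True)]

nbest = ["the cat sat", "the cat sad", "a cat sat"]
scores = {"hmm": [-10.0, -9.5, -12.0], "segment": [-8.0, -11.0, -9.0]}
best = rescore_nbest(nbest, scores, {"hmm": 1.0, "segment": 1.0})[0]
```

In this toy example the HMM system alone would prefer the second hypothesis, but the combined score selects the first, which is the kind of cross-system correction the formalism is designed to exploit.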


IEEE Transactions on Speech and Audio Processing | 1994

Maximum likelihood clustering of Gaussians for speech recognition

Ashvin Kannan; Mari Ostendorf; Jan Robin Rohlicek

This paper describes a method for clustering multivariate Gaussian distributions using a maximum likelihood criterion. The authors point out possible applications of model clustering, and then use the approach to determine classes of shared covariances for context modeling in speech recognition, achieving an order of magnitude reduction in the number of covariance parameters with no loss in recognition performance.
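The maximum likelihood criterion for clustering Gaussians can be illustrated with the quantity a greedy algorithm would minimize at each step: the log-likelihood lost by replacing two clusters' Gaussians with a single merged Gaussian. This is a one-dimensional sketch under assumed sufficient statistics (count, mean, variance), not the paper's actual multivariate algorithm.

```python
import math

# Illustrative 1-D sketch of the ML clustering criterion: the likelihood
# loss incurred by merging two Gaussian clusters. A greedy clustering
# would repeatedly merge the pair with the smallest loss.
def gauss_loglik(n, var):
    # Log-likelihood of n points under their own ML Gaussian estimate.
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)

def merge_loss(n1, m1, v1, n2, m2, v2):
    n = n1 + n2
    m = (n1 * m1 + n2 * m2) / n
    # Pooled second moment gives the ML variance of the merged cluster.
    v = (n1 * (v1 + m1 ** 2) + n2 * (v2 + m2 ** 2)) / n - m ** 2
    return gauss_loglik(n1, v1) + gauss_loglik(n2, v2) - gauss_loglik(n, v)
```

Merging two identical clusters costs nothing, while merging well-separated clusters costs more than merging nearby ones, so the criterion naturally groups acoustically similar distributions, which is what enables covariance sharing without hurting recognition.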


IEEE Transactions on Signal Processing | 2000

ML parameter estimation of a multiscale stochastic process using the EM algorithm

Ashvin Kannan; Mari Ostendorf; William Clement Karl; David A. Castanon; Randall K. Fish

An algorithm for estimation of the parameters of a multiscale stochastic process based on scale-recursive dynamics on trees is presented. The expectation-maximization algorithm is used to provide maximum likelihood estimates for the general case of a nonhomogeneous tree with no fixed structure for the process dynamics. Experimental results are presented using synthetic data.


International Conference on Acoustics, Speech, and Signal Processing | 1993

A comparison of trajectory and mixture modeling in segment-based word recognition

Ashvin Kannan; Mari Ostendorf

A mechanism for implementing mixtures at a phone-subsegment (microsegment) level for continuous word recognition based on the stochastic segment model (SSM) is presented. The issues involved in the tradeoffs between trajectory and mixture modeling in segment-based word recognition are investigated. Experimental results are reported on DARPA's speaker-independent Resource Management corpus. The results suggest that there is a tradeoff in using mixture models and trajectory models, associated with the level of detail of the modeling unit. The results support the use of whole segment models in the context-dependent case, and microsegment-level (and possibly segment-level) mixtures rather than frame-level mixtures.
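The trajectory-versus-mixture distinction rests on how the mean of a segment's frames is modeled. A minimal sketch, assuming a single feature dimension and a linear trajectory (the SSM uses richer polynomial trajectories over cepstral vectors): a constant-mean model leaves a large residual on dynamic data, while a trajectory model captures the within-segment motion.

```python
# Sketch contrasting a constant-mean model with a linear mean trajectory
# for one feature dimension over a segment of frames. Illustrative only;
# function names and the example frames are assumptions.
def fit_constant(frames):
    # Residual sum of squares under a single constant mean.
    mean = sum(frames) / len(frames)
    return sum((x - mean) ** 2 for x in frames)

def fit_linear(frames):
    # Residual sum of squares under a least-squares linear trajectory
    # over normalized time t in [0, 1].
    n = len(frames)
    ts = [i / (n - 1) for i in range(n)]
    tbar = sum(ts) / n
    xbar = sum(frames) / n
    num = sum((t - tbar) * (x - xbar) for t, x in zip(ts, frames))
    den = sum((t - tbar) ** 2 for t in ts)
    slope = num / den
    return sum((x - (xbar + slope * (t - tbar))) ** 2
               for t, x in zip(ts, frames))

frames = [0.0, 0.5, 1.0, 1.5, 2.0]   # a steadily rising feature
```

On this rising segment the trajectory residual is essentially zero while the constant-mean residual is large; mixtures, by contrast, would explain the same spread as multiple static modes, which is the tradeoff the paper studies.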


International Conference on Acoustics, Speech, and Signal Processing | 1997

Adaptation of polynomial trajectory segment models for large vocabulary speech recognition

Ashvin Kannan; Mari Ostendorf

Segment models are a generalization of HMMs that can represent feature dynamics and/or correlation in time. We develop the theory of Bayesian and maximum-likelihood adaptation for a segment model characterized by a polynomial mean trajectory. We show how adaptation parameters can be shared and adaptation detail can be controlled at run-time based on the amount of adaptation data available. Results on the Switchboard corpus show error reductions for unsupervised transcription mode adaptation and supervised batch mode adaptation.


Human Language Technology | 1992

Weight estimation for N-best rescoring

Ashvin Kannan; Mari Ostendorf; J. Robin Rohlicek

This paper describes recent improvements in the weight estimation technique for sentence hypothesis rescoring using the N-Best formalism. Mismatches between training and test data are also explored.
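One simple way to estimate rescoring weights, sketched here as an assumption rather than the paper's actual technique, is a grid search that picks the weight maximizing the number of N-best lists whose top rescored hypothesis matches the reference.

```python
# Hypothetical weight-estimation sketch: search a grid of weights for the
# second system's score and keep the weight under which the most N-best
# lists rank the reference sentence first.
def tune_weight(nbest_lists, grid):
    """nbest_lists: list of (reference, [(hyp, score_a, score_b), ...])."""
    def correct(w):
        hits = 0
        for ref, hyps in nbest_lists:
            best = max(hyps, key=lambda h: h[1] + w * h[2])
            hits += (best[0] == ref)
        return hits
    return max(grid, key=correct)

lists = [
    ("a", [("a", -1.0, -1.0), ("b", -0.5, -3.0)]),
    ("c", [("c", -2.0, -1.0), ("d", -1.8, -2.0)]),
]
```

In both toy lists the first score alone ranks the wrong hypothesis on top, so a zero weight scores no lists correctly and the search selects a nonzero weight. Mismatches between training and test data, as the paper notes, show up here as weights tuned on one set of lists generalizing poorly to another.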


International Conference on Acoustics, Speech, and Signal Processing | 1999

Tree-structured models of parameter dependence for rapid adaptation in large vocabulary conversational speech recognition

Ashvin Kannan; Sanjeev Khudanpur

Two models of statistical dependence between the acoustic model parameters of a large vocabulary conversational speech recognition (LVCSR) system are investigated for the purpose of rapid speaker- and environment-adaptation from a very small amount of speech: (i) a Gaussian multiscale process governed by a stochastic linear dynamical system on a tree, and (ii) a simple hierarchical tree-structured prior. Both methods permit Bayesian (MAP) estimation of acoustic model parameters without parameter-tying even when no samples are available to independently estimate some parameters due to the limited amount of adaptation data. Modeling methodologies are contrasted, and comparative performance of the two on the Switchboard task is presented under identical test conditions for supervised and unsupervised adaptation with controlled amounts of adaptation speech. Both methods provide significant (1% absolute) gain in accuracy over adaptation methods that do not exploit the dependence between acoustic model parameters.
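The second model above, a simple hierarchical tree-structured prior, can be sketched as a top-down pass in which each node's estimate is a MAP combination of its own adaptation data and its parent's estimate, so leaves with no data inherit the nearest estimated ancestor. The node layout and the prior weight `tau` are assumptions for illustration, not the paper's formulation.

```python
# Illustrative sketch of a tree-structured prior for rapid adaptation:
# every node's mean is a MAP blend of its own data and its parent's
# estimate, so unseen parameters back off to their ancestors rather
# than being left unadapted.
def estimate_tree(node, parent_mean=0.0, tau=5.0):
    """node: dict with optional 'data' (list of floats) and 'children'
    (list of nodes). Adds an 'est' field to every node, top-down."""
    data = node.get("data", [])
    n = len(data)
    if n:
        xbar = sum(data) / n
        node["est"] = (tau * parent_mean + n * xbar) / (tau + n)
    else:
        node["est"] = parent_mean   # no data: back off to the parent
    for child in node.get("children", []):
        estimate_tree(child, node["est"], tau)
    return node

tree = {
    "data": [1.0] * 20,
    "children": [
        {"data": []},            # unseen leaf: inherits the root estimate
        {"data": [-1.0] * 5},    # observed leaf: pulled toward its data
    ],
}
estimate_tree(tree)
```

This is what allows MAP estimation without parameter tying even when some parameters receive no adaptation samples: their estimates are determined by the dependence structure rather than left at speaker-independent values.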


IEEE Transactions on Speech and Audio Processing | 1998

A comparison of constrained trajectory segment models for large vocabulary speech recognition

Ashvin Kannan; Mari Ostendorf

This paper compares parametric and nonparametric constrained-mean trajectory segment models for large vocabulary speech recognition, extending distribution clustering techniques to handle polynomial mean trajectory models for robust parameter estimation. The parametric model has fewer free parameters and gives similar recognition performance to the nonparametric model, but has higher recognition costs.


IEEE Automatic Speech Recognition and Understanding Workshop | 2001

Task-specific adaptation of speech recognition models

A. Sankar; Ashvin Kannan; B. Shahshahani; E. Jackson

Most published adaptation research focuses on speaker adaptation, and on adaptation for noisy channels and background environments. We study acoustic, grammar, and combined acoustic and grammar adaptation for creating task-specific recognition models. Comprehensive experimental results are presented using data from natural language quotes and a trading application. The results show that task adaptation gives substantial improvements in both utterance understanding accuracy and recognition speed.


Speech Communication | 2004

A comprehensive study of task-specific adaptation of speech recognition models

Ananth Sankar; Ashvin Kannan

Most published adaptation research focuses on speaker adaptation, and on adaptation for noisy channels and background environments. In this paper, we present a study of task adaptation, where the speech recognition models are adapted to a specific application or task, giving significant performance gains. We explore several new questions about adaptation which have not been studied before, and present novel solutions to these problems. For example, we show that adaptation can result in increased out-of-grammar error rates. We present an automatic confidence score mapping algorithm to correct this problem. We show that grammar-dependent acoustic adaptation gives improved performance. In addition, we show that in-grammar acoustic adaptation gives significantly better results. We study acoustic and grammar task adaptation, and show that the gains are additive. Finally, we show that adaptation improves both accuracy and speed, where traditional studies have been more focused on accuracy alone. We also study traditional adaptation modes such as supervised and unsupervised adaptation, the use of confidence thresholds for unsupervised adaptation, and the effect of the amount of data on task adaptation.

Collaboration


Dive into Ashvin Kannan's collaborations.

Top Co-Authors

Mari Ostendorf

University of Washington

A. Sankar

Nuance Communications
