Ashvin Kannan
Nuance Communications
Publications
Featured research published by Ashvin Kannan.
human language technology | 1991
Mari Ostendorf; Ashvin Kannan; Steve Austin; Owen Kimball; Richard M. Schwartz; Jan Robin Rohlicek
This paper describes a general formalism for integrating two or more speech recognition technologies, which could be developed at different research sites using different recognition strategies. In this formalism, one system uses the N-best search strategy to generate a list of candidate sentences; the list is rescored by other systems; and the different scores are combined to optimize performance. Specifically, we report on combining the BU system based on stochastic segment models and the BBN system based on hidden Markov models. In addition to facilitating integration of different systems, the N-best approach results in a large reduction in computation for word recognition using the stochastic segment model.
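The score-combination step described above can be sketched as a weighted linear rescoring of an N-best list (a minimal illustration; the hypothesis sentences, scores, and weights below are invented, not taken from the paper):

```python
def combine_nbest_scores(hypotheses, weights):
    """Rescore an N-best list: each hypothesis carries one log score per
    system; combine them linearly and return sentences best-first."""
    rescored = []
    for sentence, system_scores in hypotheses:
        total = sum(w * s for w, s in zip(weights, system_scores))
        rescored.append((total, sentence))
    rescored.sort(reverse=True)
    return [sentence for _, sentence in rescored]

# Invented example: (sentence, [log score from system A, from system B]).
nbest = [
    ("show all flights", [-120.0, -95.0]),
    ("show tall flights", [-118.0, -110.0]),
]
best = combine_nbest_scores(nbest, weights=[0.5, 0.5])
```

With equal weights the second system's strong preference for the first sentence wins; setting the weights to favor one system reproduces that system's own ranking, which is the degree of freedom the combination weights optimize.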
IEEE Transactions on Speech and Audio Processing | 1994
Ashvin Kannan; Mari Ostendorf; Jan Robin Rohlicek
Describes a method for clustering multivariate Gaussian distributions using a maximum likelihood criterion. The authors point out possible applications of model clustering, and then use the approach to determine classes of shared covariances for context modeling in speech recognition, achieving an order-of-magnitude reduction in the number of covariance parameters with no loss in recognition performance.
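The likelihood-based covariance sharing can be illustrated in one dimension: greedily tie the pair of variance clusters whose merge costs the least log-likelihood, keeping means distinct (an illustrative sketch with invented counts and variances; the paper works with full multivariate covariances):

```python
import math

def merge_loss(n1, v1, n2, v2):
    """Log-likelihood loss from tying the variances of two Gaussians
    (means stay separate, so the shared variance is count-weighted)."""
    pooled = (n1 * v1 + n2 * v2) / (n1 + n2)
    return 0.5 * ((n1 + n2) * math.log(pooled)
                  - n1 * math.log(v1) - n2 * math.log(v2))

def cluster_variances(stats, n_clusters):
    """Greedy agglomerative clustering of (count, variance) statistics:
    repeatedly merge the cheapest pair until n_clusters remain."""
    clusters = [list(s) for s in stats]
    while len(clusters) > n_clusters:
        _, i, j = min(
            (merge_loss(*clusters[i], *clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters)))
        n1, v1 = clusters[i]
        n2, v2 = clusters[j]
        merged = [n1 + n2, (n1 * v1 + n2 * v2) / (n1 + n2)]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters
```

Two nearly identical variances merge at almost no likelihood cost, while a very different one stays in its own class; sharing classes this way is what yields the parameter reduction reported above.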
IEEE Transactions on Signal Processing | 2000
Ashvin Kannan; Mari Ostendorf; William Clement Karl; David A. Castanon; Randall K. Fish
An algorithm for estimation of the parameters of a multiscale stochastic process based on scale-recursive dynamics on trees is presented. The expectation-maximization algorithm is used to provide maximum likelihood estimates for the general case of a nonhomogeneous tree with no fixed structure for the process dynamics. Experimental results are presented using synthetic data.
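The EM estimation itself is beyond a short sketch, but the underlying scale-recursive model is simple: each node's state is a scaled copy of its parent's state plus noise. A minimal sampler for a scalar version on a regular tree (parameter names and values are illustrative, not the paper's):

```python
import random

def sample_tree_process(depth, a=0.9, noise=0.5, branching=2, seed=0):
    """Sample a scalar multiscale process with scale-recursive dynamics
    x(child) = a * x(parent) + w on a regular tree; returns leaf values."""
    rng = random.Random(seed)

    def descend(x, level):
        if level == depth:
            return [x]
        leaves = []
        for _ in range(branching):
            child = a * x + rng.gauss(0.0, noise)
            leaves.extend(descend(child, level + 1))
        return leaves

    return descend(0.0, 0)
```

Leaves that share a nearby ancestor are correlated through that ancestor's state, which is the structure the EM algorithm exploits when estimating the dynamics from data at the finest scale.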
international conference on acoustics, speech, and signal processing | 1993
Ashvin Kannan; Mari Ostendorf
A mechanism for implementing mixtures at a phone-subsegment (microsegment) level for continuous word recognition based on the stochastic segment model (SSM) is presented. The issues involved in tradeoffs between trajectory and mixture modeling in segment-based word recognition are investigated. Experimental results are reported on DARPA's speaker-independent Resource Management corpus. The results suggest that there is a tradeoff in using mixture models and trajectory models, associated with the level of detail of the modeling unit. The results support the use of whole-segment models in the context-dependent case, and microsegment-level (and possibly segment-level) mixtures rather than frame-level mixtures.
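The microsegment-mixture scoring idea can be sketched as follows: a segment's frames are divided evenly among microsegment regions, and each region contributes a log-sum-exp over its own Gaussian mixture components (a one-dimensional illustrative sketch; the mixture parameterization here is invented):

```python
import math

def gauss_logpdf(x, mean, var):
    """Log density of a scalar Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def microsegment_score(frames, microsegments):
    """Log-likelihood of a segment under microsegment-level mixtures.
    microsegments: one list of (weight, mean, var) components per region;
    frames are mapped to regions by their relative position in time."""
    n_regions = len(microsegments)
    total = 0.0
    for i, x in enumerate(frames):
        region = microsegments[min(i * n_regions // len(frames),
                                   n_regions - 1)]
        comp = [math.log(w) + gauss_logpdf(x, m, v) for w, m, v in region]
        peak = max(comp)  # log-sum-exp for numerical stability
        total += peak + math.log(sum(math.exp(c - peak) for c in comp))
    return total
```

Tying one mixture per microsegment rather than per frame is the intermediate level of detail the paper argues for: coarser than frame-level mixtures, finer than a single whole-segment model.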
international conference on acoustics, speech, and signal processing | 1997
Ashvin Kannan; Mari Ostendorf
Segment models are a generalization of HMMs that can represent feature dynamics and/or correlation in time. We develop the theory of Bayesian and maximum-likelihood adaptation for a segment model characterized by a polynomial mean trajectory. We show how adaptation parameters can be shared and adaptation detail can be controlled at run-time based on the amount of adaptation data available. Results on the Switchboard corpus show error reductions for unsupervised transcription mode adaptation and supervised batch mode adaptation.
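The polynomial mean trajectory at the heart of this segment model can be illustrated by its simplest case: a maximum-likelihood (least-squares) linear fit to a segment's frames on duration-normalized time (a one-dimensional sketch; the actual models are multivariate and may be higher-order):

```python
def fit_linear_trajectory(frames):
    """ML (least-squares) fit of a linear mean trajectory b0 + b1*t to a
    segment's frames, with time normalized to [0, 1] so segments of
    different durations share one parameterization."""
    n = len(frames)
    ts = [i / (n - 1) for i in range(n)] if n > 1 else [0.0]
    t_mean = sum(ts) / n
    y_mean = sum(frames) / n
    s_tt = sum((t - t_mean) ** 2 for t in ts)
    if s_tt == 0.0:
        return y_mean, 0.0  # degenerate one-frame segment: flat trajectory
    b1 = sum((t - t_mean) * (y - y_mean)
             for t, y in zip(ts, frames)) / s_tt
    b0 = y_mean - b1 * t_mean
    return b0, b1
```

Because the model is characterized by a few polynomial coefficients per segment, adaptation can update those coefficients directly, and the level of sharing among them can be varied with the amount of adaptation data, as described above.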
human language technology | 1992
Ashvin Kannan; Mari Ostendorf; J. Robin Rohlicek
This paper describes recent improvements in the weight estimation technique for sentence hypothesis rescoring using the N-Best formalism. Mismatches between training and test data are also explored.
international conference on acoustics, speech, and signal processing | 1999
Ashvin Kannan; Sanjeev Khudanpur
Two models of statistical dependence between the acoustic model parameters of a large vocabulary conversational speech recognition (LVCSR) system are investigated for the purpose of rapid speaker- and environment-adaptation from a very small amount of speech: (i) a Gaussian multiscale process governed by a stochastic linear dynamical system on a tree, and (ii) a simple hierarchical tree-structured prior. Both methods permit Bayesian (MAP) estimation of acoustic model parameters without parameter-tying even when no samples are available to independently estimate some parameters due to the limited amount of adaptation data. Modeling methodologies are contrasted, and comparative performance of the two on the Switchboard task is presented under identical test conditions for supervised and unsupervised adaptation with controlled amounts of adaptation speech. Both methods provide significant (1% absolute) gain in accuracy over adaptation methods that do not exploit the dependence between acoustic model parameters.
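The hierarchical tree-structured prior (method ii) can be sketched for Gaussian means: each node's MAP estimate serves as the prior for its children, so parameters with no adaptation data back off to their ancestors (an illustrative sketch; the prior strength `tau` and the node layout are invented):

```python
def map_mean(prior_mean, tau, data):
    """MAP estimate of a Gaussian mean under a conjugate prior of
    strength tau (a pseudo-count); with no data, the prior is returned."""
    n = len(data)
    if n == 0:
        return prior_mean
    return (tau * prior_mean + sum(data)) / (tau + n)

def adapt_tree(node, tau, parent_mean=None):
    """Recursively adapt a hierarchy of Gaussian means: each node's
    prior is its (already adapted) parent's estimate, so leaves with no
    adaptation data inherit from ancestors that observed some."""
    prior = node["mean"] if parent_mean is None else parent_mean
    node["adapted"] = map_mean(prior, tau, node.get("data", []))
    for child in node.get("children", []):
        adapt_tree(child, tau, node["adapted"])
    return node
```

This is how MAP estimation remains possible without parameter tying even when some Gaussians receive no adaptation samples: their estimates are pulled from data observed elsewhere in the tree.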
IEEE Transactions on Speech and Audio Processing | 1998
Ashvin Kannan; Mari Ostendorf
This paper compares parametric and nonparametric constrained-mean trajectory segment models for large vocabulary speech recognition, extending distribution clustering techniques to handle polynomial mean trajectory models for robust parameter estimation. The parametric model has fewer free parameters and gives similar recognition performance to the nonparametric model, but has higher recognition costs.
ieee automatic speech recognition and understanding workshop | 2001
A. Sankar; Ashvin Kannan; B. Shahshahani; E. Jackson
Most published adaptation research focuses on speaker adaptation, and on adaptation for noisy channels and background environments. We study acoustic, grammar, and combined acoustic and grammar adaptation for creating task-specific recognition models. Comprehensive experimental results are presented using data from a natural-language quotes and trading application. The results show that task adaptation gives substantial improvements in both utterance understanding accuracy and recognition speed.
Speech Communication | 2004
Ananth Sankar; Ashvin Kannan
Most published adaptation research focuses on speaker adaptation, and on adaptation for noisy channels and background environments. In this paper, we present a study of task adaptation, where the speech recognition models are adapted to a specific application or task, giving significant performance gains. We explore several new questions about adaptation which have not been studied before, and present novel solutions to these problems. For example, we show that adaptation can result in increased out-of-grammar error rates. We present an automatic confidence score mapping algorithm to correct this problem. We show that grammar-dependent acoustic adaptation gives improved performance. In addition, we show that in-grammar acoustic adaptation gives significantly better results. We study acoustic and grammar task adaptation, and show that the gains are additive. Finally, we show that adaptation improves both accuracy and speed, where traditional studies have been more focused on accuracy alone. We also study traditional adaptation modes such as supervised and unsupervised adaptation, the use of confidence thresholds for unsupervised adaptation, and the effect of the amount of data on task adaptation.
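The confidence score mapping problem can be illustrated with a simple quantile-matching sketch: choose the adapted-model threshold that rejects the same fraction of utterances as the original threshold did under the original model (an assumption-laden stand-in; the paper's actual mapping algorithm is not described here):

```python
def remap_threshold(old_scores, new_scores, old_threshold):
    """Quantile matching: find the new-model confidence threshold that
    rejects (scores strictly below threshold) the same fraction of
    utterances as old_threshold did under the old model."""
    frac = sum(s < old_threshold for s in old_scores) / len(old_scores)
    ranked = sorted(new_scores)
    k = min(int(round(frac * len(ranked))), len(ranked) - 1)
    return ranked[k]
```

A remapping of this kind keeps the operating point stable when adaptation shifts the confidence score distribution, which is the effect behind the increased out-of-grammar error rates noted above.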