Sarangarajan Parthasarathy
Microsoft
Publications
Featured research published by Sarangarajan Parthasarathy.
international conference on acoustics, speech, and signal processing | 1996
Aaron E. Rosenberg; Sarangarajan Parthasarathy
Likelihood ratio or cohort normalized scoring has been shown to be effective for improving the performance of speaker verification systems. An important problem in this connection is the establishment of principles for constructing speaker background or cohort models which provide the most effective normalized scores. Several kinds of speaker background models are studied. These include individual speaker models, models constructed from the pooled utterances of different numbers of speakers, models selected on the basis of similarity with customer models, models constructed from random selections of speakers, and models constructed from databases recorded under different conditions than the customer models. The results of experiments show that pooled models based on similarity to the reference speaker perform better than individual cohort models drawn from the same similar set of speakers. Pooled background models built from a small number of speakers selected by similarity perform best overall, but not significantly better than a random selection of 40 or more gender-balanced speakers whose training conditions match those of the reference speakers.
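A minimal sketch of the cohort-normalized scoring idea described above, assuming 1-D Gaussian speaker models for simplicity (the helper names, pooling-by-moment-matching step, and all values are illustrative, not from the paper):

```python
import math

def gaussian_loglik(x, mean, var):
    """Log-likelihood of one observation under a 1-D Gaussian model."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def pooled_model(models):
    """Pool several (mean, var) speaker models into a single background
    model by moment matching an equally weighted Gaussian mixture."""
    means = [m for m, _ in models]
    mean = sum(means) / len(models)
    # total variance = average within-model variance + variance of the means
    var = (sum(v for _, v in models) / len(models)
           + sum((m - mean) ** 2 for m in means) / len(models))
    return mean, var

def normalized_score(utterance, target, background):
    """Cohort-normalized score: average per-frame log-likelihood ratio
    between the target model and the pooled background model."""
    llr = [gaussian_loglik(x, *target) - gaussian_loglik(x, *background)
           for x in utterance]
    return sum(llr) / len(llr)
```

An utterance matching the target model yields a positive normalized score; one matching the cohort yields a negative score, which is the quantity thresholded for verification.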
international conference on acoustics, speech, and signal processing | 2005
Vincent Goffin; Cyril Allauzen; Enrico Bocchieri; Dilek Hakkani-Tür; Andrej Ljolje; Sarangarajan Parthasarathy; Mazin G. Rahim; Giuseppe Riccardi; Murat Saraclar
This paper describes the AT&T WATSON real-time speech recognizer, the product of several decades of research at AT&T. The recognizer handles a wide range of vocabulary sizes and is based on continuous-density hidden Markov models for acoustic modeling and finite state networks for language modeling. The recognition network is optimized for efficient search. We identify the algorithms used for high-accuracy, real-time, and low-latency recognition. We present results for small and large vocabulary tasks taken from the AT&T VoiceTone® service, showing word accuracy improvement of about 5% absolute and real-time processing speed-up by a factor between 2 and 3.
international conference on acoustics, speech, and signal processing | 2011
Gokhan Tur; Dilek Hakkani-Tür; Larry P. Heck; Sarangarajan Parthasarathy
In this paper, we present a sentence simplification method and demonstrate its use to improve intent determination and slot filling tasks in spoken language understanding (SLU) systems. This research is motivated by the observation that, while current statistical SLU models usually perform accurately for simple, well-formed sentences, error rates increase for more complex, longer, more natural or spontaneous utterances. Furthermore, users familiar with web search usually formulate their information requests as a keyword search query, suggesting that a framework which can handle both forms of input is required. We propose a dependency parsing-based sentence simplification approach that extracts a set of keywords from natural language sentences and uses those in addition to entire utterances for completing SLU tasks. We evaluated this approach using the well-studied ATIS corpus with manual and automatic transcriptions and observed significant error reductions for both intent determination (30% relative) and slot filling (15% relative) tasks over state-of-the-art performance.
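The paper's simplification relies on a dependency parser; as a rough stand-in that conveys the input/output shape, here is a toy keyword extractor that drops function words to turn a natural-language utterance into a keyword-style query (the stopword list and the simplify() helper are illustrative only, not the paper's method):

```python
# Illustrative stand-in for dependency-parse-based simplification:
# keep content words, drop function words.
STOPWORDS = {"i", "a", "an", "the", "to", "of", "for", "on", "in",
             "me", "please", "would", "like", "show", "all", "from"}

def simplify(utterance):
    """Reduce an utterance to a keyword-search-style list of tokens."""
    tokens = utterance.lower().replace("?", "").replace(".", "").split()
    return [t for t in tokens if t not in STOPWORDS]
```

The extracted keywords would then be fed to the SLU models alongside the full utterance, as the abstract describes.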
international conference on acoustics, speech, and signal processing | 1998
Olivier Siohan; Aaron E. Rosenberg; Sarangarajan Parthasarathy
We use a minimum classification error (MCE) training paradigm to build a speaker identification system. The training is optimized at the string level for a text-dependent speaker identification task. Experiments performed on a small-set speaker identification task show that MCE training can reduce closed-set identification errors by up to 20-25% over a baseline system trained using maximum likelihood estimation. Further experiments suggest that additional improvement can be obtained by using some additional training data from speakers outside the set of registered speakers, leading to an overall reduction of the closed-set identification errors by about 35%.
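A minimal sketch of the smoothed MCE criterion that such training minimizes, for one training string (the smoothing constants eta and alpha are illustrative, not the paper's settings):

```python
import math

def mce_loss(scores, correct, eta=1.0, alpha=1.0):
    """Smoothed minimum-classification-error loss for one training string.

    scores  : per-speaker discriminant scores (e.g. log-likelihoods)
    correct : index of the true speaker
    The misclassification measure compares the true speaker's score with
    a soft maximum over the competing speakers; a sigmoid then maps it to
    a smooth 0-1 error count that gradient methods can minimize.
    """
    competitors = [s for i, s in enumerate(scores) if i != correct]
    soft_max = (1.0 / eta) * math.log(
        sum(math.exp(eta * s) for s in competitors) / len(competitors))
    d = -scores[correct] + soft_max            # > 0 means misclassified
    return 1.0 / (1.0 + math.exp(-alpha * d))  # smoothed error in (0, 1)
```

When the true speaker scores well above the competitors the loss approaches 0; when a competitor dominates it approaches 1, so minimizing the sum over strings directly targets identification error rather than likelihood.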
ieee automatic speech recognition and understanding workshop | 2011
Dilek Hakkani-Tür; Gokhan Tur; Larry P. Heck; Asli Celikyilmaz; Ashley Fidler; Dustin Hillard; Rukmini Iyer; Sarangarajan Parthasarathy
Logs of user queries from a search engine (such as Bing or Google) together with the links clicked provide valuable implicit feedback to improve statistical spoken language understanding (SLU) models. In this work, we propose to enrich the existing classification feature set for domain detection with features computed using the click distribution over a set of clicked URLs from search query click logs (QCLs) of user utterances. Since the form of natural language utterances differs stylistically from that of keyword search queries, to be able to match natural language utterances with related search queries, we perform a syntax-based transformation of the original utterances, after filtering out domain-independent salient phrases. This approach results in significant improvements for domain detection, especially when detecting the domains of web-related user utterances.
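The click-distribution features described above can be illustrated with a small sketch: for the search queries matched to an utterance, compute the normalized distribution of clicks over URL hostnames and use each probability as a domain-detection feature (the feature design here is a simplified illustration, not the paper's exact formulation):

```python
from collections import Counter
from urllib.parse import urlparse

def click_distribution(clicked_urls):
    """Normalized click distribution over URL hostnames for one query.

    Each hostname's probability can serve as one feature for domain
    detection, e.g. heavy clicks on a movie site suggest a movies domain.
    """
    hosts = Counter(urlparse(u).netloc for u in clicked_urls)
    total = sum(hosts.values())
    return {h: c / total for h, c in hosts.items()}
```

These features would then be appended to the existing classification feature set, as the abstract describes.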
international conference on acoustics, speech, and signal processing | 2001
Richard C. Rose; Sarangarajan Parthasarathy; Bojana Gajic; Aaron E. Rosenberg; Shrikanth Narayanan
The paper is concerned with the implementation of automatic speech recognition (ASR) based services on wireless mobile devices. Techniques are investigated for improving the performance of ASR systems in the context of the devices themselves, the environments that they are used in, and the networks they are connected to. A set of ASR tasks and ASR system architectures that are applicable to a wide range of simple mobile devices is presented. A prototype ASR based service is defined and the implementation of the service on a wireless mobile device is described. A database of speech utterances was collected from a population of fifty users interacting with this prototype service in multiple environments. An experimental study was performed where model compensation procedures for improving acoustic robustness and lattice rescoring procedures for reducing task perplexity were evaluated on this speech corpus.
international conference on acoustics, speech, and signal processing | 1999
Ivan Magrin-Chagnolleau; Aaron E. Rosenberg; Sarangarajan Parthasarathy
The problem of speaker detection in audio databases is addressed in this paper. Gaussian mixture modeling is used to build target speaker and background models. A detection algorithm based on a likelihood ratio calculation is applied to estimate target speaker segments. Evaluation procedures are defined in detail for this task. Results are given for different subsets of the HUB4 broadcast news database. For one target speaker, with the data restricted to high quality speech segments, the segment miss rate is approximately 7%. For unrestricted data, the segment miss rate is approximately 27%. In both cases the segment false alarm rate is 4 or 5 per hour. For two target speakers with unrestricted data, the segment miss rate is approximately 63% with about 27 segment false alarms per hour. The decrease in performance for two target speakers is largely attributable to short speech segments in the two-speaker test data, which are undetectable in the current configuration of the detection algorithm.
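A minimal sketch of the segment-level decision stage of such a detector, assuming per-frame log-likelihood ratios between target and background GMMs have already been computed (the threshold and minimum-length values are illustrative; the minimum-length pruning mirrors the abstract's note that very short segments are undetectable):

```python
def detect_segments(frame_llrs, threshold=0.0, min_frames=3):
    """Return (start, end) frame ranges where the target-vs-background
    log-likelihood ratio stays above a threshold.  Runs shorter than
    min_frames are dropped, so very short target segments go undetected."""
    segments, start = [], None
    for i, llr in enumerate(frame_llrs):
        if llr > threshold and start is None:
            start = i                          # a candidate segment opens
        elif llr <= threshold and start is not None:
            if i - start >= min_frames:        # keep only long-enough runs
                segments.append((start, i))
            start = None
    if start is not None and len(frame_llrs) - start >= min_frames:
        segments.append((start, len(frame_llrs)))
    return segments
```

Miss and false-alarm rates are then computed by comparing these hypothesized segments against reference speaker annotations.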
international conference on acoustics, speech, and signal processing | 1987
Alex C. Kot; Sarangarajan Parthasarathy; Donald W. Tufts; Richard J. Vaccaro
This paper presents a statistical analysis of the state-variable balancing and Prony methods for estimating the parameters of exponential signals in the presence of additive noise. The case of frequency estimation for a single sinusoid is carried out in detail. Analytical expressions for the variances of the frequency estimates at high signal-to-noise ratios are derived. The calculated variances are compared to the Cramer-Rao bound. The results are validated by simulations over a wide range of signal-to-noise ratios. The analysis and simulations show that the state-variable balancing method can provide slightly more accurate frequency estimates while avoiding the problem of selecting the signal zeros of the Prony polynomial.
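The paper's variance expressions are specific to its estimators, but the benchmark they are compared against is the standard Cramer-Rao bound for estimating the frequency of a single sinusoid in white Gaussian noise, which is easy to state in code (snr_linear = A^2 / (2 * sigma^2)):

```python
def crb_freq(snr_linear, n_samples):
    """Cramer-Rao bound on the variance of the frequency estimate, in
    (radians/sample)^2, for a single sinusoid in white Gaussian noise:
        var(w_hat) >= 12 / (SNR * N * (N^2 - 1)).
    The bound falls with both SNR and, rapidly, with record length N."""
    n = n_samples
    return 12.0 / (snr_linear * n * (n ** 2 - 1))
```

The N^3 dependence explains why even modest increases in record length tighten the achievable frequency accuracy far more than comparable increases in SNR.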
IEEE Transactions on Acoustics, Speech, and Signal Processing | 1987
Alex C. Kot; Sarangarajan Parthasarathy; Donald W. Tufts; Richard J. Vaccaro
This correspondence presents a statistical analysis of frequency estimation using state-variable balancing for a single sinusoid in the presence of additive noise at high signal-to-noise ratios. The calculated variance is compared to the performance of the frequency estimation using linear prediction and the result is validated by simulations.
international conference on acoustics, speech, and signal processing | 2003
Richard C. Rose; Iker Arizmendi; Sarangarajan Parthasarathy
A distributed framework for implementing automatic speech recognition (ASR) services on wireless mobile devices is presented. The framework is shown to scale easily to support a large number of mobile users connected over a wireless network and degrade gracefully under peak loads. The importance of using robust acoustic modeling techniques is demonstrated for situations when the use of specialized acoustic transducers on the mobile devices is not practical. It is shown that unsupervised acoustic normalization and adaptation techniques can reduce speech recognition word error rate (WER) by 30 percent. It is also shown that an unsupervised paradigm for updating and applying these robust modeling algorithms can be efficiently implemented within the distributed framework.
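A minimal sketch of one common unsupervised acoustic normalization of the kind evaluated above, per-utterance cepstral mean subtraction (the paper's exact normalization and adaptation techniques may differ; this is a generic illustration):

```python
def cepstral_mean_norm(frames):
    """Per-utterance cepstral mean subtraction.

    frames: list of equal-length feature vectors for one utterance.
    Subtracting the utterance-level mean of each coefficient removes
    stationary channel effects, e.g. a device's microphone response,
    without any transcription or supervision.
    """
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / len(frames) for d in range(dims)]
    return [[f[d] - means[d] for d in range(dims)] for f in frames]
```

Because it needs only the utterance itself, this kind of normalization fits naturally into the unsupervised, server-side update paradigm the abstract describes.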