Sarangarajan Parthasarathy
Microsoft
Publications
Featured research published by Sarangarajan Parthasarathy.
international conference on acoustics, speech, and signal processing | 1996
Aaron E. Rosenberg; Sarangarajan Parthasarathy
Likelihood ratio or cohort normalized scoring has been shown to be effective for improving the performance of speaker verification systems. An important problem in this connection is the establishment of principles for constructing speaker background or cohort models which provide the most effective normalized scores. Several kinds of speaker background models are studied. These include individual speaker models, models constructed from the pooled utterances of different numbers of speakers, models selected on the basis of similarity with customer models, models constructed from random selections of speakers, and models constructed from databases recorded under different conditions than the customer models. The results of experiments show that pooled models based on similarity to the reference speaker perform better than individual cohort models drawn from the same similar set of speakers. Pooled background models built from a small number of speakers selected by similarity perform best overall, but not significantly better than a random selection of 40 or more gender-balanced speakers whose training conditions match those of the reference speakers.
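A minimal sketch of the cohort-normalized scoring idea described above, assuming 1-D Gaussian speaker models for simplicity (the helper names, pooling-by-moment-matching step, and all values are illustrative, not from the paper):

```python
import math

def gaussian_loglik(x, mean, var):
    """Log-likelihood of one observation under a 1-D Gaussian model."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def pooled_model(models):
    """Pool several (mean, var) speaker models into a single background
    model by moment matching an equally weighted Gaussian mixture."""
    means = [m for m, _ in models]
    mean = sum(means) / len(models)
    # total variance = average within-model variance + variance of the means
    var = (sum(v for _, v in models) / len(models)
           + sum((m - mean) ** 2 for m in means) / len(models))
    return mean, var

def normalized_score(utterance, target, background):
    """Cohort-normalized score: average per-frame log-likelihood ratio
    between the target model and the pooled background model."""
    llr = [gaussian_loglik(x, *target) - gaussian_loglik(x, *background)
           for x in utterance]
    return sum(llr) / len(llr)
```

An utterance matching the target model yields a positive normalized score; one matching the cohort yields a negative score, which is the quantity thresholded for verification.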
international conference on acoustics, speech, and signal processing | 2005
Vincent Goffin; Cyril Allauzen; Enrico Bocchieri; Dilek Hakkani-Tür; Andrej Ljolje; Sarangarajan Parthasarathy; Mazin G. Rahim; Giuseppe Riccardi; Murat Saraclar
This paper describes the AT&T WATSON real-time speech recognizer, the product of several decades of research at AT&T. The recognizer handles a wide range of vocabulary sizes and is based on continuous-density hidden Markov models for acoustic modeling and finite state networks for language modeling. The recognition network is optimized for efficient search. We identify the algorithms used for high-accuracy, real-time, and low-latency recognition. We present results for small and large vocabulary tasks taken from the AT&T VoiceTone® service, showing word accuracy improvement of about 5% absolute and real-time processing speed-up by a factor between 2 and 3.
international conference on acoustics, speech, and signal processing | 2011
Gokhan Tur; Dilek Hakkani-Tür; Larry P. Heck; Sarangarajan Parthasarathy
In this paper, we present a sentence simplification method and demonstrate its use to improve intent determination and slot filling tasks in spoken language understanding (SLU) systems. This research is motivated by the observation that, while current statistical SLU models usually perform accurately for simple, well-formed sentences, error rates increase for more complex, longer, more natural or spontaneous utterances. Furthermore, users familiar with web search usually formulate their information requests as a keyword search query, suggesting that a framework which can handle both forms of input is required. We propose a dependency parsing-based sentence simplification approach that extracts a set of keywords from natural language sentences and uses those in addition to entire utterances for completing SLU tasks. We evaluated this approach using the well-studied ATIS corpus with manual and automatic transcriptions and observed significant error reductions for both intent determination (30% relative) and slot filling (15% relative) tasks over state-of-the-art performance.
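The paper's simplification relies on a dependency parser; as a rough stand-in that conveys the input/output shape, here is a toy keyword extractor that drops function words to turn a natural-language utterance into a keyword-style query (the stopword list and the simplify() helper are illustrative only, not the paper's method):

```python
# Illustrative stand-in for dependency-parse-based simplification:
# keep content words, drop function words.
STOPWORDS = {"i", "a", "an", "the", "to", "of", "for", "on", "in",
             "me", "please", "would", "like", "show", "all", "from"}

def simplify(utterance):
    """Reduce an utterance to a keyword-search-style list of tokens."""
    tokens = utterance.lower().replace("?", "").replace(".", "").split()
    return [t for t in tokens if t not in STOPWORDS]
```

The extracted keywords would then be fed to the SLU models alongside the full utterance, as the abstract describes.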
international conference on acoustics, speech, and signal processing | 1998
Olivier Siohan; Aaron E. Rosenberg; Sarangarajan Parthasarathy
We use a minimum classification error (MCE) training paradigm to build a speaker identification system. The training is optimized at the string level for a text-dependent speaker identification task. Experiments performed on a small-set speaker identification task show that MCE training can reduce closed-set identification errors by up to 20-25% over a baseline system trained using maximum likelihood estimation. Further experiments suggest that additional improvement can be obtained by using some additional training data from speakers outside the set of registered speakers, leading to an overall reduction of the closed-set identification errors by about 35%.
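A minimal sketch of the smoothed MCE criterion that such training minimizes, for one training string (the smoothing constants eta and alpha are illustrative, not the paper's settings):

```python
import math

def mce_loss(scores, correct, eta=1.0, alpha=1.0):
    """Smoothed minimum-classification-error loss for one training string.

    scores  : per-speaker discriminant scores (e.g. log-likelihoods)
    correct : index of the true speaker
    The misclassification measure compares the true speaker's score with
    a soft maximum over the competing speakers; a sigmoid then maps it to
    a smooth 0-1 error count that gradient methods can minimize.
    """
    competitors = [s for i, s in enumerate(scores) if i != correct]
    soft_max = (1.0 / eta) * math.log(
        sum(math.exp(eta * s) for s in competitors) / len(competitors))
    d = -scores[correct] + soft_max            # > 0 means misclassified
    return 1.0 / (1.0 + math.exp(-alpha * d))  # smoothed error in (0, 1)
```

When the true speaker scores well above the competitors the loss approaches 0; when a competitor dominates it approaches 1, so minimizing the sum over strings directly targets identification error rather than likelihood.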
ieee automatic speech recognition and understanding workshop | 2011
Dilek Hakkani-Tür; Gokhan Tur; Larry P. Heck; Asli Celikyilmaz; Ashley Fidler; Dustin Hillard; Rukmini Iyer; Sarangarajan Parthasarathy
Logs of user queries from a search engine (such as Bing or Google) together with the links clicked provide valuable implicit feedback to improve statistical spoken language understanding (SLU) models. In this work, we propose to enrich the existing classification feature set for domain detection with features computed using the click distribution over a set of clicked URLs from search query click logs (QCLs) of user utterances. Since the form of natural language utterances differs stylistically from that of keyword search queries, to be able to match natural language utterances with related search queries, we perform a syntax-based transformation of the original utterances, after filtering out domain-independent salient phrases. This approach results in significant improvements for domain detection, especially when detecting the domains of web-related user utterances.
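The click-distribution features described above can be illustrated with a small sketch: for the search queries matched to an utterance, compute the normalized distribution of clicks over URL hostnames and use each probability as a domain-detection feature (the feature design here is a simplified illustration, not the paper's exact formulation):

```python
from collections import Counter
from urllib.parse import urlparse

def click_distribution(clicked_urls):
    """Normalized click distribution over URL hostnames for one query.

    Each hostname's probability can serve as one feature for domain
    detection, e.g. heavy clicks on a movie site suggest a movies domain.
    """
    hosts = Counter(urlparse(u).netloc for u in clicked_urls)
    total = sum(hosts.values())
    return {h: c / total for h, c in hosts.items()}
```

These features would then be appended to the existing classification feature set, as the abstract describes.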
international conference on acoustics, speech, and signal processing | 2001
Richard C. Rose; Sarangarajan Parthasarathy; Bojana Gajic; Aaron E. Rosenberg; Shrikanth Narayanan
The paper is concerned with the implementation of automatic speech recognition (ASR) based services on wireless mobile devices. Techniques are investigated for improving the performance of ASR systems in the context of the devices themselves, the environments that they are used in, and the networks they are connected to. A set of ASR tasks and ASR system architectures that are applicable to a wide range of simple mobile devices is presented. A prototype ASR based service is defined and the implementation of the service on a wireless mobile device is described. A database of speech utterances was collected from a population of fifty users interacting with this prototype service in multiple environments. An experimental study was performed where model compensation procedures for improving acoustic robustness and lattice rescoring procedures for reducing task perplexity were evaluated on this speech corpus.
international conference on acoustics, speech, and signal processing | 1999
Ivan Magrin-Chagnolleau; Aaron E. Rosenberg; Sarangarajan Parthasarathy
The problem of speaker detection in audio databases is addressed in this paper. Gaussian mixture modeling is used to build target speaker and background models. A detection algorithm based on a likelihood ratio calculation is applied to estimate target speaker segments. Evaluation procedures are defined in detail for this task. Results are given for different subsets of the HUB4 broadcast news database. For one target speaker, with the data restricted to high quality speech segments, the segment miss rate is approximately 7%. For unrestricted data, the segment miss rate is approximately 27%. In both cases the segment false alarm rate is 4 or 5 per hour. For two target speakers with unrestricted data, the segment miss rate is approximately 63% with about 27 segment false alarms per hour. The decrease in performance for two target speakers is largely attributable to short speech segments in the two-speaker test data, which are undetectable in the current configuration of the detection algorithm.
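A minimal sketch of the segment-level decision stage of such a detector, assuming per-frame log-likelihood ratios between target and background GMMs have already been computed (the threshold and minimum-length values are illustrative; the minimum-length pruning mirrors the abstract's note that very short segments are undetectable):

```python
def detect_segments(frame_llrs, threshold=0.0, min_frames=3):
    """Return (start, end) frame ranges where the target-vs-background
    log-likelihood ratio stays above a threshold.  Runs shorter than
    min_frames are dropped, so very short target segments go undetected."""
    segments, start = [], None
    for i, llr in enumerate(frame_llrs):
        if llr > threshold and start is None:
            start = i                          # a candidate segment opens
        elif llr <= threshold and start is not None:
            if i - start >= min_frames:        # keep only long-enough runs
                segments.append((start, i))
            start = None
    if start is not None and len(frame_llrs) - start >= min_frames:
        segments.append((start, len(frame_llrs)))
    return segments
```

Miss and false-alarm rates are then computed by comparing these hypothesized segments against reference speaker annotations.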
international conference on acoustics, speech, and signal processing | 1987
Alex C. Kot; Sarangarajan Parthasarathy; Donald W. Tufts; Richard J. Vaccaro
This paper presents a statistical analysis of the state-variable balancing and Prony methods for estimating the parameters of exponential signals in the presence of additive noise. The case of frequency estimation for a single sinusoid is carried out in detail. Analytical expressions for the variances of the frequency estimates at high signal-to-noise ratios are derived. The calculated variances are compared to the Cramer-Rao bound. The results are validated by simulations over a wide range of signal-to-noise ratios. The analysis and simulations show that the state-variable balancing method can provide slightly more accurate frequency estimates while avoiding the problem of selecting the signal zeros of the Prony polynomial.
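The paper's variance expressions are specific to its estimators, but the benchmark they are compared against is the standard Cramer-Rao bound for estimating the frequency of a single sinusoid in white Gaussian noise, which is easy to state in code (snr_linear = A^2 / (2 * sigma^2)):

```python
def crb_freq(snr_linear, n_samples):
    """Cramer-Rao bound on the variance of the frequency estimate, in
    (radians/sample)^2, for a single sinusoid in white Gaussian noise:
        var(w_hat) >= 12 / (SNR * N * (N^2 - 1)).
    The bound falls with both SNR and, rapidly, with record length N."""
    n = n_samples
    return 12.0 / (snr_linear * n * (n ** 2 - 1))
```

The N^3 dependence explains why even modest increases in record length tighten the achievable frequency accuracy far more than comparable increases in SNR.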
IEEE Transactions on Acoustics, Speech, and Signal Processing | 1987
Alex C. Kot; Sarangarajan Parthasarathy; Donald W. Tufts; Richard J. Vaccaro
This correspondence presents a statistical analysis of frequency estimation using state-variable balancing for a single sinusoid in the presence of additive noise at high signal-to-noise ratios. The calculated variance is compared to the performance of the frequency estimation using linear prediction and the result is validated by simulations.
international conference on acoustics, speech, and signal processing | 2003
Richard C. Rose; Iker Arizmendi; Sarangarajan Parthasarathy
A distributed framework for implementing automatic speech recognition (ASR) services on wireless mobile devices is presented. The framework is shown to scale easily to support a large number of mobile users connected over a wireless network and degrade gracefully under peak loads. The importance of using robust acoustic modeling techniques is demonstrated for situations when the use of specialized acoustic transducers on the mobile devices is not practical. It is shown that unsupervised acoustic normalization and adaptation techniques can reduce speech recognition word error rate (WER) by 30 percent. It is also shown that an unsupervised paradigm for updating and applying these robust modeling algorithms can be efficiently implemented within the distributed framework.
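A minimal sketch of one common unsupervised acoustic normalization of the kind evaluated above, per-utterance cepstral mean subtraction (the paper's exact normalization and adaptation techniques may differ; this is a generic illustration):

```python
def cepstral_mean_norm(frames):
    """Per-utterance cepstral mean subtraction.

    frames: list of equal-length feature vectors for one utterance.
    Subtracting the utterance-level mean of each coefficient removes
    stationary channel effects, e.g. a device's microphone response,
    without any transcription or supervision.
    """
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / len(frames) for d in range(dims)]
    return [[f[d] - means[d] for d in range(dims)] for f in frames]
```

Because it needs only the utterance itself, this kind of normalization fits naturally into the unsupervised, server-side update paradigm the abstract describes.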