
Publications


Featured research published by Sankaranarayanan Ananthakrishnan.


IEEE Transactions on Audio, Speech, and Language Processing | 2008

Automatic Prosodic Event Detection Using Acoustic, Lexical, and Syntactic Evidence

Sankaranarayanan Ananthakrishnan; Shrikanth Narayanan

With the advent of prosody annotation standards such as Tones and Break Indices (ToBI), speech technologists and linguists alike have been interested in automatically detecting prosodic events in speech. This is because the prosodic tier provides an additional layer of information over the short-term segment-level features and lexical representation of an utterance. As the prosody of an utterance is closely tied to its syntactic and semantic content in addition to its lexical content, knowledge of the prosodic events within and across utterances can assist spoken language applications such as automatic speech recognition and translation. In addition, corpora annotated with prosodic events are useful for building natural-sounding speech synthesizers. In this paper, we build an automatic detector and classifier for prosodic events in American English, based on their acoustic, lexical, and syntactic correlates. Following previous work in this area, we focus on accent (prominence, or "stress") and prosodic phrase boundary detection at the syllable level. Our experiments achieved 86.75% agreement on the accent detection task and 91.61% agreement on the phrase boundary detection task on the Boston University Radio News Corpus.
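The detection task above fuses acoustic, lexical, and syntactic evidence streams. One simple way to combine independent per-stream posteriors is a log-odds sum; the sketch below is purely illustrative (the function name and fusion rule are assumptions, not the classifier described in the paper):

```python
import math

def combine_evidence(stream_probs):
    # Fuse independent evidence streams (e.g. acoustic, lexical, syntactic):
    # each entry is that stream's posterior P(accent | stream).
    # Summing log-odds rewards agreement between streams.
    log_odds = sum(math.log(p / (1 - p)) for p in stream_probs)
    # Map the combined log-odds back to a probability via the logistic function.
    return 1 / (1 + math.exp(-log_odds))
```

Two streams that each weakly favor "accented" yield a combined posterior more confident than either stream alone.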


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2005

An automatic prosody recognizer using a coupled multi-stream acoustic model and a syntactic-prosodic language model

Sankaranarayanan Ananthakrishnan; Shrikanth Narayanan

Automatic detection and labeling of prosodic events in speech has received much attention from speech technologists and linguists ever since the introduction of annotation standards such as ToBI. Since prosody is intricately bound to the semantics of the utterance, recognition of prosodic events is important for spoken language applications such as automatic understanding and translation of speech. Moreover, corpora labeled with prosodic markers are essential for building speech synthesizers that use data-driven approaches to generate natural speech. In this paper, we build a prosody recognition system that detects stress and prosodic boundaries at the word and syllable level in American English using a coupled hidden Markov model (CHMM) to model multiple, asynchronous acoustic feature streams and a syntactic-prosodic model that captures the relationship between the syntax of the utterance and its prosodic structure. Experiments show that the recognizer achieves about 75% agreement on stress labeling and 88% agreement on boundary labeling at the syllable level.


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2007

Improved Speech Recognition Using Acoustic and Lexical Correlates of Pitch Accent in an N-Best Rescoring Framework

Sankaranarayanan Ananthakrishnan; Shrikanth Narayanan

Most statistical speech recognition systems make use of segment-level features, derived mainly from spectral envelope characteristics of the signal, but ignore supra-segmental cues that carry additional information likely to be useful for speech recognition. These cues, which constitute the prosody of the utterance and occur at the syllable, word and utterance level, are closely related to the lexical and syntactic organization of the utterance. In this paper, we explore the use of acoustic and lexical correlates of a subset of these cues in order to improve recognition performance on a read-speech corpus, using word error rate (WER) as the metric. Using the features and methods described in this paper, we were able to obtain a relative WER improvement of 1.3% over a baseline ASR system on the Boston University Radio News Corpus.
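The evaluation metric here, word error rate (WER), is the word-level Levenshtein distance between hypothesis and reference, normalized by reference length. A minimal sketch of the standard dynamic-programming computation (illustrative code, not taken from the paper):

```python
def wer(ref, hyp):
    """Word error rate: word-level edit distance / reference length."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(h) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])  # match or substitution
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)
```

A "relative WER improvement of 1.3%", as reported above, means the new WER is 1.3% smaller than the baseline WER, not 1.3 percentage points lower.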


IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) | 2003

Transonics: a speech to speech system for English-Persian interactions

Shrikanth Narayanan; Sankaranarayanan Ananthakrishnan; Robert Belvin; Emil Ettelaie; Shadi Ganjavi; Panayiotis G. Georgiou; C. M. Hein; S. Kadambe; Kevin Knight; Daniel Marcu; Howard Neely; Naveen Srinivasamurthy; David R. Traum; Dagen Wang

In this paper, we describe the first phase of development of our speech-to-speech system between English and Modern Persian under the DARPA Babylon program. We give an overview of the various system components: the front-end ASR, the machine translation system, and the speech generation system. We examine challenges such as the sparseness of available spoken language data, along with the solutions employed to maximize the benefit obtained from these limited resources. Efforts in the creation of the user interface and the underlying dialog management system for mediated communication are described.


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2008

Fine-grained pitch accent and boundary tone labeling with parametric F0 features

Sankaranarayanan Ananthakrishnan; Shrikanth Narayanan

Motivated by linguistic theories of prosodic categoricity, symbolic representations of prosody have recently attracted the attention of speech technologists. Categorical representations such as ToBI not only bear linguistic relevance, but also have the advantage that they can be easily modeled and integrated within applications. Since manual labeling of these categories is time-consuming and expensive, there has been significant interest in automatic prosody labeling. This paper presents a fine-grained ToBI-style prosody labeling system that makes use of features derived from RFC and TILT parameterization of F0, together with an n-gram prosodic language model, for 4-way pitch accent labeling and 2-way boundary tone labeling. For this task, our system achieves pitch accent labeling accuracy of 56.4% and boundary tone labeling accuracy of 67.7% on the Boston University Radio News Corpus.
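The prosodic language model used here is an n-gram model over symbolic prosody labels. A minimal bigram version with add-one smoothing can be sketched as follows (the function names are hypothetical, and the ToBI-style labels in the example are only illustrative):

```python
from collections import Counter

def train_bigram(label_seqs):
    # Collect unigram and bigram counts over prosodic label sequences,
    # with a sentence-start symbol prepended to each sequence.
    unigrams, bigrams = Counter(), Counter()
    for seq in label_seqs:
        seq = ["<s>"] + seq
        unigrams.update(seq)
        bigrams.update(zip(seq, seq[1:]))
    return unigrams, bigrams

def bigram_prob(unigrams, bigrams, prev, label, vocab_size):
    # Add-one (Laplace) smoothed P(label | prev).
    return (bigrams[(prev, label)] + 1) / (unigrams[prev] + vocab_size)
```

Smoothing matters for these models precisely because annotated prosody corpora are small, so many label bigrams never occur in training.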


IEEE Transactions on Audio, Speech, and Language Processing | 2009

Unsupervised Adaptation of Categorical Prosody Models for Prosody Labeling and Speech Recognition

Sankaranarayanan Ananthakrishnan; Shrikanth Narayanan

Automatic speech recognition (ASR) systems rely almost exclusively on short-term segment-level features (MFCCs), while ignoring higher level suprasegmental cues that are characteristic of human speech. However, recent experiments have shown that categorical representations of prosody, such as those based on the Tones and Break Indices (ToBI) annotation standard, can be used to enhance speech recognizers. Unfortunately, categorical prosody models are severely limited in scope and coverage due to the lack of large corpora annotated with the relevant prosodic symbols (such as pitch accent, word prominence, and boundary tone labels). In this paper, we first present an architecture for augmenting a standard ASR with symbolic prosody. We then discuss two novel, unsupervised adaptation techniques for improving, respectively, the quality of the linguistic and acoustic components of our categorical prosody models. Finally, we implement the augmented ASR by enriching ASR lattices with the adapted categorical prosody models. Our experiments show that the proposed unsupervised adaptation techniques significantly improve the quality of the prosody models; the adapted prosodic language and acoustic models reduce binary pitch accent (presence versus absence) classification error rate by 13.8% and 4.3%, respectively (relative to the seed models) on the Boston University Radio News Corpus, while the prosody-enriched ASR exhibits a 3.1% relative reduction in word error rate (WER) over the baseline system.


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2006

Speech Recognition Engineering Issues in Speech to Speech Translation System Design for Low Resource Languages and Domains

Shrikanth Narayanan; Panayiotis G. Georgiou; Abhinav Sethy; Dagen Wang; Murtaza Bulut; Shiva Sundaram; Emil Ettelaie; Sankaranarayanan Ananthakrishnan; Horacio Franco; Kristin Precoda; Dimitra Vergyri; Jing Zheng; Wen Wang; Ramana Rao Gadde; Martin Graciarena; Victor Abrash; Michael W. Frandsen; Colleen Richey

Engineering automatic speech recognition (ASR) for speech-to-speech (S2S) translation systems, especially targeting languages and domains that do not have readily available spoken language resources, is immensely challenging for a number of reasons. In addition to contending with the conventional data-hungry acoustic and language modeling needs, these designs have to accommodate varying requirements imposed by the domain needs and characteristics, the target device and usage modality (such as phrase-based or spontaneous free-form interactions, with or without visual feedback), and the huge spoken language variability arising from socio-linguistic and cultural differences among users. This paper, using case studies of creating speech translation systems between English and languages such as Pashto and Farsi, describes some of the practical issues and the solutions that were developed for multilingual ASR development. These include novel acoustic and language modeling strategies, such as language-adaptive recognition, active-learning-based language modeling, and class-based language models that can better exploit resource-poor language data; efficient search strategies, including N-best and confidence generation to aid multiple-hypothesis translation; use of dialog information and careful interface choices to facilitate ASR; and audio interface design that meets both usability and robustness requirements.


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2008

Automatic classification of question turns in spontaneous speech using lexical and prosodic evidence

Sankaranarayanan Ananthakrishnan; Shrikanth Narayanan

The ability to identify speech acts reliably is desirable in any spoken language system that interacts with humans. Minimally, such a system should be capable of distinguishing between question-bearing turns and other types of utterances. However, this is a non-trivial task, since spontaneous speech tends to have incomplete, and even ungrammatical, syntactic structure and is characterized by disfluencies, repairs, and other non-linguistic vocalizations that make simple rule-based pattern learning difficult. In this paper, we present a system for identifying question-bearing turns in spontaneous multi-party speech (ICSI Meeting Corpus) using lexical and prosodic evidence. On a balanced test set, our system achieves an accuracy of 71.9% for the binary question vs. non-question classification task. Further, we investigate the robustness of our proposed technique to uncertainty in the lexical feature stream (e.g. caused by speech recognition errors). Our experiments indicate that classification accuracy of the proposed method is robust to errors in the text stream, dropping only about 0.8% for every 10% increase in word error rate (WER).


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2008

A novel algorithm for unsupervised prosodic language model adaptation

Sankaranarayanan Ananthakrishnan; Shrikanth Narayanan

Symbolic representations of prosodic events have been shown to be useful for spoken language applications such as speech recognition. However, a major drawback with categorical prosody models is their lack of scalability due to the difficulty in annotating large corpora with prosodic tags for training. In this paper, we present a novel, unsupervised adaptation technique for bootstrapping categorical prosodic language models (PLMs) from a small, annotated training set. Our experiments indicate that the adaptation algorithm significantly improves the quality and coverage of the PLM. On a test set derived from the Boston University Radio News corpus, the adapted PLM gave a relative improvement of 13.8% over the seed PLM on the binary pitch accent detection task, while reducing the OOV rate by 16.5% absolute.
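The adaptation described above bootstraps a prosody model from a small annotated seed set using unlabeled data. The general self-training pattern that this family of techniques builds on can be sketched as follows (a generic illustration under assumed interfaces, not the paper's specific algorithm; all names are hypothetical):

```python
def self_train(model_fit, model_predict, labeled, unlabeled,
               rounds=3, threshold=0.9):
    # model_fit:     trains a model from (example, label) pairs.
    # model_predict: returns (label, confidence) for one example.
    # Confident predictions on the unlabeled pool are promoted to
    # training data, and the model is retrained each round.
    data, pool = list(labeled), list(unlabeled)
    model = model_fit(data)
    for _ in range(rounds):
        newly, rest = [], []
        for x in pool:
            y, p = model_predict(model, x)
            (newly if p >= threshold else rest).append((x, y) if p >= threshold else x)
        if not newly:
            break  # nothing confident enough; stop early
        data += newly
        pool = rest
        model = model_fit(data)
    return model
```

The confidence threshold controls the usual self-training trade-off: a low threshold grows coverage quickly but risks reinforcing the seed model's errors.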


Conference of the International Speech Communication Association (Interspeech) | 2006

Combining Acoustic, Lexical, and Syntactic Evidence for Automatic Unsupervised Prosody Labeling

Sankaranarayanan Ananthakrishnan; Shrikanth Narayanan

Collaboration


Dive into Sankaranarayanan Ananthakrishnan's collaborations.

Top Co-Authors:

- Shrikanth Narayanan (University of Southern California)
- Dagen Wang (University of Southern California)
- Panayiotis G. Georgiou (University of Southern California)
- Prem Natarajan (University of Southern California)
- Aravind Namandi Vembu (University of Southern California)
- Daniel Marcu (University of Southern California)
- Kevin Knight (University of Southern California)