Publication


Featured research published by Abhinav Sethy.


Journal of the Acoustical Society of America | 2003

An approach to real‐time magnetic resonance imaging for speech production

Shrikanth Narayanan; Krishna S. Nayak; Sungbok Lee; Abhinav Sethy; Dani Byrd

Magnetic resonance imaging (MRI) has served as a valuable tool for studying static postures in speech production. Now, recent improvements in temporal resolution are making it possible to examine the dynamics of vocal-tract shaping during fluent speech using MRI. The present study uses spiral k-space acquisitions with a low flip-angle gradient echo pulse sequence on a conventional GE Signa 1.5-T CV/i scanner. This strategy allows for acquisition rates of 8-9 images per second and reconstruction rates of 20-24 images per second, making veridical movies of speech production now possible. Segmental durations, positions, and interarticulator timing can all be quantitatively evaluated. Data show clear real-time movements of the lips, tongue, and velum. Sample movies and data analysis strategies are presented.


IEEE Automatic Speech Recognition and Understanding Workshop | 2009

Query-by-example Spoken Term Detection For OOV terms

Carolina Parada; Abhinav Sethy; Bhuvana Ramabhadran

The goal of Spoken Term Detection (STD) technology is to allow open vocabulary search over large collections of speech content. In this paper, we address cases where the search term(s) of interest (queries) are acoustic examples, provided either by identifying a region of interest in a speech stream or by speaking the query term. Queries often relate to named entities and foreign words, which typically have poor coverage in the vocabulary of Large Vocabulary Continuous Speech Recognition (LVCSR) systems. Throughout this paper, we focus on query-by-example search for such out-of-vocabulary (OOV) query terms. We build upon a finite state transducer (FST) based search and indexing system [1] to address query-by-example search for OOV terms by representing both the query and the index as phonetic lattices from the output of an LVCSR system. We provide results comparing different representations and generation mechanisms for both queries and indexes built with word and combined word and subword units [2]. We also present a two-pass method which uses the best hit identified in an initial query-by-example pass to augment the STD search results. The results demonstrate that query-by-example search can yield a significantly better Actual Term-Weighted Value (ATWV) of 0.479 compared to a baseline ATWV of 0.325 that uses reference pronunciations for OOVs. Further improvements can be obtained with the proposed two-pass approach and filtering using the expected unigram counts from the LVCSR system's lexicon.
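
Since results in this line of work are reported as Actual Term-Weighted Value (ATWV), a minimal sketch of how the underlying Term-Weighted Value metric is computed may help; the per-query counts, audio duration, and the convention of one non-target trial per second of audio are illustrative assumptions, with β fixed at the standard 999.9.

```python
# Minimal sketch of the Term-Weighted Value (TWV) metric used to score spoken
# term detection output. The query counts below are made up for illustration;
# the number of non-target trials is approximated as one per second of audio.

def twv(query_stats, beta=999.9, audio_seconds=3600.0):
    """query_stats: list of dicts with the number of reference occurrences
    (n_ref), correct detections (n_hit), and false alarms (n_fa) per query."""
    cost = 0.0
    for q in query_stats:
        p_miss = 1.0 - q["n_hit"] / q["n_ref"]
        p_fa = q["n_fa"] / (audio_seconds - q["n_ref"])
        cost += p_miss + beta * p_fa
    return 1.0 - cost / len(query_stats)

# Hypothetical results for two OOV query terms.
stats = [
    {"n_ref": 10, "n_hit": 7, "n_fa": 2},
    {"n_ref": 4, "n_hit": 3, "n_fa": 0},
]
print(f"TWV = {twv(stats):.3f}")
```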


International Conference on Acoustics, Speech, and Signal Processing | 2009

Effect of pronunciations on OOV queries in spoken term detection

Dogan Can; Erica Cooper; Abhinav Sethy; Christopher M. White; Bhuvana Ramabhadran; Murat Saraclar

The spoken term detection (STD) task aims to return relevant segments from a spoken archive that contain the query terms, whether or not they are in the system vocabulary. This paper focuses on pronunciation modeling for Out-of-Vocabulary (OOV) terms, which frequently occur in STD queries. The STD system described in this paper indexes word-level and sub-word level lattices or confusion networks produced by an LVCSR system using Weighted Finite State Transducers (WFST). We investigate the inclusion of n-best pronunciation variants for OOV terms (obtained from letter-to-sound rules) into the search and present the results obtained by indexing confusion networks as well as lattices. Two observations are worth mentioning: phone indexes generated from sub-words represent OOVs well, and too many variants for the OOV terms degrade performance if pronunciations are not weighted.
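
To illustrate the weighting issue the paper raises, here is a minimal sketch of turning an n-best list of letter-to-sound pronunciations into weighted variants before they are added to the search; the variant strings, their log-scores, and the softmax-style renormalization are hypothetical placeholders, not the paper's exact scheme.

```python
import math

# Sketch: weight an n-best list of letter-to-sound pronunciations for an OOV
# query term before searching a phonetic index. The variants and log-scores
# are invented stand-ins for the output of a letter-to-sound model.

def weighted_variants(nbest, top_n=3):
    """nbest: list of (phone_string, log_score). Returns the top_n variants
    with scores renormalized into posterior-like weights."""
    best = sorted(nbest, key=lambda item: item[1], reverse=True)[:top_n]
    total = sum(math.exp(score) for _, score in best)
    return [(phones, math.exp(score) / total) for phones, score in best]

# Hypothetical letter-to-sound output for an out-of-vocabulary name.
nbest = [
    ("n ah t aa l y ah", -1.2),
    ("n ae t ah l iy ah", -1.9),
    ("n aa t aa l ih ah", -3.1),
]
for phones, weight in weighted_variants(nbest):
    print(f"{weight:.2f}  {phones}")
```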


International Conference on Acoustics, Speech, and Signal Processing | 2013

System combination and score normalization for spoken term detection

Jonathan Mamou; Jia Cui; Xiaodong Cui; Mark J. F. Gales; Brian Kingsbury; Kate Knill; Lidia Mangu; David Nolden; Michael Picheny; Bhuvana Ramabhadran; Ralf Schlüter; Abhinav Sethy; Philip C. Woodland

Spoken content in languages of emerging importance needs to be searchable to provide access to the underlying information. In this paper, we investigate the problem of extending data fusion methodologies from Information Retrieval to Spoken Term Detection on low-resource languages in the framework of the IARPA Babel program. We describe a number of alternative methods for improving keyword search performance and apply them to Cantonese, a language that presents new issues in terms of reduced resources and shorter query lengths. First, we show a score normalization methodology that improves keyword search performance by 20% on average. Second, we show that properly combining the outputs of diverse ASR systems performs 14% better than the best normalized ASR system.
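
One common realization of per-keyword score normalization is sum-to-one rescaling of detection scores within each keyword; the sketch below assumes that form (with an optional exponent gamma), which may differ from the exact normalization used in the paper, and the hits are invented for illustration.

```python
from collections import defaultdict

# Sketch of keyword-level sum-to-one score normalization, which makes
# detection scores comparable across keywords before a global threshold is
# applied. The hits and the gamma exponent are illustrative assumptions.

def normalize_per_keyword(hits, gamma=1.0):
    """hits: list of dicts with 'keyword', 'time', and raw 'score'.
    Rescales scores so that they sum to one within each keyword."""
    by_kw = defaultdict(list)
    for h in hits:
        by_kw[h["keyword"]].append(h)
    for kw_hits in by_kw.values():
        total = sum(h["score"] ** gamma for h in kw_hits)
        for h in kw_hits:
            h["norm_score"] = (h["score"] ** gamma) / total
    return hits

hits = [
    {"keyword": "hong kong", "time": 12.3, "score": 0.8},
    {"keyword": "hong kong", "time": 95.0, "score": 0.2},
    {"keyword": "dim sum", "time": 40.1, "score": 0.05},
]
for h in normalize_per_keyword(hits):
    print(h["keyword"], h["time"], round(h["norm_score"], 3))
```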


International Conference on Acoustics, Speech, and Signal Processing | 2009

A new method for OOV detection using hybrid word/fragment system

Ariya Rastrow; Abhinav Sethy; Bhuvana Ramabhadran

In this paper, we propose a new method for detecting regions with out-of-vocabulary (OOV) words in the output of a large vocabulary continuous speech recognition (LVCSR) system. The proposed method uses a hybrid system combining words and data-driven, variable-length subword units. Using a single feature, the posterior probability of the subword units, this method outperforms existing methods published in the literature. We also present a recipe to discriminatively train a hybrid language model to improve the OOV detection rate. Results are presented on the RT04 broadcast news task.
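
A minimal sketch of the underlying idea, under the assumption that fragment (subword) units are marked in a confusion network and that high fragment posterior signals a likely OOV region; the unit names, threshold, and confusion-network slices below are invented, not the paper's configuration.

```python
# Sketch: in a hybrid word/fragment decode, flag time regions where sub-word
# (fragment) units carry high posterior probability as likely OOV regions.
# Fragment units are marked with a leading '+' by convention here; the slices
# and threshold are illustrative assumptions.

def flag_oov_regions(slices, threshold=0.5):
    """slices: list of (start, end, {unit: posterior}) confusion-network bins.
    Returns spans whose total fragment posterior exceeds the threshold."""
    regions = []
    for start, end, posteriors in slices:
        frag_mass = sum(p for unit, p in posteriors.items() if unit.startswith("+"))
        if frag_mass > threshold:
            regions.append((start, end, frag_mass))
    return regions

slices = [
    (0.0, 0.4, {"the": 0.9, "+dh_ah": 0.1}),
    (0.4, 1.1, {"+ae_b_s": 0.7, "abs": 0.2, "apps": 0.1}),
    (1.1, 1.5, {"news": 0.95, "+n_uw": 0.05}),
]
print(flag_oov_regions(slices))
```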


International Conference on Acoustics, Speech, and Signal Processing | 2013

A high-performance Cantonese keyword search system

Brian Kingsbury; Jia Cui; Xiaodong Cui; Mark J. F. Gales; Kate Knill; Jonathan Mamou; Lidia Mangu; David Nolden; Michael Picheny; Bhuvana Ramabhadran; Ralf Schlüter; Abhinav Sethy; Philip C. Woodland

We present a system for keyword search on Cantonese conversational telephony audio, collected for the IARPA Babel program, that achieves good performance by combining postings lists produced by diverse speech recognition systems from three different research groups. We describe the keyword search task, the data on which the work was done, four different speech recognition systems, and our approach to system combination for keyword search. We show that the combination of four systems outperforms the best single system by 7%, achieving an actual term-weighted value of 0.517.
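
A rough sketch of combining postings lists: hits for the same keyword that overlap in time are merged and their scores combined with a weighted sum. The merge rule and system weights are assumptions for illustration, not necessarily the combination actually used by these systems.

```python
# Sketch: merge keyword-search postings lists from several ASR systems.
# Hits for the same keyword with overlapping time spans are collapsed and
# their scores combined with a weighted sum; weights are illustrative.

def combine_postings(systems, weights):
    """systems: list of postings lists, each a list of
    (keyword, start, end, score). Returns merged hits with combined scores."""
    merged = []  # each entry: [keyword, start, end, combined_score]
    for sys_hits, w in zip(systems, weights):
        for kw, start, end, score in sys_hits:
            for m in merged:
                if m[0] == kw and start < m[2] and m[1] < end:  # time overlap
                    m[3] += w * score
                    break
            else:
                merged.append([kw, start, end, w * score])
    return merged

sys_a = [("babel", 10.0, 10.6, 0.9), ("keyword", 55.2, 55.9, 0.4)]
sys_b = [("babel", 10.1, 10.7, 0.7)]
print(combine_postings([sys_a, sys_b], weights=[0.6, 0.4]))
```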


IEEE Automatic Speech Recognition and Understanding Workshop | 2009

Scaling shrinkage-based language models

Stanley F. Chen; Lidia Mangu; Bhuvana Ramabhadran; Ruhi Sarikaya; Abhinav Sethy

In [1], we show that a novel class-based language model, Model M, and the method of regularized minimum discrimination information (rMDI) models outperform comparable methods on moderate amounts of Wall Street Journal data. Both of these methods are motivated by the observation that shrinking the sum of parameter magnitudes in an exponential language model tends to improve performance [2]. In this paper, we investigate whether these shrinkage-based techniques also perform well on larger training sets and on other domains. First, we explain why good performance on large data sets is uncertain, by showing that gains relative to a baseline n-gram model tend to decrease as training set size increases. Next, we evaluate several methods for data/model combination with Model M and rMDI models on limited-scale domains, to uncover which techniques should work best on large domains. Finally, we apply these methods on a variety of medium-to-large-scale domains covering several languages, and show that Model M consistently provides significant gains over existing language models for state-of-the-art systems in both speech recognition and machine translation.
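
The shrinkage observation refers to training exponential language models with a penalty on the sum of parameter magnitudes; the toy sketch below shows such an L1-plus-L2 regularized objective on a made-up feature set, and is only an illustration of the idea, not Model M or rMDI themselves.

```python
import math

# Toy sketch of the "shrinkage" idea: an exponential (maximum-entropy)
# language model trained on log-likelihood minus an L1+L2 penalty on its
# parameters, keeping the sum of parameter magnitudes small. The features,
# events, and penalty weights below are placeholders for illustration.

def regularized_objective(lambdas, events, alpha=0.5, sigma2=6.0):
    """lambdas: {feature: weight}. events: list of (target_feats, candidates),
    where candidates is a list of feature sets, one per vocabulary item."""
    def score(feats):
        return sum(lambdas.get(f, 0.0) for f in feats)

    loglik = sum(
        score(target) - math.log(sum(math.exp(score(c)) for c in candidates))
        for target, candidates in events
    )
    penalty = sum(alpha * abs(w) + w * w / (2 * sigma2) for w in lambdas.values())
    return loglik - penalty

# Hypothetical two-word vocabulary and one bigram event.
lambdas = {"w=cat|h=the": 0.4, "w=dog|h=the": 0.1}
events = [({"w=cat|h=the"}, [{"w=cat|h=the"}, {"w=dog|h=the"}])]
print(round(regularized_objective(lambdas, events), 3))
```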


International Conference on Acoustics, Speech, and Signal Processing | 2013

Developing speech recognition systems for corpus indexing under the IARPA Babel program

Jia Cui; Xiaodong Cui; Bhuvana Ramabhadran; Janice Kim; Brian Kingsbury; Jonathan Mamou; Lidia Mangu; Michael Picheny; Tara N. Sainath; Abhinav Sethy

Automatic speech recognition is a core component of many applications, including keyword search. In this paper we describe experiments on acoustic modeling, language modeling, and decoding for keyword search on a Cantonese conversational telephony corpus collected as part of the IARPA Babel program. We show that acoustic modeling techniques such as the bootstrapped-and-restructured model and deep neural network acoustic model significantly outperform a state-of-the-art baseline GMM/HMM model, in terms of both recognition performance and keyword search performance, with improvements of up to 11% relative character error rate reduction and 31% relative maximum term weighted value improvement. We show that while an interpolated Model M and neural network LM improve recognition performance, they do not improve keyword search results; however, the advanced LM does reduce the size of the keyword search index. Finally, we show that a simple form of automatically adapted keyword search performs 16% better than a preindexed search system, indicating that out-of-vocabulary search is still a challenge.


IEEE Automatic Speech Recognition and Understanding Workshop | 2007

The IBM 2007 speech transcription system for European parliamentary speeches

Bhuvana Ramabhadran; Olivier Siohan; Abhinav Sethy

TC-STAR is a European Union-funded speech-to-speech translation project to transcribe, translate, and synthesize European Parliamentary Plenary Speeches (EPPS). This paper describes IBM's English speech recognition system submitted to the TC-STAR 2007 Evaluation. Language model adaptation based on clustering and data selection using relative entropy minimization provided significant gains in the 2007 evaluation. The additional advances over the 2006 system that we present in this paper include unsupervised training of acoustic and language models, a system architecture based on cross-adaptation across complementary systems, and system combination through the generation of an ensemble of systems using randomized decision tree state-tying. These advances reduced the error rate by 30% relative over the best-performing system in the TC-STAR 2006 evaluation on the 2006 English development and evaluation test sets, and produced one of the best-performing systems on the 2007 English evaluation, with a word error rate of 7.1%.
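
Data selection for language model adaptation picks text that moves the model toward the target domain; the sketch below uses a simplified cross-entropy-difference criterion with unigram models as a stand-in for the paper's relative entropy minimization, on invented toy corpora.

```python
import math
from collections import Counter

# Sketch of likelihood-based data selection for LM adaptation: candidate
# sentences are scored by how much better an in-domain unigram model explains
# them than a background model, and the best-scoring sentences are kept.
# This is a simplified stand-in for relative entropy minimization; the tiny
# corpora below are invented.

def unigram(tokens, vocab, smoothing=1.0):
    counts = Counter(tokens)
    total = len(tokens) + smoothing * len(vocab)
    return {w: (counts[w] + smoothing) / total for w in vocab}

def select(candidates, in_domain, background, keep=1):
    vocab = set(in_domain) | set(background) | {w for s in candidates for w in s.split()}
    p_in = unigram(in_domain, vocab)
    p_bg = unigram(background, vocab)

    def gain(sentence):
        words = sentence.split()
        return sum(math.log(p_in[w]) - math.log(p_bg[w]) for w in words) / len(words)

    return sorted(candidates, key=gain, reverse=True)[:keep]

in_domain = "the parliament adopted the resolution".split()
background = "the cat sat on the mat".split()
candidates = ["the resolution was adopted", "the cat chased the mat"]
print(select(candidates, in_domain, background))
```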


IEEE Transactions on Audio, Speech, and Language Processing | 2014

Converting Neural Network Language Models into Back-off Language Models for Efficient Decoding in Automatic Speech Recognition

Ebru Arisoy; Stanley F. Chen; Bhuvana Ramabhadran; Abhinav Sethy

Neural Network Language Models (NNLMs) have achieved very good performance in large-vocabulary continuous speech recognition (LVCSR) systems. Because decoding with NNLMs is very computationally expensive, there is interest in developing methods to approximate NNLMs with simpler language models that are suitable for fast decoding. In this work, we propose an approximate method for converting a feedforward NNLM into a back-off n-gram language model that can be used directly in existing LVCSR decoders. We convert NNLMs of increasing order to pruned back-off language models, using lower-order models to constrain the n-grams allowed in higher-order models. In experiments on Broadcast News data, we find that the resulting back-off models retain the bulk of the gain achieved by NNLMs over conventional n-gram language models, and give significant accuracy improvements as compared to existing methods for converting NNLMs to back-off models. In addition, the proposed approach can be applied to any type of non-back-off language model to enable efficient decoding.
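
A rough sketch of the conversion idea: probabilities for a restricted set of higher-order n-grams are read off the neural model, and a per-history back-off weight keeps each distribution normalized. The nnlm_prob placeholder, toy vocabulary, and bigram order are assumptions for illustration, not the paper's exact recipe.

```python
# Sketch: convert a neural LM into a back-off bigram model. Explicit bigram
# probabilities are taken from the NNLM for a restricted set of n-grams, and a
# back-off weight per history keeps each distribution normalized. `nnlm_prob`
# is a placeholder for a trained feedforward NNLM; the vocabulary is a toy.

VOCAB = ["the", "cat", "sat", "</s>"]

def nnlm_prob(word, history):
    # Placeholder: a real system would query the feedforward NNLM here.
    return 1.0 / len(VOCAB)

def build_backoff_bigrams(kept_bigrams, unigram_probs):
    """kept_bigrams: {history: [words]} n-grams allowed into the model.
    Returns explicit bigram probs and a back-off weight for each history."""
    bigram_probs, backoff = {}, {}
    for history, words in kept_bigrams.items():
        explicit = {w: nnlm_prob(w, history) for w in words}
        covered_hi = sum(explicit.values())
        covered_lo = sum(unigram_probs[w] for w in words)
        backoff[history] = (1.0 - covered_hi) / (1.0 - covered_lo)
        bigram_probs[history] = explicit
    return bigram_probs, backoff

unigrams = {w: 1.0 / len(VOCAB) for w in VOCAB}
kept = {"the": ["cat", "sat"]}
probs, bow = build_backoff_bigrams(kept, unigrams)
print(probs, bow)
```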

Collaboration


Dive into Abhinav Sethy's collaboration.

Top Co-Authors

Shrikanth Narayanan (University of Southern California)

Panayiotis G. Georgiou (University of Southern California)