Aravind Ganapathiraju

Publication

Featured research published by Aravind Ganapathiraju.

IEEE Transactions on Signal Processing | 2004

Applications of support vector machines to speech recognition

Aravind Ganapathiraju; Jonathan Hamaker; Joseph Picone

Recent work in machine learning has focused on models, such as the support vector machine (SVM), that automatically control generalization and parameterization as part of the overall optimization process. In this paper, we show that SVMs provide a significant improvement in performance on a static pattern classification task based on the Deterding vowel data. We also describe an application of SVMs to large vocabulary speech recognition and demonstrate an improvement in error rate on a continuous alphadigit task (OGI Alphadigits) and a large vocabulary conversational speech task (Switchboard). Issues related to the development and optimization of an SVM/HMM hybrid system are discussed.

IEEE Transactions on Speech and Audio Processing | 2001

Syllable-based large vocabulary continuous speech recognition

Aravind Ganapathiraju; Jonathan Hamaker; Joseph Picone; Mark Ordowski; George R. Doddington

Most large vocabulary continuous speech recognition (LVCSR) systems in the past decade have used a context-dependent (CD) phone as the fundamental acoustic unit. We present one of the first robust LVCSR systems that uses a syllable-level acoustic unit for LVCSR on telephone-bandwidth speech. This effort is motivated by the inherent limitations in phone-based approaches, namely the lack of an easy and efficient way to model long-term temporal dependencies. A syllable unit spans a longer time frame, typically three phones, thereby offering a more parsimonious framework for modeling pronunciation variation in spontaneous speech. We present encouraging results which show that a syllable-based system exceeds the performance of a comparable triphone system both in terms of word error rate (WER) and complexity. The WER of the best syllabic system reported here is 49.1% on a standard Switchboard evaluation, a small improvement over the triphone system. We also report results on a much smaller recognition task, OGI Alphadigits, which was used to validate some of the benefits syllables offer over triphones. The syllable-based system exceeds the performance of the triphone system by nearly 20%, an impressive accomplishment since the alphadigits application consists mostly of phone-level minimal pair distinctions.

SoutheastCon | 1999

Linear discriminant analysis for signal processing problems

S. Balakrishnama; Aravind Ganapathiraju; Joseph Picone

Linear discriminant analysis (LDA) and principal components analysis (PCA) are two common techniques used for classification and dimensionality reduction. These techniques typically use a linear transformation which can either be implemented in a class-dependent or class-independent fashion. PCA is a feature classification technique in which the data in the input space is transformed to a feature space where the features are decorrelated. On the other hand, the optimization criterion for LDA attempts to maximize class separability. We quantify the efficacy of these two algorithms along with two other classification techniques, support vector machines (SVM) and independent components analysis (ICA). The problem of classifying forestry images based on their scenic beauty is considered. On a standard evaluation task consisting of 478 training images and 159 test images, class-dependent LDA produced a 35.22% misclassification rate, which is significantly better than the 43.3% rate obtained using PCA and is on par with the performance of ICA and SVM.

IEEE Automatic Speech Recognition and Understanding Workshop | 1997

Syllable - a promising recognition unit for LVCSR

Aravind Ganapathiraju; Vaibhava Goel; Joseph Picone; Andres Corrada; George R. Doddington; Katrin Kirchhoff; Mark Ordowski; Barbara Wheatley

We present an attempt to model syllable-level acoustic information as a viable alternative to the conventional phone-level acoustic unit for large vocabulary continuous speech recognition. The motivation for this work was the inherent limitations in the phone-based approach, primarily the decompositional nature and lack of larger-scale temporal dependencies. We present preliminary but encouraging results on a syllable-based recognition system which exceeds the performance of a comparable triphone system both in terms of word error rate (WER) and complexity. The WER of the best syllable system reported here was 49.1% on a standard SWITCHBOARD evaluation.

International Conference on Acoustics, Speech, and Signal Processing | 1998

Advances in alphadigit recognition using syllables

Jonathan Hamaker; Aravind Ganapathiraju; Joseph Picone; John J. Godfrey

We present a set of experiments which explore the use of syllables for recognition of continuous alphadigit utterances. In this system, syllables are used as the primary unit of recognition. This work was motivated by our need to verify and isolate phenomena seen when performing syllable-based experiments on the Switchboard corpus. The performance of our base syllable system is better than a crossword triphone system while requiring a small portion of the resources necessary for triphone systems. All experiments were performed on the OGI Alphadigits corpus, which consists of telephone-bandwidth alphadigit strings. The word error rate (WER) of the best syllable system (context-independent syllables) reported here is 11.1% compared to 12.2% for a crossword triphone system.
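
The WER figures quoted in these abstracts follow the usual definition: substitutions, deletions, and insertions divided by the number of reference words. The following is a minimal sketch of that computation, not the scoring tool used by the authors; the function names `word_error_rate` and `relative_reduction` are illustrative assumptions.

```python
def word_error_rate(reference, hypothesis):
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,          # deletion
                          curr[j - 1] + 1,      # insertion
                          prev[j - 1] + cost)   # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer, new_wer):
    """Relative WER reduction of new_wer with respect to baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


# One substitution (b -> x) and one deletion (d) against four reference words.
example_wer = word_error_rate("a b c d", "a x c")

# The 12.2% and 11.1% figures quoted above.
alphadigit_gain = relative_reduction(12.2, 11.1)
```

Applied to the numbers in this abstract, the drop from 12.2% to 11.1% is roughly a 9% relative reduction in WER.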

SoutheastCon | 1997

Benchmarking of FFT algorithms

Michael Balducci; Aravind Ganapathiraju; Jonathan Hamaker; Joseph Picone; Ajitha Choudary; Anthony Skjellum

A large number of fast Fourier transform (FFT) algorithms have been developed over the years. Among these, the most promising are the radix-2, radix-4, split-radix, fast Hartley transform (FHT), quick Fourier transform (QFT), and the decimation-in-time-frequency (DITF) algorithms. We present a rigorous analysis of these algorithms that includes the number of mathematical operations, computational time, memory requirements, and object code size. The results of this work will serve as a framework for creating an object-oriented, poly-functional FFT implementation which will automatically choose the most efficient algorithm given user-specified constraints.
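
As an illustration of the simplest algorithm in that list, here is a minimal sketch of a recursive decimation-in-time radix-2 FFT, with a direct DFT for comparison. It is not the benchmarked implementation from the paper; the function names and the multiplication count (which includes trivial twiddle factors) are assumptions made for this example.

```python
import cmath


def fft_radix2(x):
    """Recursive decimation-in-time radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])   # transform of even-indexed samples
    odd = fft_radix2(x[1::2])    # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle             # butterfly, upper half
        out[k + n // 2] = even[k] - twiddle    # butterfly, lower half
    return out


def dft_direct(x):
    """Direct O(n^2) DFT, used only as a correctness reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def radix2_complex_multiplies(n):
    """Twiddle multiplications in the sketch above: (n / 2) * log2(n)."""
    return (n // 2) * (n.bit_length() - 1)


signal = [1, 2, 3, 4, 0, 0, 0, 0]
spectrum = fft_radix2(signal)
```

For a 1024-point transform this count is 5120 complex multiplications, against 1024 * 1024 for the direct DFT, which is the kind of operation-count comparison the abstract describes.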

SoutheastCon | 1999

Implementation and analysis of speech recognition front-ends

Vishwanath Mantha; Richard Duncan; Yufeng Wu; Jie Zhao; Aravind Ganapathiraju; Joseph Picone

We have developed a comprehensive front-end module integrating several signal modeling algorithms common to state-of-the-art speech recognition systems. The algorithms presented in this work include mel-frequency cepstra, perceptual linear prediction, filter bank amplitudes, and delta features. The framework for the front-end system was carefully designed to ensure simple integration into speech processing software. The modular design of the software, along with an intuitive GUI, provides a powerful tutorial by allowing a wide selection of algorithms. The software is written in a tutorial fashion, with a direct correlation between algorithmic lines of code and equations in the technical paper.

SoutheastCon | 1999

Scenic beauty estimation using independent component analysis and support vector machines

X. Zhang; V. Ramani; Z. Long; Y. Zeng; Aravind Ganapathiraju; Joseph Picone

The objective in the scenic beauty estimation (SBE) problem is to develop an automatic classification algorithm that matches human subjective ratings. Algorithms such as principal components analysis (PCA) and decision trees (DT) have been applied to this problem with limited success, motivating our search for a better classifier. Since this is obviously a nonlinear classification problem, we applied two nonlinear techniques: independent component analysis (ICA) and support vector machines (SVMs). We evaluated these algorithms on a standard, publicly available data set using a variety of combinations of features. The optimally configured ICA and SVM systems achieved misclassification rates of 33.4% and 32.2% respectively. This is a significant improvement over the best results previously reported on this task: 36.6% for PCA and 43% for DT. Since ambiguity in the feature space is a significant problem in this application, these results validate the effectiveness of nonlinear classification techniques.

SoutheastCon | 1997

Echo cancellation for evaluating speaker identification technology

Aravind Ganapathiraju; Joseph Picone

Echo cancellers using adaptive filtering techniques have traditionally found application in solving a wide variety of communications systems problems. We present here a novel application of an FIR echo canceller to be used in evaluating speaker identification technology on conversational speech data collected over the public telephone network. Various modifications to the standard LMS echo canceller are required to deal with double-talk, a time-varying background channel, and stability of the adaptive filter. Our implementation of the echo canceller has been optimized for two-way telephone conversations and has been tested extensively on the Switchboard corpus.
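
For orientation, the standard LMS echo canceller that the abstract takes as its starting point can be sketched as follows. This is only the textbook update rule, without the double-talk, time-varying channel, or stability modifications the paper describes; the function name, tap count, step size, and the synthetic echo path are all assumptions chosen for the example.

```python
import random


def lms_echo_canceller(far_end, mic, num_taps=4, step_size=0.05):
    """Adapt an FIR filter so that its output tracks the echo of far_end in mic.

    Returns the residual (mic minus estimated echo) and the final tap weights.
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps    # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate
        # LMS update: move each tap along the instantaneous error gradient.
        weights = [w + step_size * error * h for w, h in zip(weights, history)]
        residual.append(error)
    return residual, weights


# Synthetic test signal: white noise passed through a hypothetical echo path.
rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
echo_path = [0.5, -0.3, 0.1]
mic = [sum(h * far_end[t - k] for k, h in enumerate(echo_path) if t - k >= 0)
       for t in range(len(far_end))]

residual, weights = lms_echo_canceller(far_end, mic)
```

In this noise-free setting the taps converge to the echo path and the residual decays toward zero. Real telephone speech would add near-end talkers, which is where the double-talk handling mentioned in the abstract becomes necessary.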

SoutheastCon | 1999

Fast search algorithms for continuous speech recognition

J. Zhao; Jonathan Hamaker; Neeraj Deshmukh; Aravind Ganapathiraju; Joseph Picone

The most important component of a state-of-the-art speech recognition system is the decoder, or search engine. Given this importance, it is no surprise that many algorithms have been devised which attempt to increase the efficiency of the search process while maintaining the quality of the recognition hypotheses. In this paper, we present a Viterbi decoder which uses a two-pass fast-match search to efficiently prune away unlikely parts of the search space. This system is compared to a state-of-the-art Viterbi decoder with beam pruning in evaluations on the OGI Alphadigits Corpus. Experimentation reveals that the Viterbi decoder after a first pass fast-match produces a more efficient search when compared to a Viterbi decoder with beam pruning. However, there is significant overhead associated with the first pass of the fast-match search.
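
The baseline in that comparison, a Viterbi search with beam pruning, can be sketched on a toy hidden Markov model as follows. This is not the authors' decoder and does not implement the two-pass fast-match; the function name, the `beam` parameter, and the two-state example model are illustrative assumptions.

```python
import math


def viterbi_beam(observations, states, log_start, log_trans, log_emit,
                 beam=math.inf):
    """Return (log score, state path) of the best path found.

    At each frame, hypotheses scoring more than `beam` below the current
    best are dropped before they are extended. With beam=inf this is the
    exact Viterbi search.
    """
    # active maps state -> (log score, best path ending in that state)
    active = {s: (log_start[s] + log_emit[s][observations[0]], [s])
              for s in states}
    for obs in observations[1:]:
        best = max(score for score, _ in active.values())
        survivors = {s: v for s, v in active.items() if v[0] >= best - beam}
        extended = {}
        for s in states:
            score, path = max(
                ((prev_score + log_trans[p][s], prev_path)
                 for p, (prev_score, prev_path) in survivors.items()),
                key=lambda c: c[0])
            extended[s] = (score + log_emit[s][obs], path + [s])
        active = extended
    return max(active.values(), key=lambda v: v[0])


def logs(table):
    """Convert a dict of probabilities to log probabilities."""
    return {k: math.log(v) for k, v in table.items()}


# Toy two-state model, invented for illustration only.
states = ["Healthy", "Fever"]
log_start = logs({"Healthy": 0.6, "Fever": 0.4})
log_trans = {"Healthy": logs({"Healthy": 0.7, "Fever": 0.3}),
             "Fever": logs({"Healthy": 0.4, "Fever": 0.6})}
log_emit = {"Healthy": logs({"normal": 0.5, "cold": 0.4, "dizzy": 0.1}),
            "Fever": logs({"normal": 0.1, "cold": 0.3, "dizzy": 0.6})}
obs = ["normal", "cold", "dizzy"]

exact_score, exact_path = viterbi_beam(obs, states, log_start, log_trans, log_emit)
pruned_score, pruned_path = viterbi_beam(obs, states, log_start, log_trans,
                                         log_emit, beam=1.0)
```

A narrow beam reduces the number of hypotheses extended per frame, at the risk of discarding the path that would have won. In this toy case the pruned and exact searches agree.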
