Jonathan Hamaker | Researchain

Publication

Featured research published by Jonathan Hamaker.

IEEE Transactions on Signal Processing | 2004

Applications of support vector machines to speech recognition

Aravind Ganapathiraju; Jonathan Hamaker; Joseph Picone

Recent work in machine learning has focused on models, such as the support vector machine (SVM), that automatically control generalization and parameterization as part of the overall optimization process. In this paper, we show that SVMs provide a significant improvement in performance on a static pattern classification task based on the Deterding vowel data. We also describe an application of SVMs to large vocabulary speech recognition and demonstrate an improvement in error rate on a continuous alphadigit task (OGI Alphadigits) and a large vocabulary conversational speech task (Switchboard). Issues related to the development and optimization of an SVM/HMM hybrid system are discussed.

IEEE Transactions on Speech and Audio Processing | 2001

Syllable-based large vocabulary continuous speech recognition

Aravind Ganapathiraju; Jonathan Hamaker; Joseph Picone; Mark Ordowski; George R. Doddington

Most large vocabulary continuous speech recognition (LVCSR) systems in the past decade have used a context-dependent (CD) phone as the fundamental acoustic unit. We present one of the first robust LVCSR systems that uses a syllable-level acoustic unit for LVCSR on telephone-bandwidth speech. This effort is motivated by the inherent limitations in phone-based approaches, namely the lack of an easy and efficient way of modeling long-term temporal dependencies. A syllable unit spans a longer time frame, typically three phones, thereby offering a more parsimonious framework for modeling pronunciation variation in spontaneous speech. We present encouraging results which show that a syllable-based system exceeds the performance of a comparable triphone system both in terms of word error rate (WER) and complexity. The WER of the best syllabic system reported here is 49.1% on a standard Switchboard evaluation, a small improvement over the triphone system. We also report results on a much smaller recognition task, OGI Alphadigits, which was used to validate some of the benefits syllables offer over triphones. The syllable-based system exceeds the performance of the triphone system by nearly 20%, an impressive accomplishment since the alphadigits application consists mostly of phone-level minimal pair distinctions.

International Conference on Acoustics, Speech, and Signal Processing | 1997

An advanced system to generate pronunciations of proper nouns

Neeraj Deshmukh; Julie Ngan; Jonathan Hamaker; Joseph Picone

Accurate recognition of proper nouns is a critical component of automatic speech recognition (ASR). Since there are no obvious letter-to-sound conversion rules that govern the pronunciation of any large set of proper nouns, this is an open-ended problem that evolves constantly under various sociolinguistic influences. A Boltzmann machine neural network is well-suited for the task of generating the most likely pronunciations of a proper noun. This pronunciation output can be used to build better acoustic models for the noun that result in improved recognition performance. We present an advanced version of this N-best pronunciations system, together with a multiple-pronunciation dictionary of 18,000 surnames and 25,000 pronunciations used as a training database. The database and software are available in the public domain.

International Conference on Acoustics, Speech, and Signal Processing | 1998

Advances in alphadigit recognition using syllables

Jonathan Hamaker; Aravind Ganapathiraju; Joseph Picone; John J. Godfrey

We present a set of experiments which explore the use of syllables for recognition of continuous alphadigit utterances. In this system, syllables are used as the primary unit of recognition. This work was motivated by our need to verify and isolate phenomena seen when performing syllable-based experiments on the Switchboard corpus. The performance of our base syllable system is better than a crossword triphone system while requiring a small portion of the resources necessary for triphone systems. All experiments were performed on the OGI Alphadigits corpus, which consists of telephone-bandwidth alphadigit strings. The word error rate (WER) of the best syllable system (context-independent syllables) reported here is 11.1% compared to 12.2% for a crossword triphone system.
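
As a reading aid for the WER figures quoted in these abstracts, the following is a minimal sketch of how word error rate is conventionally computed (word-level edit distance divided by the number of reference words) and of how a relative reduction between two systems can be derived. The function names `word_error_rate` and `relative_reduction` are illustrative assumptions and are not taken from the papers or their scoring tools.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline, improved):
    """Fraction by which `improved` lowers the `baseline` error rate."""
    return (baseline - improved) / baseline


# Relative WER reduction implied by the 12.2% and 11.1% figures above.
print(round(100 * relative_reduction(12.2, 11.1), 1), "% relative")
```

Under this definition, the absolute gap between 12.2% and 11.1% is 1.1 points, while the relative reduction is measured against the 12.2% baseline.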

SoutheastCon | 1997

Benchmarking of FFT algorithms

Michael Balducci; Aravind Ganapathiraju; Jonathan Hamaker; Joseph Picone; Ajitha Choudary; Anthony Skjellum

A large number of fast Fourier transform (FFT) algorithms have been developed over the years. Among these, the most promising are the radix-2, radix-4, split-radix, fast Hartley transform (FHT), quick Fourier transform (QFT), and the decimation-in-time-frequency (DITF) algorithms. We present a rigorous analysis of these algorithms that includes the number of mathematical operations, computational time, memory requirements, and object code size. The results of this work will serve as a framework for creating an object-oriented, poly-functional FFT implementation which will automatically choose the most efficient algorithm given user-specified constraints.
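
To make the simplest of the listed algorithms concrete, here is a minimal sketch of a recursive radix-2 decimation-in-time FFT, checked against a direct O(n²) DFT. It is a textbook illustration only, not the benchmarked implementation from the paper; the names `fft_radix2` and `dft_naive` are assumptions introduced for this example.

```python
import cmath


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT (length must be a power of two)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a nonzero power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])  # transform of even-indexed samples
    odd = fft_radix2(x[1::2])   # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: combine the two half-size transforms with a twiddle factor.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def dft_naive(x):
    """Direct evaluation of the DFT definition, used only as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


signal = [1, 2, 3, 4, 0, 0, 0, 0]
max_diff = max(abs(a - b) for a, b in zip(fft_radix2(signal), dft_naive(signal)))
print(max_diff < 1e-9)
```

The recursion halves the problem at each level, which is where the O(n log n) operation count of radix-2 comes from; radix-4 and split-radix variants reduce the multiplication count further by using different decompositions.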

Asilomar Conference on Signals, Systems and Computers | 2001

Generalized hierarchical search in the ISIP ASR system

Bohumir Jelinek; Feng Zheng; Naveen Parihar; Jonathan Hamaker; Joseph Picone

It has long been a goal of speech researchers to incorporate higher-level knowledge sources such as discourse, part of speech, and understanding constraints into the speech recognition problem. However, current speech recognition systems are highly tuned to N-gram, triphone-based recognition. Thus, researchers have been unable to exploit this knowledge without extensive modifications to the most complex portion of an ASR system: the decoder. In this paper, we describe a publicly available, state-of-the-art decoder that employs a flexible and configurable multilevel search strategy capable of incorporating hierarchical knowledge sources with no changes to source code.

SoutheastCon | 1999

Fast search algorithms for continuous speech recognition

J. Zhao; Jonathan Hamaker; Neeraj Deshmukh; Aravind Ganapathiraju; Joseph Picone

The most important component of a state-of-the-art speech recognition system is the decoder, or search engine. Given this importance, it is no surprise that many algorithms have been devised which attempt to increase the efficiency of the search process while maintaining the quality of the recognition hypotheses. In this paper, we present a Viterbi decoder which uses a two-pass fast-match search to efficiently prune away unlikely parts of the search space. This system is compared to a state-of-the-art Viterbi decoder with beam pruning in evaluations on the OGI Alphadigits Corpus. Experimentation reveals that the Viterbi decoder after a first pass fast-match produces a more efficient search when compared to a Viterbi decoder with beam pruning. However, there is significant overhead associated with the first pass of the fast-match search.
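
The baseline technique named in this abstract, Viterbi decoding with beam pruning, can be sketched on a toy discrete HMM as follows. This is a generic illustration under simplifying assumptions (a small state set, log-probability tables passed as dictionaries), not the decoder evaluated in the paper, and it does not implement the two-pass fast-match search. The function name `viterbi_beam` and its parameters are hypothetical.

```python
import math


def viterbi_beam(obs, states, log_start, log_trans, log_emit, beam=math.inf):
    """Viterbi search that drops hypotheses more than `beam` below the best.

    Returns (log score, state path) of the best surviving hypothesis.
    With beam=math.inf no pruning occurs and the search is exact.
    """
    # active maps each state to (log score, best path ending in that state).
    active = {s: (log_start[s] + log_emit[s][obs[0]], [s]) for s in states}
    for o in obs[1:]:
        best = max(score for score, _ in active.values())
        # Beam pruning: only hypotheses close to the current best are extended.
        survivors = {s: v for s, v in active.items() if v[0] >= best - beam}
        nxt = {}
        for s in states:
            score, path = max(
                ((p_score + log_trans[p][s], p_path)
                 for p, (p_score, p_path) in survivors.items()),
                key=lambda c: c[0],
            )
            nxt[s] = (score + log_emit[s][o], path + [s])
        active = nxt
    return max(active.values(), key=lambda v: v[0])
```

A narrower beam extends fewer hypotheses per frame, which reduces work but can discard the path that would have turned out best; this efficiency versus accuracy trade-off is the kind of behavior such decoder comparisons measure.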

Neural Information Processing Systems | 2000