
Publication


Featured research published by Ted H. Applebaum.


International Conference on Acoustics, Speech, and Signal Processing | 1990

Robust speaker-independent word recognition using static, dynamic and acceleration features: experiments with Lombard and noisy speech

Brian A. Hanson; Ted H. Applebaum

Speaker-independent recognition of Lombard and noisy speech by a recognizer trained with normal speech is discussed. Speech was represented by static, dynamic (first difference), and acceleration (second difference) features. Strong interaction was found between these temporal features, the frequency differentiation due to cepstral weighting, and the degree of smoothing in the spectral analysis. When combined with the other features, acceleration raised recognition rates for Lombard or noisy input speech. Dynamic and acceleration features were found to perform much better than the static feature for noisy Lombard speech. This suggests that an algorithm which excludes the static feature in high ambient noise is desirable.
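As a concrete illustration of this feature set, here is a minimal sketch (not the authors' code) that appends first-difference (dynamic) and second-difference (acceleration) features to a matrix of static cepstral frames; the half-width k is an illustrative choice.

```python
import numpy as np

def add_dynamic_features(cepstra, k=2):
    """Append dynamic (first-difference) and acceleration
    (second-difference) features to static cepstral frames.

    cepstra: array of shape (num_frames, num_coeffs).
    k: difference half-width in frames (illustrative assumption).
    """
    padded = np.pad(cepstra, ((k, k), (0, 0)), mode="edge")
    delta = padded[2 * k:] - padded[:-2 * k]      # c[t+k] - c[t-k]
    padded_d = np.pad(delta, ((k, k), (0, 0)), mode="edge")
    accel = padded_d[2 * k:] - padded_d[:-2 * k]  # difference of the deltas
    return np.hstack([cepstra, delta, accel])     # (num_frames, 3 * num_coeffs)
```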


International Conference on Acoustics, Speech, and Signal Processing | 1989

Enhancing the discrimination of speaker independent hidden Markov models with corrective training

Ted H. Applebaum; Brian A. Hanson

Corrective training is a recently proposed method of improving hidden Markov model parameters. Corrective training and related algorithms are applied to the domain of small-vocabulary, speaker-independent recognition. The contribution of each parameter of the algorithm is examined. Results confirm that corrective training can improve on the recognition rate achieved by maximum-likelihood training. However, the algorithm is sensitive to the selection of parameters. A heuristic quantity is proposed to monitor the progress of the corrective training algorithm, and this quantity is used to adapt a parameter of corrective training. An alternative training algorithm is discussed and compared to corrective training. It yielded open-test recognition rates comparable to those of maximum-likelihood training, but inferior to those of corrective training.
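To make the loop concrete, here is a schematic sketch in the spirit of corrective training, not the authors' implementation: misrecognized and near-miss training tokens trigger signed count updates governed by a step size and a near-miss margin, the kind of parameters the abstract says the algorithm is sensitive to. The model interface (log_likelihood, accumulate, renormalize) is assumed for illustration.

```python
def corrective_training_pass(models, training_set, gamma, delta):
    """One pass of a schematic corrective-training loop.

    models: dict mapping word -> HMM-like object (assumed interface).
    gamma: step size for signed count updates (tunable parameter).
    delta: near-miss margin (tunable parameter).
    """
    adjustments = 0
    for utterance, correct in training_set:
        scores = {w: m.log_likelihood(utterance) for w, m in models.items()}
        # Words that beat, or come within delta of, the correct word
        # are errors or near-misses and trigger an update.
        for rival, s in scores.items():
            if rival != correct and s > scores[correct] - delta:
                models[correct].accumulate(utterance, weight=+gamma)
                models[rival].accumulate(utterance, weight=-gamma)
                adjustments += 1
    for m in models.values():
        m.renormalize()  # keep parameters valid after signed updates
    # The adjustment count can serve as a progress monitor, in the spirit
    # of the heuristic quantity the abstract proposes.
    return adjustments
```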


Journal of the Acoustical Society of America | 1998

Speech representation by feature-based word prototypes comprising phoneme targets having reliable high similarity

Philippe Morin; Ted H. Applebaum

Digitized speech utterances are converted into phoneme similarity data, and regions of high similarity are then extracted and used to form the word prototype. By alignment across speakers, unreliable high similarity regions are eliminated. Word prototype targets are then constructed from the following parameters: the phoneme symbol, the average peak height of the phoneme similarity score, the average peak location, and the left and right frame locations. Each target is assigned a statistical weight representing the percentage of speakers in which the particular high similarity region occurred. The word prototype is feature-based, allowing a robust speech representation to be constructed without the need for frame-by-frame analysis.
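The target parameters enumerated above map naturally onto a small record type; a sketch with illustrative field names:

```python
from dataclasses import dataclass

@dataclass
class PhonemeTarget:
    """One target in a feature-based word prototype, holding the
    parameters enumerated in the abstract. Field names are assumptions."""
    phoneme: str              # phoneme symbol
    avg_peak_height: float    # average peak phoneme similarity score
    avg_peak_location: float  # average peak location (frames)
    left_frame: int           # left frame location of the region
    right_frame: int          # right frame location of the region
    weight: float             # fraction of speakers exhibiting this region

# A word prototype is then an ordered list of such targets, e.g.
# prototype = [PhonemeTarget("ah", 0.82, 3.5, 1, 6, 0.9), ...]
```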


Journal of the Acoustical Society of America | 2000

Multistage word recognizer based on reliably detected phoneme similarity regions

Ted H. Applebaum; Philippe Morin

The multistage word recognizer uses a word reference representation based on reliably detected peaks of phoneme similarity values. The word reference representation captures the basic features of the words by targets that describe the location and shape of stable peaks of phoneme similarity values. The first stage of the word hypothesizer represents each reference word with statistical information on the number of high similarity regions over a predefined number of time intervals. The second stage represents each word by a prototype that consists of a series of phoneme targets and global statistics, namely the average word duration and average match rate. These represent the degree of fit of the word prototype to its training data. Word recognition scores generated in the two stages are converted to dimensionless normalized values and combined by averaging for use in selecting the most probable word candidates.
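The combination step can be sketched directly; z-score normalization is an assumption here, since the abstract says only that the scores are made dimensionless and averaged.

```python
import numpy as np

def combine_stage_scores(stage1_scores, stage2_scores):
    """Combine per-word scores from the two recognizer stages:
    convert each stage's scores to dimensionless normalized values,
    then average them. Z-scoring is an illustrative choice."""
    def normalize(scores):
        s = np.asarray(scores, dtype=float)
        return (s - s.mean()) / (s.std() + 1e-12)
    # Rank words by this combined score to pick the top candidates.
    return (normalize(stage1_scores) + normalize(stage2_scores)) / 2.0

# Example: scores for three candidate words from each stage.
# combine_stage_scores([10.0, 8.0, 7.5], [0.9, 0.7, 0.8])
```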


International Conference on Acoustics, Speech, and Signal Processing | 1993

Subband or cepstral domain filtering for recognition of Lombard and channel-distorted speech

Brian A. Hanson; Ted H. Applebaum

High-pass or band-pass filtering of log subband energies has been shown to improve the robustness of automatic speech recognition to convolutional channel distortions. The authors compare several such filters and apply them in the PLP cepstral domain as well as the log subband domain. They evaluate the robustness of these techniques to Lombard-style test speech with additive noise and their ability to cancel channel effects. They explicitly examine the interactions of such high-pass or band-pass filters with cepstral time derivatives (which are themselves high-pass functions). Conclusions are drawn about factors (e.g., log subband vs cepstral domain, high-pass vs band-pass filter characteristics, and use of time derivatives) which determine the success of these filtering approaches for speaker-independent speech recognition in distorted-channel and noisy-Lombard conditions.
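The core idea, that a fixed channel adds a constant offset to log subband energies which temporal filtering removes, can be sketched with a first-order high-pass filter; the filter form and coefficient are illustrative, since the paper compares several such filters.

```python
import numpy as np

def highpass_log_energies(log_e, alpha=0.97):
    """Apply a first-order high-pass filter along time to each log
    subband energy (or cepstral) trajectory:

        y[t] = x[t] - x[t-1] + alpha * y[t-1]

    A stationary convolutional channel adds a constant in the log
    domain, which the differencing term cancels. The filter form and
    alpha are illustrative assumptions.

    log_e: array of shape (num_frames, num_bands).
    """
    y = np.zeros_like(log_e, dtype=float)
    for t in range(1, log_e.shape[0]):
        y[t] = log_e[t] - log_e[t - 1] + alpha * y[t - 1]
    return y
```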


Archive | 1996

Spectral Dynamics for Speech Recognition Under Adverse Conditions

Brian A. Hanson; Ted H. Applebaum; Jean-Claude Junqua

Significant improvements in automatic speech recognition performance have been obtained through front-end feature representations which exploit the time varying properties of speech spectra. Various techniques have been developed to incorporate “spectral dynamics” into the speech representation, including temporal derivative features, spectral mean normalization and, more generally, spectral parameter filtering. This chapter describes the implementation and interrelationships of these techniques and illustrates their use in automatic speech recognition under different types of adverse conditions.
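Of the techniques named, spectral mean normalization is the simplest to show; a minimal per-utterance sketch follows (a running-mean variant would behave like a high-pass filter on the parameter trajectories).

```python
import numpy as np

def cepstral_mean_normalization(cepstra):
    """Spectral (cepstral) mean normalization: subtract each
    coefficient's mean over the utterance, removing stationary
    convolutional channel effects. Per-utterance means are an
    illustrative choice of implementation."""
    return cepstra - cepstra.mean(axis=0, keepdims=True)
```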


International Conference on Acoustics, Speech, and Signal Processing | 1991

Regression features for recognition of speech in quiet and in noise

Ted H. Applebaum; Brian A. Hanson

It is proposed that the number of speech analysis frames used in calculating regression features should be controlled separately from the time length over which the features are calculated. Regression features are used to represent the first two time derivatives of the speech cepstrum in a speaker-independent, isolated-word recognition task. The recognition system is trained on normal (noise-free, non-Lombard) speech, but tested on normal, noisy, Lombard, or noisy-Lombard speech. It is shown that, for recognition based on the combination of the first two regression features with the static cepstral coefficients, increasing the time length to more than 200 ms, using all of the frames in this time interval, resulted in the highest recognition rates for noisy-Lombard test speech.
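A sketch of the central proposal: a first-order regression (least-squares slope) feature whose frame count and time span are set independently, by spacing the sampled frames across the span. All parameter values are illustrative.

```python
import numpy as np

def regression_feature(cepstra, t, num_frames=5, time_span_ms=200,
                       frame_period_ms=10.0):
    """First-order regression (least-squares slope, per sample index)
    of each cepstral coefficient around frame t. The number of frames
    used and the time length they span are independent parameters,
    per the paper's proposal; values here are illustrative.

    cepstra: array of shape (num_frames_total, num_coeffs).
    """
    half_span = time_span_ms / 2.0
    # num_frames sample points spread evenly over [-half_span, +half_span]
    offsets_ms = np.linspace(-half_span, half_span, num_frames)
    idx = np.clip(t + np.round(offsets_ms / frame_period_ms).astype(int),
                  0, len(cepstra) - 1)
    k = np.arange(num_frames) - (num_frames - 1) / 2.0
    # Least-squares slope: sum_k k * c[idx_k] / sum_k k^2
    return (k[:, None] * cepstra[idx]).sum(axis=0) / (k ** 2).sum()
```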


International Conference on Acoustics, Speech, and Signal Processing | 1996

A phoneme-similarity based ASR front-end

Ted H. Applebaum; Philippe Morin; Brian A. Hanson

A training procedure for phoneme similarity reference models is described, and two word recognition methods based on phoneme similarities for the English language are evaluated under clean, noisy, and channel-distorted speech conditions. Optimization of recognition performance is examined in terms of multi-style training, cepstral normalizations, gender-dependent models, and the length of time over which the phoneme similarities are computed. Phoneme similarities provide a compact speech representation which is relatively insensitive to the variations between speakers.


Journal of the Acoustical Society of America | 1990

Features for speaker‐independent recognition of noisy and Lombard speech

Ted H. Applebaum; Brian A. Hanson

Additive noise and noise-induced changes in vocal effort (Lombard effect) cause significant loss of performance for recognizers trained on "normal" noise-free speech when the speech is represented by cepstral coefficients or the combination of cepstral coefficients and their first time derivative. The goal of this work is to find a representation of speech that is more robust to the mismatch of test and training noise conditions. In Applebaum and Hanson [EUSIPCO-90, Barcelona], it was shown, for a digits vocabulary, that adding a second derivative of the cepstral coefficients substantially improved the recognition rate for a recognizer utilizing perceptually based LP analysis when the derivatives were calculated over long (> 200 ms) regression windows. It was conjectured that these long windows would not work well for a more confusable vocabulary. The current study extends this previous work to a 21-word vocabulary consisting of confusable subsets of the English alpha-digits and the words "no" and "go." Despite the...


International Conference on Acoustics, Speech, and Signal Processing | 1979

A methodology for studying telephone amplitude distortion effects on narrowband speech processors

John D. Markel; Steven B. Davis; Ted H. Applebaum

A methodology for studying the effects of telephone amplitude (versus frequency) distortion on narrowband speech processing systems is presented. Digital equalization of test signals is performed such that a recorded electrical signal can be played from a tape recorder into an artificial voice which produces 1) an acoustical pressure wave output without external amplitude distortion, and 2) matching pressure characteristics at distances from the human lips.
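As a rough picture of what digital equalization of test signals involves (a sketch under assumptions, not the paper's procedure), one way to flatten a measured playback-chain response is to design an inverse FIR filter by frequency sampling:

```python
import numpy as np

def inverse_equalizer(measured_mag, n_taps=64):
    """Design a real FIR filter approximating the inverse of a measured
    magnitude response (frequency-sampling method). The method, tap
    count, and gain limit are illustrative assumptions.

    measured_mag: magnitude samples from DC to Nyquist, inclusive.
    """
    inv_mag = 1.0 / np.maximum(measured_mag, 1e-3)  # cap extreme gains
    # Build the full Hermitian-symmetric spectrum, then inverse-FFT.
    full = np.concatenate([inv_mag, inv_mag[-2:0:-1]])
    taps = np.real(np.fft.ifft(full))
    taps = np.roll(taps, n_taps // 2)[:n_taps]  # center and truncate
    return taps * np.hamming(n_taps)            # window the truncation
```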

Collaboration


Dive into Ted H. Applebaum's collaboration.

Top Co-Authors

Roland Kuhn (State Street Corporation)

David Kryze (State Street Corporation)