Brian A. Hanson
Panasonic
Publications
Featured research published by Brian A. Hanson.
International Conference on Acoustics, Speech, and Signal Processing | 1989
Ted H. Applebaum; Brian A. Hanson
Corrective training is a recently proposed method of improving hidden Markov model parameters. Corrective training and related algorithms are applied to the domain of small-vocabulary, speaker-independent recognition. The contribution of each parameter of the algorithm is examined. Results confirm that corrective training can improve on the recognition rate achieved by maximum-likelihood training. However, the algorithm is sensitive to the selection of its parameters. A heuristic quantity is proposed to monitor the progress of the corrective training algorithm, and this quantity is used to adapt a parameter of corrective training. An alternative training algorithm is discussed and compared to corrective training; it yielded open-test recognition rates comparable to those of maximum-likelihood training but inferior to those of corrective training.
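The corrective-training step the abstract refers to can be sketched roughly as follows. This is a minimal illustration for discrete-observation HMM emission probabilities; the function name, the fixed step size, and the clipping are assumptions made for the sketch, not the paper's implementation (the paper adapts such a step parameter using its proposed heuristic quantity).

```python
import numpy as np

def corrective_update(emit, counts_correct, counts_wrong, step=0.01):
    """One corrective-training step for a discrete-HMM emission matrix.

    emit: (states, symbols) emission probabilities.  counts_correct and
    counts_wrong: state-symbol occupancy counts accumulated along the
    alignment of the correct word and of a misrecognized (or near-miss)
    competitor word, respectively.
    """
    # Reinforce the correct alignment and penalize the competing one.
    emit = emit + step * (counts_correct - counts_wrong)
    emit = np.clip(emit, 1e-6, None)               # keep values positive
    return emit / emit.sum(axis=1, keepdims=True)  # renormalize each state
```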
International Conference on Acoustics, Speech, and Signal Processing | 1984
Brian A. Hanson; David Y. Wong
Algorithms based on spectral subtraction are developed for improving the intelligibility of speech that has been interfered with by a second talker's voice. A number of new properties of spectral subtraction are shown, including the effects of phase on the output speech intelligibility and the choice of magnitude spectral differences for best results. A harmonic extraction algorithm is also developed. Results of formal testing on the final system show that a significant gain in intelligibility is achieved for low signal-to-noise-ratio conditions.
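The core magnitude-subtraction idea can be sketched as below. This is a minimal illustration, assuming a per-frame estimate of the interfering talker's magnitude spectrum is available; the function names and parameters are illustrative, and the paper's harmonic extraction stage is not shown.

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_subtract(mixture, interference_mag, fs=8000, nper=256):
    """Magnitude spectral subtraction, reusing the mixture's phase.

    mixture: time-domain mixed signal.  interference_mag: estimated
    magnitude spectrum of the interference on the same STFT grid
    (freq_bins x frames).
    """
    f, t, X = stft(mixture, fs=fs, nperseg=nper)
    mag = np.abs(X)
    phase = np.angle(X)                                  # phase kept as-is
    clean_mag = np.maximum(mag - interference_mag, 0.0)  # half-wave rectify
    _, y = istft(clean_mag * np.exp(1j * phase), fs=fs, nperseg=nper)
    return y
```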
Journal of the Acoustical Society of America | 1997
Hector R. Javkin; Elizabeth Keate; Norma Antonanzas-Barroso; Brian A. Hanson
This invention includes a speech training system that allows a student to enter any utterance to be learned and have the articulatory model movements required to produce the utterance displayed on a CRT screen. The system accepts a typed utterance and breaks it down into a set of speech units, which could be phonemes or syllables, along with the onset and offset of each speech unit. The set of speech units is sent to a synthesizer, which produces a set of parameters indicating the acoustic characteristics of the utterance. The acoustic parameters are converted into articulatory parameters emphasizing the frequency and nasality required to produce the typed utterance. The speech units and the onset and offset of each are used to generate the tongue-palate contact patterns required to produce the typed utterance. The articulatory parameters are displayed on the CRT screen. The acoustic parameters are also sent to a formant synthesizer, which converts the parameters into speech output. The system measures a student's production and evaluates it against the parameters of the typed utterance for similarity. Feedback on the similarity is displayed on the CRT screen.
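The data flow the patent describes can be summarized in a skeleton like the one below. Every component here is a hypothetical stub standing in for the system's synthesizer, articulatory converter, display, and scoring stages; only the ordering of stages comes from the abstract.

```python
from dataclasses import dataclass

@dataclass
class SpeechUnit:
    label: str      # phoneme or syllable
    onset: float    # start time (s)
    offset: float   # end time (s)

def segment(text):
    """Stub: break a typed utterance into timed speech units."""
    return [SpeechUnit(p, i * 0.2, (i + 1) * 0.2)
            for i, p in enumerate(text.split())]

def lesson(text, student_production):
    """Stage ordering of the described system, with stubbed components."""
    units = segment(text)                                # typed input
    acoustic = [(u.label, "acoustic") for u in units]    # synthesizer output
    artic = [(lab, "articulatory") for lab, _ in acoustic]  # conversion
    contacts = [(u.label, "palate") for u in units]      # contact patterns
    print("display targets:", artic, contacts)           # CRT display
    print("similarity feedback:", student_production == text)
```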
International Conference on Acoustics, Speech, and Signal Processing | 1993
Brian A. Hanson; Ted H. Applebaum
High-pass or band-pass filtering of log subband energies has been shown to improve the robustness of automatic speech recognition to convolutional channel distortions. The authors compare several such filters and apply them in the PLP cepstral domain as well as the log subband domain. They evaluate the robustness of these techniques to Lombard-style test speech with additive noise and their ability to cancel channel effects. They explicitly examine the interactions of such high-pass or band-pass filters with cepstral time derivatives (which are themselves high-pass functions). Conclusions are drawn about the factors (e.g., log subband vs. cepstral domain, high-pass vs. band-pass filter characteristics, and use of time derivatives) which determine the success of these filtering approaches for speaker-independent speech recognition in distorted-channel and noisy-Lombard conditions.
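A minimal sketch of this family of techniques appears below. The default coefficients are the widely used RASTA band-pass filter (Hermansky and Morgan), given only as one representative member of the filter family such work compares; the paper's exact filters may differ. A convolutional channel adds a nearly constant offset to log subband energies, which falls in the filter's stopband.

```python
import numpy as np
from scipy.signal import lfilter

def bandpass_trajectories(log_energies, b=None, a=None):
    """Band-pass filter each log subband-energy trajectory over time.

    log_energies: array of shape (frames, subbands).
    """
    if b is None:
        b = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])  # FIR part
    if a is None:
        a = np.array([1.0, -0.98])                       # integrator pole
    return lfilter(b, a, log_energies, axis=0)           # filter along time

def deltas(features, window=2):
    """Regression-based time derivatives (themselves high-pass filters)."""
    pad = np.pad(features, ((window, window), (0, 0)), mode="edge")
    num = sum(k * pad[window + k : len(features) + window + k]
              for k in range(-window, window + 1))
    return num / (2 * sum(k * k for k in range(1, window + 1)))
```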
Archive | 1996
Brian A. Hanson; Ted H. Applebaum; Jean-Claude Junqua
Significant improvements in automatic speech recognition performance have been obtained through front-end feature representations which exploit the time varying properties of speech spectra. Various techniques have been developed to incorporate “spectral dynamics” into the speech representation, including temporal derivative features, spectral mean normalization and, more generally, spectral parameter filtering. This chapter describes the implementation and interrelationships of these techniques and illustrates their use in automatic speech recognition under different types of adverse conditions.
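The interrelationship the chapter describes can be made concrete: all three techniques act as filters applied along the time trajectories of spectral parameters. The sketch below illustrates that unifying view; the coefficients are illustrative, not taken from the chapter.

```python
import numpy as np
from scipy.signal import lfilter

def filter_trajectories(params, b, a=(1.0,)):
    """Spectral parameter filtering: run one filter along the time
    trajectory of every spectral coefficient (axis 0 = frames)."""
    return lfilter(b, a, params, axis=0)

def mean_normalize(params):
    """Spectral mean normalization = removing each trajectory's DC
    component (here, its utterance-level time average)."""
    return params - params.mean(axis=0, keepdims=True)

# A standard 2-frame regression delta expressed as an FIR filter; applied
# causally via lfilter, the output is the delta trajectory delayed by two
# frames relative to the centered definition.
delta_fir = np.array([2.0, 1.0, 0.0, -1.0, -2.0]) / 10.0
```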
International Conference on Acoustics, Speech, and Signal Processing | 1983
Brian A. Hanson; David Y. Wong; Biing-Hwang Juang
The development and testing of an algorithm to enhance the intelligibility of speech degraded by an interfering talker are reported. This paper discusses the formulation of the problem, the techniques developed, and the results of a limited-scale intelligibility test. While the test results indicate that no intelligibility improvement is obtained from the processing, several promising new directions for this problem have been identified.
International Conference on Acoustics, Speech, and Signal Processing | 1996
Ted H. Applebaum; Philippe Morin; Brian A. Hanson
A training procedure for phoneme similarity reference models is described, and two word recognition methods based on phoneme similarities for the English language are evaluated under clean, noisy, and channel-distorted speech conditions. Optimization of recognition performance is examined in terms of multi-style training, cepstral normalizations, gender-dependent models, and the length of time over which the phoneme similarities are computed. Phoneme similarities provide a compact speech representation which is relatively insensitive to the variations between speakers.
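A phoneme-similarity representation of the kind described can be sketched as follows. The diagonal-Gaussian scoring and the simple frame-by-frame template match are illustrative choices, not necessarily the paper's similarity measure or alignment method.

```python
import numpy as np

def phoneme_similarities(frames, ref_means, ref_invvars):
    """Frame-by-frame similarity to diagonal-Gaussian phoneme references.

    frames: (T, D) feature vectors.  ref_means, ref_invvars: (P, D)
    mean and inverse variance per phoneme.  Returns (T, P) scores,
    higher meaning more similar.
    """
    diff = frames[:, None, :] - ref_means[None, :, :]   # (T, P, D)
    return -0.5 * np.einsum("tpd,pd->tp", diff * diff, ref_invvars)

def word_score(similarities, template):
    """Score a similarity time series against a stored word template of
    the same shape (a simple linear-time match for illustration)."""
    return -np.abs(similarities - template).mean()
```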
Speech Communication | 1991
Hector R. Javkin; Brian A. Hanson; Abigail Kaun
Breathiness is used to form linguistic contrasts in some languages, but it also characterizes speakers as individuals and, to an extent, gender. The acoustic consequences of breathy phonation are varied, and separable in synthetic speech: they include the introduction of a frication component into the voice source, a raising of the relative amplitude of the first harmonic, and a lowering of the overall spectral tilt. Henton and Bladon (1985) claimed that breathiness diminishes intelligibility. The experiments described in the present paper used synthetic speech to determine the effect of adding a noise source to a modal voice source and to determine the effects of the different acoustic consequences of breathiness on the intelligibility of isolated words. No significant effects were found.
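The three acoustic correlates the abstract separates can be illustrated on a toy voice source, as in the sketch below. All parameter values and the source model are assumptions for illustration, not the synthesis settings used in the experiments.

```python
import numpy as np
from scipy.signal import lfilter

def breathy_source(f0=120.0, fs=16000, dur=0.5, noise_gain=0.1,
                   h1_boost=1.0, tilt_pole=0.9):
    """Toy voice source with the three breathiness correlates:
    an added noise (frication) component, a raised first-harmonic
    amplitude, and a steeper overall spectral tilt."""
    t = np.arange(int(fs * dur)) / fs
    # modal-like source: harmonics with 1/k amplitude roll-off
    src = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, 21))
    src = src + h1_boost * np.sin(2 * np.pi * f0 * t)  # raise H1 amplitude
    src = src + noise_gain * np.random.randn(t.size)   # frication component
    src = lfilter([1.0 - tilt_pole], [1.0, -tilt_pole], src)  # extra tilt
    return src / np.max(np.abs(src))
```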
Archive | 1997
Ronald B. Richard; Kazue Hata; Stephen Johnson; Steve Pearson; Judson A. Hofmann; Brian A. Hanson
Archive | 2004
Roland Kuhn; Philippe Morin; Brian A. Hanson