Publications


Featured research published by Robert A. Houde.


Journal of the Acoustical Society of America | 2000

Some effects of duration on vowel recognition

James Hillenbrand; Michael J. Clark; Robert A. Houde

This study was designed to examine the role of duration in vowel perception by testing listeners on the identification of CVC syllables generated at different durations. Test signals consisted of synthesized versions of 300 utterances selected from a large, multitalker database of /hVd/ syllables [Hillenbrand et al., J. Acoust. Soc. Am. 97, 3099-3111 (1995)]. Four versions of each utterance were synthesized: (1) an original duration set (vowel duration matched to the original utterance), (2) a neutral duration set (duration fixed at 272 ms, the grand mean across all vowels), (3) a short duration set (duration fixed at 144 ms, two standard deviations below the mean), and (4) a long duration set (duration fixed at 400 ms, two standard deviations above the mean). Experiment 1 used a formant synthesizer, while a second experiment was an exact replication using a sinusoidal synthesis method that represented the original vowel spectrum more precisely than the formant synthesizer. Findings included (1) duration had a small overall effect on vowel identity since the great majority of signals were identified correctly at their original durations and at all three altered durations; (2) despite the relatively small average effect of duration, some vowels, especially [see text] and [see text], were significantly affected by duration; (3) some vowel contrasts that differ systematically in duration, such as [see text], and [see text], were minimally affected by duration; (4) a simple pattern recognition model appears to be capable of accounting for several features of the listening test results, especially the greater influence of duration on some vowels than others; and (5) because a formant synthesizer does an imperfect job of representing the fine details of the original vowel spectrum, results using the formant-synthesized signals led to a slight overestimate of the role of duration in vowel recognition, especially for the shortened vowels.
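The four synthesis conditions follow directly from the reported grand mean and the 2-SD endpoints; a minimal arithmetic sketch (the 64-ms standard deviation is inferred from the stated 144/400 ms endpoints, not given explicitly in the abstract):

```python
# Sketch of the four duration conditions described in the abstract.
# The grand mean (272 ms) and the 2-SD endpoints (144 ms and 400 ms)
# come from the text; the implied SD of 64 ms is inferred from them.

GRAND_MEAN_MS = 272
IMPLIED_SD_MS = (400 - 144) / 4  # 64 ms

def duration_conditions(original_ms):
    """Target vowel duration (ms) for each of the four synthesis sets."""
    return {
        "original": original_ms,                     # matched to the utterance
        "neutral": GRAND_MEAN_MS,                    # 272 ms, grand mean
        "short": GRAND_MEAN_MS - 2 * IMPLIED_SD_MS,  # 144 ms
        "long": GRAND_MEAN_MS + 2 * IMPLIED_SD_MS,   # 400 ms
    }

print(duration_conditions(310))
```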


Journal of the Acoustical Society of America | 2002

Speech perception based on spectral peaks versus spectral shape

James Hillenbrand; Robert A. Houde; Robert T. Gayvert

This study was designed to measure the relative contributions to speech intelligibility of spectral envelope peaks (including, but not limited to formants) versus the detailed shape of the spectral envelope. The problem was addressed by asking listeners to identify sentences and nonsense syllables that were generated by two structurally identical source-filter synthesizers, one of which constructs the filter function based on the detailed spectral envelope shape while the other constructs the filter function using a purposely coarse estimate that is based entirely on the distribution of peaks in the envelope. Viewed in the broadest terms the results showed that nearly as much speech information is conveyed by the peaks-only method as by the detail-preserving method. Just as clearly, however, every test showed some measurable advantage for spectral detail, although the differences were not large in absolute terms.
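The peaks-only versus detail-preserving contrast can be illustrated with a toy sampled envelope; the peak-picking rule and the linear interpolation between peaks below are illustrative stand-ins, not the authors' synthesizer:

```python
# Illustrative sketch (not the authors' method): a coarse envelope built
# only from local spectral peaks, versus the detailed envelope itself.
# The envelope values are invented for demonstration.

def find_peaks(env):
    """Indices of local maxima in a sampled spectral envelope."""
    return [i for i in range(1, len(env) - 1)
            if env[i] > env[i - 1] and env[i] >= env[i + 1]]

def peaks_only_envelope(env):
    """Coarse envelope keeping only peak positions and amplitudes,
    linearly interpolated between them (endpoints pinned)."""
    anchors = [0] + find_peaks(env) + [len(env) - 1]
    coarse = list(env)
    for a, b in zip(anchors, anchors[1:]):
        for i in range(a + 1, b):
            t = (i - a) / (b - a)
            coarse[i] = env[a] + t * (env[b] - env[a])
    return coarse

env = [0.1, 0.4, 0.9, 0.5, 0.3, 0.7, 0.2, 0.1]
print(find_peaks(env))  # local maxima at indices 2 and 5
```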


Annals of the New York Academy of Sciences | 1983

A FIBER SUM MODULATION CODE FOR A COCHLEAR PROSTHESIS

Charles W. Parkins; Robert A. Houde; John Bancroft

One approach to developing a stimulation code for a cochlear prosthesis is to attempt to mimic the normal physiologic neural responses to acoustic stimulation. FIGURE 1 demonstrates a simplistic view of the sensory deafness problem. Normally, the basilar membrane with its traveling wave acts as a series of mechanical bandpass filters. The movement of each frequency-specific location is detected by the transducers of the system, representing the hair cells of the organ of Corti. This analog information is then converted to a digital pulse code which is sent out on a number of transmission lines representing single neurons in the auditory nerve. In the pathologic situation in which the transducers are damaged (sensory deafness), the cochlear prosthesis attempts to bypass the defective transducers and directly stimulate the neural transmission lines. Since the information transmitted over the transmission lines is normally in a digital code, it makes little sense to try to stimulate them in an analog manner. We must therefore devise a signal processor that will convert the original analog acoustic signal into a digital pulse code which will maximize the ability of the central auditory nervous system to process this input in a normal manner, resulting in pitch, loudness and, most importantly, speech discrimination. The characteristics of single auditory neuron responses to acoustic stimulation have been well documented earlier in this volume. FIGURE 2 demonstrates some of these characteristics in a 538-Hz neuron’s response to a toneburst at its characteristic frequency (CF). The top trace is the analog signal representing the acoustic stimulus. The next 16 traces represent the single auditory neuron’s response to each of sixteen consecutive repetitions of the stimulus. In this digital response, the only information delivered is whether or not a response has occurred at a specific point in time. This figure also illustrates the probabilistic nature of the response.
The bottom trace represents the summed responses to 200 stimulus presentations, the poststimulus time (PST) histogram. When the response is compared with the stimulus, a time lag is found in the response due largely to the traveling wave of the basilar membrane. There are also more responses at the onset of the stimulus than at the end, even though the stimulus is of constant amplitude. This is the well-documented adaptation effect.
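The summing of repeated spike trains into a PST histogram can be sketched as follows; the bin count and firing probabilities are invented, with an elevated onset probability standing in for the adaptation effect described above:

```python
import random

# Toy PST histogram: binary spike trains from 200 repeated stimulus
# presentations are summed bin by bin. The firing probabilities are
# invented; the higher onset probability mimics neural adaptation.

random.seed(1)
N_BINS, N_REPS = 50, 200
# Higher firing probability at stimulus onset than later (adaptation).
p_fire = [0.6 if b < 10 else 0.25 for b in range(N_BINS)]

def spike_train():
    """One repetition: a 0/1 response in each time bin."""
    return [1 if random.random() < p_fire[b] else 0 for b in range(N_BINS)]

pst = [0] * N_BINS
for _ in range(N_REPS):
    for b, spike in enumerate(spike_train()):
        pst[b] += spike

# Onset bins accumulate more summed responses than later bins.
print(sum(pst[:10]) / 10, sum(pst[10:]) / 40)
```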


international conference on acoustics, speech, and signal processing | 1976

A real time spectrograph with implications for speech training for the deaf

L. C. Stewart; Wilbur D. Larkin; Robert A. Houde

A real-time speech spectrograph has been developed which is practical for clinical use. It produces and stores a frequency-time-intensity display on a video monitor while the sound is being spoken. The display closely resembles a conventional, broad-band spectrogram in time, frequency, and grey-scale resolution. Preliminary evaluations have been made to show its usefulness (1) as an aid to the therapist in diagnosis and communication of concepts, and (2) for student drill relatively independent of the therapist. These results suggest that the instrument has considerable potential for training speech production with the deaf.


Journal of the Acoustical Society of America | 1995

Vowel recognition: Formants, spectral peaks, and spectral shape

James Hillenbrand; Robert A. Houde

The purpose of this presentation will be to review some recent research on the acoustic characteristics of American English vowels, and to discuss some issues related to the auditory mechanisms underlying vowel recognition. Evidence from studies using traditional formant frequency representations will be reviewed to address issues such as talker normalization and the role of dynamic features in vowel identification. A long‐standing debate in phonetic perception theory concerning whether phonetic quality is controlled by formant frequencies or the overall shape of the spectrum will also be addressed. While formant theory has tended to dominate much of vowel perception research, some compelling arguments have been leveled against formant representations; however, there are also some very important problems with whole‐spectrum representations. A new method of representing speech will be described which is believed to address some of the limitations of both formants and overall spectral shape. The masked peak...


Journal of the Acoustical Society of America | 1994

The role of voice pitch in the perception of glottal stops

Robert A. Houde; James Hillenbrand

Glottal stops that occur in VCV context are often not realized as stops at all, but rather show voicing that is continuous throughout the ‘‘occlusion’’ interval. Glottal stops that are realized in this way are apparently marked by reductions in intensity and fundamental frequency. In the present study recordings were made of speakers producing glottal stops in utterances such as ‘‘oh–oh’’ in which a glottal stop separates two identical vowels. The utterances were resynthesized in three ways: (1) with a flat F0 contour and an intensity contour matching the original utterance, (2) with a flat intensity contour and an F0 contour matching the original utterance, and (3) with intensity and F0 contours matching the original utterance. Results suggest that either cue alone is sufficient to signal the presence of the glottal stop. The impression of most listeners was that the signals cued by intensity alone were quite similar to those cued by F0 alone. The apparent equivalence of loudness and pitch in signaling t...


Journal of the Acoustical Society of America | 2008

Perceptual accommodation to sinewave speech.

James Hillenbrand; Michael J. Clark; Robert A. Houde; Michael W. Hillenbrand; Kathryn S. Hillenbrand

Many studies have reported good intelligibility for sine wave replicas of sentences [e.g., R. Remez et al., Science 212, 947–950 (1981)]. Recent work, however, has shown poor intelligibility (∼55%) for vowels in isolated syllables [J. Hillenbrand and M. Clark, J. Acoust. Soc. Am. 123, 3326 (2008)]. While enhanced intelligibility for sentences undoubtedly reveals the importance of top‐down mechanisms, it is also possible that sentence‐length utterances allow listeners to make (as yet unknown) perceptual accommodations to the unfamiliar acoustic properties of sine wave speech (SWS). In this study, the intelligibility of SWS replicas of 16 vowels/diphthongs in isolated syllables (“heed,” “hid,” and “hide”) was compared to that of the same syllables when preceded by a seven‐word SWS carrier phrase (CP) spoken by the same talker. Intelligibility was ∼24 percentage points higher when the SWS syllables were preceded by the SWS CP than when the same utterances were presented in isolation. Furthermore, the effect ...


Journal of the Acoustical Society of America | 1998

A damped sinewave vocoder

James Hillenbrand; Robert A. Houde

A speech analysis‐synthesis system is described that operates by summing exponentially damped sinusoids at frequencies corresponding to spectral peaks derived from the speech signal. The analysis begins with the calculation of a smoothed Fourier spectrum. A running average of spectral amplitudes is then computed for each frame over an 800‐Hz window. The running average is subtracted from the smoothed spectrum to produce a ‘‘masked’’ spectrum. The signal is resynthesized by summing exponentially damped sinusoids at frequencies corresponding to peaks in the masked spectra. If an autocorrelation‐derived periodicity measure indicates that a given analysis frame is voiced, the damped sinusoids are pulsed at a rate corresponding to the measured fundamental period. For unvoiced intervals, the damped sinusoids are pulsed on and off at random intervals. Results of a perceptual evaluation of the vocoder will be reported.
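The masking step, a running average subtracted from the smoothed spectrum so that peaks stand out, can be sketched with invented numbers (the bin spacing and spectrum values below are hypothetical; only the 800-Hz window comes from the abstract):

```python
# Sketch of the "masked spectrum" step: subtract a running average
# (computed over an ~800-Hz window) from the smoothed spectrum.
# Bin spacing and spectrum values are invented for illustration.

BIN_HZ = 100                       # assumed spacing of spectral samples
WINDOW_HZ = 800
HALF = WINDOW_HZ // BIN_HZ // 2    # bins on each side of the center

def masked_spectrum(spectrum):
    masked = []
    for i in range(len(spectrum)):
        lo, hi = max(0, i - HALF), min(len(spectrum), i + HALF + 1)
        running_avg = sum(spectrum[lo:hi]) / (hi - lo)
        masked.append(spectrum[i] - running_avg)
    return masked

spec = [10, 12, 30, 14, 11, 10, 25, 12, 10, 9]
m = masked_spectrum(spec)
peaks = [i for i in range(1, len(m) - 1) if m[i] > m[i - 1] and m[i] > m[i + 1]]
print(peaks)  # the spectral maxima (bins 2 and 6) survive the masking
```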


Journal of the Acoustical Society of America | 1968

Tongue‐Body Motion during Selected Speech Sounds

Robert A. Houde

The articulatory behavior of the tongue body was observed and related to the phonemic representation of speech through the use of cineradiography at 100 frames/sec. Specific points on the tongue‐body surface were identified by the attachment of small radiopaque markers. Points on the jaw and each lip were also marked. The motion of these markers was observed during the production of nonsense trisyllables constructed from /i/, /ɑ/, /u/, /b/, /g/ in the form V1 CV2 CV1, with various conditions of stress as produced by a single speaker. Target positions were identified for each point marker corresponding to each phoneme. The transitions between target positions were observed. It was determined that the principal motion between vowel target positions could be characterized by a single characteristic transition function. Deviations from this principal intertarget motion that appear to be related to perturbations in oral pressure were observed.


Journal of the Acoustical Society of America | 2017

Multi-fiber coding on the auditory nerve and the origin of critical-band masking

Robert A. Houde; James Hillenbrand; Robert T. Gayvert; John F. Houde

Understanding the physiological mechanisms that underlie the exquisite frequency discrimination abilities of listeners remains a central problem in auditory science. We describe a computational model of the cochlea and auditory nerve that was developed to evaluate the frequency analysis capabilities of a system in which the output of a basilar membrane filter, transduced into a probability-of-firing function by an inner hair cell, is encoded on the auditory nerve as the instantaneous sum of firings on a critical band of fibers surrounding that filter channel and transmitted to the central nervous system for narrow-band frequency analysis. Performance of the model on vowels over a wide range of input levels was found to be robust and accurate, comparable to the Average Localized Synchronized Rate results of Young and Sachs [J. Acoust. Soc. Am. 1979, 66, 1381-1403]. Model performance in perceptual threshold simulations was also evaluated. The model succeeded in replicating psychophysical results reported in classic studies of critical band masking.
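The fiber-sum encoding itself can be caricatured with a toy simulation; the fiber count, band width, and firing probabilities below are invented for illustration and are not the model's actual parameters:

```python
import random

# Toy sketch of the "fiber sum" idea: a channel's output is the
# instantaneous sum of firings over the band of fibers surrounding it.
# All parameters here are invented for illustration.

random.seed(7)
N_FIBERS = 64
BAND_HALF_WIDTH = 4   # fibers on each side of the center channel

def fiber_firings(p):
    """One time step of independent binary firings, one per fiber."""
    return [1 if random.random() < p[f] else 0 for f in range(N_FIBERS)]

def fiber_sum(firings, center):
    lo = max(0, center - BAND_HALF_WIDTH)
    hi = min(N_FIBERS, center + BAND_HALF_WIDTH + 1)
    return sum(firings[lo:hi])

# A stimulus drives fibers near channel 20 harder than the rest.
p = [0.8 if abs(f - 20) <= 4 else 0.1 for f in range(N_FIBERS)]
firings = fiber_firings(p)
print(fiber_sum(firings, 20), fiber_sum(firings, 50))
```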

Collaboration


Dive into Robert A. Houde's collaborations.

Top Co-Authors

James Hillenbrand
Western Michigan University

Michael J. Clark
Western Michigan University

John F. Houde
University of California

Dale Evan Metz
National Technical Institute for the Deaf

Nicholas Schiavetti
State University of New York System

Ronald W. Sitler
State University of New York System