
Publication


Featured research published by Ken W. Grant.


Journal of the Acoustical Society of America | 2000

The use of visible speech cues for improving auditory detection of spoken sentences

Ken W. Grant; Philip-Franz Seitz

Classic accounts of the benefits of speechreading to speech recognition treat auditory and visual channels as independent sources of information that are integrated fairly early in the speech perception process. The primary question addressed in this study was whether visible movements of the speech articulators could be used to improve the detection of speech in noise, thus demonstrating an influence of speechreading on the ability to detect, rather than recognize, speech. In the first experiment, ten normal-hearing subjects detected the presence of three known spoken sentences in noise under three conditions: auditory-only (A), auditory plus speechreading with a visually matched sentence (AV(M)), and auditory plus speechreading with a visually unmatched sentence (AV(UM)). When the speechread sentence matched the target sentence, average detection thresholds improved by about 1.6 dB relative to the auditory condition. However, the amount of threshold reduction varied significantly for the three target sentences (from 0.8 to 2.2 dB). There was no difference in detection thresholds between the AV(UM) condition and the A condition. In a second experiment, the effects of visually matched orthographic stimuli on detection thresholds were examined for the same three target sentences in six subjects who participated in the earlier experiment. When the orthographic stimuli were presented just prior to each trial, average detection thresholds improved by about 0.5 dB relative to the A condition. However, unlike the AV(M) condition, the detection improvement due to orthography was not dependent on the target sentence. Analyses of correlations between area of mouth opening and acoustic envelopes derived from selected spectral regions of each sentence (corresponding to the wide-band speech, and first, second, and third formant regions) suggested that AV(M) threshold reduction may be determined by the degree of auditory-visual temporal coherence, especially between the area of lip opening and the envelope derived from mid- to high-frequency acoustic energy. Taken together, the data (for these sentences at least) suggest that visual cues derived from the dynamic movements of the face during speech production interact with time-aligned auditory cues to enhance sensitivity in auditory detection. The amount of visual influence depends in part on the degree of correlation between acoustic envelopes and visible movement of the articulators.
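The envelope correlation analysis summarized above can be sketched roughly as follows. This is a minimal illustration, not the study's actual pipeline: it assumes a speech waveform and a frame-by-frame lip-area track are already available, and the function names, filter order, and example F2-region band edges are illustrative only.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert, resample

def band_envelope(speech, fs, lo_hz, hi_hz, frame_rate):
    """Band-pass filter the speech, take its Hilbert envelope,
    and resample the envelope to the video frame rate."""
    b, a = butter(4, [lo_hz / (fs / 2), hi_hz / (fs / 2)], btype="bandpass")
    band = filtfilt(b, a, speech)
    envelope = np.abs(hilbert(band))
    n_frames = int(len(speech) / fs * frame_rate)
    return resample(envelope, n_frames)

def av_coherence(speech, fs, lip_area, frame_rate, lo_hz=800.0, hi_hz=2200.0):
    """Pearson correlation between the area of lip opening and the
    acoustic envelope of a chosen spectral band (here an F2-like region)."""
    env = band_envelope(speech, fs, lo_hz, hi_hz, frame_rate)
    n = min(len(env), len(lip_area))
    return float(np.corrcoef(env[:n], np.asarray(lip_area)[:n])[0, 1])
```

On this account, a sentence whose mid- to high-frequency envelope correlates more strongly with the lip-area track would be expected to show a larger AV(M) threshold reduction.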


Journal of the Acoustical Society of America | 1998

Auditory-visual speech recognition by hearing-impaired subjects: Consonant recognition, sentence recognition, and auditory-visual integration

Ken W. Grant; Brian E. Walden; Philip F. Seitz

Factors leading to variability in auditory-visual (AV) speech recognition include the subject's ability to extract auditory (A) and visual (V) signal-related cues, the integration of A and V cues, and the use of phonological, syntactic, and semantic context. In this study, measures of A, V, and AV recognition of medial consonants in isolated nonsense syllables and of words in sentences were obtained in a group of 29 hearing-impaired subjects. The test materials were presented in a background of speech-shaped noise at 0-dB signal-to-noise ratio. Most subjects achieved substantial AV benefit for both sets of materials relative to A-alone recognition performance. However, there was considerable variability in AV speech recognition both in terms of the overall recognition score achieved and the amount of audiovisual gain. To account for this variability, consonant confusions were analyzed in terms of phonetic features to determine the degree of redundancy between A and V sources of information. In addition, a measure of integration ability was derived for each subject using recently developed models of AV integration. The results indicated that (1) AV feature reception was determined primarily by visual place cues and auditory voicing + manner cues, (2) the ability to integrate A and V consonant cues varied significantly across subjects, with better integrators achieving more AV benefit, and (3) significant intra-modality correlations were found between consonant measures and sentence measures, with AV consonant scores accounting for approximately 54% of the variability observed for AV sentence recognition. Integration modeling results suggested that speechreading and AV integration training could be useful for some individuals, potentially providing as much as 26% improvement in AV consonant recognition.
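A simple way to see why the overall AV score and the amount of AV gain can dissociate is to express the benefit both in raw percentage points and relative to the room for improvement left above the auditory-alone score. The sketch below is a generic benefit measure, not the integration models used in the study.

```python
def av_benefit(a_score, av_score):
    """Absolute audiovisual gain, in percentage points."""
    return av_score - a_score

def relative_av_benefit(a_score, av_score):
    """Gain expressed as a percentage of the possible improvement
    above the auditory-alone score (scores in percent correct)."""
    return 100.0 * (av_score - a_score) / (100.0 - a_score)

# Example: A = 40% and AV = 70% is a 30-point gain, which is
# half of the improvement that was still available.
print(av_benefit(40.0, 70.0))           # 30.0
print(relative_av_benefit(40.0, 70.0))  # 50.0
```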


Journal of the Acoustical Society of America | 2009

Auditory and auditory-visual intelligibility of speech in fluctuating maskers for normal-hearing and hearing-impaired listeners

Joshua G. Bernstein; Ken W. Grant

Speech intelligibility for audio-alone and audiovisual (AV) sentences was estimated as a function of signal-to-noise ratio (SNR) for a female target talker presented in a stationary noise, an interfering male talker, or a speech-modulated noise background, for eight hearing-impaired (HI) and five normal-hearing (NH) listeners. At the 50% keywords-correct performance level, HI listeners showed 7-12 dB less fluctuating-masker benefit (FMB) than NH listeners, consistent with previous results. Both groups showed significantly more FMB under AV than audio-alone conditions. When compared at the same stationary-noise SNR, FMB differences between listener groups and modalities were substantially smaller, suggesting that most of the FMB differences at the 50% performance level may reflect an SNR dependence of the FMB. Still, 1-5 dB of the FMB difference between listener groups remained, indicating a possible role for reduced audibility, limited spectral or temporal resolution, or an inability to use auditory source-segregation cues, in directly limiting the ability to listen in the dips of a fluctuating masker. A modified version of the extended speech-intelligibility index that predicts a larger FMB at less favorable SNRs accounted for most of the FMB differences between listener groups and modalities. Overall, these data suggest that HI listeners retain more of an ability to listen in the dips of a fluctuating masker than previously thought. Instead, the fluctuating-masker difficulties exhibited by HI listeners may derive from the reduced FMB associated with the more favorable SNRs they require to identify a reasonable proportion of the target speech.
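Fluctuating-masker benefit here is the difference between the SNR needed for 50% keywords correct in stationary noise and in a fluctuating masker. A minimal sketch of that computation from measured psychometric functions is shown below; the interpolation scheme and variable names are illustrative, not taken from the paper.

```python
import numpy as np

def srt_50(snrs_db, pct_correct):
    """Interpolate the SNR giving 50% keywords correct from a measured
    psychometric function (assumes percent correct rises monotonically
    with SNR)."""
    return float(np.interp(50.0, pct_correct, snrs_db))

def fluctuating_masker_benefit(stationary, fluctuating):
    """FMB in dB: how much lower the 50%-correct SNR is for the
    fluctuating masker than for stationary noise. Each argument is a
    (snrs_db, pct_correct) pair of equal-length sequences."""
    return srt_50(*stationary) - srt_50(*fluctuating)
```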


Journal of the Acoustical Society of America | 1987

Evaluating the articulation index for auditory–visual input

Ken W. Grant; Louis D. Braida

The current standard for calculating the Articulation Index (AI) includes a procedure to estimate the effective AI when hearing is combined with speechreading [ANSI S3.5‐1969 (R1978), “Methods for the Calculation of the Articulation Index” (American National Standards Institute, New York, 1969)]. This procedure assumes that the band‐importance function derived for auditory listening situations applies equally well to auditory‐visual situations. Recent studies have shown, however, that certain auditory signals that, by themselves, produce negligible speech reception scores (e.g., F0, speech‐modulated noise, etc.) can provide substantial benefits to speechreading. The existence of such signals suggests that the band‐importance function for auditory and auditory‐visual inputs may be different. In the present study, an attempt was made to validate the auditory‐visual correction procedure outlined in the ANSI‐1969 standard by evaluating auditory, visual, and auditory‐visual sentence identification performance of normal‐hearing subjects for both wideband speech degraded by additive noise and bandpass‐filtered speech presented in quiet. The results obtained for auditory listening conditions with an AI greater than 0.03 support the procedure outlined in the current ANSI standard. [Work supported by NIH.]
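For orientation, the auditory part of the AI has the general form of an importance-weighted sum of per-band audibilities. The sketch below uses generic weights and audibilities; the standard itself specifies the band divisions, importance functions, and the auditory-visual correction under evaluation here.

```python
import numpy as np

def articulation_index(band_audibility, band_importance):
    """Generic AI-style index: importance-weighted average of per-band
    audibilities, each clipped to the range [0, 1]. The caller supplies
    band audibilities and importance weights (e.g., from the ANSI tables)."""
    a = np.clip(np.asarray(band_audibility, dtype=float), 0.0, 1.0)
    w = np.asarray(band_importance, dtype=float)
    return float(np.sum(w * a) / np.sum(w))
```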


Journal of the Acoustical Society of America | 1993

Evaluating the articulation index for auditory-visual consonant recognition

Ken W. Grant; Brian E. Walden

Adequacy of the ANSI standard for calculating the articulation index (AI) [ANSI S3.5‐1969 (R1986)] was evaluated by measuring auditory (A), visual (V), and auditory–visual (AV) consonant recognition under a variety of bandpass‐filtered speech conditions. Contrary to ANSI predictions, filter conditions having the same auditory AI did not necessarily result in the same auditory–visual AI. Low‐frequency bands of speech tended to provide more benefit to AV consonant recognition than high‐frequency bands. Analyses of the auditory error patterns produced by the different filter conditions showed a strong negative correlation between the degree of A and V redundancy and the amount of benefit obtained when A and V cues were combined. These data indicate that the ANSI auditory–visual AI procedure is inadequate for predicting AV consonant recognition performance under conditions of severe spectral shaping.


Journal of the Acoustical Society of America | 1985

The contribution of fundamental frequency, amplitude envelope, and voicing duration cues to speechreading in normal‐hearing subjects

Ken W. Grant; LeeAnn H. Ardell; Patricia K. Kuhl; David W. Sparks

The ability to combine speechreading (i.e., lipreading) with prosodic information extracted from the low-frequency regions of speech was evaluated with three normally hearing subjects. The subjects were tested in a connected discourse tracking procedure which measures the rate at which spoken text can be repeated back without any errors. Receptive conditions included speechreading alone (SA), speechreading plus amplitude envelope cues (AM), speechreading plus fundamental frequency cues (FM), and speechreading plus intensity-modulated fundamental frequency cues (AM + FM). In a second experiment, one subject was further tested in a speechreading plus voicing duration cue condition (DUR). Speechreading performance was best in the AM + FM condition (83.6 words per minute) and worst in the SA condition (41.1 words per minute). Tracking levels in the AM, FM, and DUR conditions were 73.7, 73.6, and 65.4 words per minute, respectively. The average tracking rate obtained when subjects were allowed to listen to the talker's normal (unfiltered) speech (NS condition) was 108.3 words per minute. These results demonstrate that speechreaders can use information related to the rhythm, stress, and intonation patterns of speech to improve their speechreading performance.


Journal of the Acoustical Society of America | 2001

The effect of speechreading on masked detection thresholds for filtered speech

Ken W. Grant

Detection thresholds for spoken sentences in steady-state noise are reduced by 1–3 dB when synchronized video images of movements of the lips and other surface features of the face are provided. An earlier study [K. W. Grant and P. F. Seitz, J. Acoust. Soc. Am. 108, 1197–1208 (2000)], showed that the amount of masked threshold reduction, or bimodal coherence masking protection (BCMP), was related to the degree of correlation between the rms amplitude envelope of the target sentence and the area of lip opening, especially in the mid-to-high frequencies typically associated with the second (F2) and third (F3) speech formants. In the present study, these results are extended by manipulating the cross-modality correlation through bandpass filtering. Two filter conditions were tested corresponding roughly to the first and second speech formants: F1 (100–800 Hz) and F2 (800–2200 Hz). Results for F2-filtered target sentences were comparable to those of unfiltered speech, yielding a BCMP of roughly 2–3 dB. Result...
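BCMP in this paradigm is simply the drop in masked detection threshold when the matching video is added, computed separately for each filter condition. The sketch below uses placeholder threshold values, not data from the study.

```python
# Masked detection thresholds in dB, auditory-only (A) vs. auditory-visual
# (AV), per filter condition. The numbers are placeholders for illustration.
thresholds = {
    "F1 (100-800 Hz)":  {"A": -20.0, "AV": -21.0},
    "F2 (800-2200 Hz)": {"A": -18.0, "AV": -20.5},
}

for condition, t in thresholds.items():
    bcmp_db = t["A"] - t["AV"]  # positive value = AV threshold is lower
    print(f"{condition}: BCMP = {bcmp_db:.1f} dB")
```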


Journal of the Acoustical Society of America | 1997

The recognition of isolated words and words in sentences: Individual variability in the use of sentence context

Ken W. Grant; Philip F. Seitz

Auditory–Visual (AV) speech recognition is influenced by at least three primary factors: (1) the ability to extract auditory (A) and visual (V) cues; (2) the ability to integrate these cues into a single linguistic object; and (3) the ability to use semantic and syntactic constraints available within the context of a sentence. In this study, the ability of hearing‐impaired individuals to recognize bandpass filtered words presented in isolation and in meaningful sentences was evaluated. Sentence materials were constructed by concatenating digitized productions of isolated words to ensure physical equivalence among the test items in the two conditions. Formulae for calculating k factors [Boothroyd and Nittrouer, J. Acoust. Soc. Am. 84, 101–114 (1988)], which relate scores for words and sentences, were applied to individual subject data obtained at three levels of isolated word‐recognition performance approximating 30%, 50%, and 70% correct. In addition, A, V, and AV sentence recognition in noise was evaluated using natural productions of fluent speech. Two main issues are addressed: (1) the effects of intelligibility on estimates of k within individual subjects; and (2) the relations between individual estimates of k and sentence recognition in noise as a function of presentation modality. [Work supported by NIH Grant DC00792.]
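The k factor of Boothroyd and Nittrouer relates word and sentence scores through p_sentence = 1 - (1 - p_word)^k, so k can be solved directly from a subject's measured proportion-correct scores. A minimal sketch:

```python
import math

def k_factor(p_word, p_sentence):
    """Boothroyd-Nittrouer k factor: the value of k satisfying
    p_sentence = 1 - (1 - p_word) ** k, with scores as proportions."""
    return math.log(1.0 - p_sentence) / math.log(1.0 - p_word)

# Example: 50% correct for isolated words and 80% correct for the same
# words in sentences corresponds to k of about 2.3.
print(round(k_factor(0.50, 0.80), 2))  # 2.32
```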


Journal of the Acoustical Society of America | 1998

Modulation rate detection and discrimination by normal-hearing and hearing-impaired listeners

Ken W. Grant; Van Summers; Marjorie R. Leek

Modulation detection and modulation rate discrimination thresholds were obtained at three different modulation rates (fm = 80, 160, and 320 Hz) and for three different ranges of modulation depths (m): full (100%), mid (70%-80%), and low (40%-60%) with both normal-hearing (NH) and hearing-impaired (HI) subjects. The results showed that modulation detection thresholds increased with modulation rate, but significantly more so for HI than for NH subjects. Similarly, rate discrimination thresholds (delta r) increased with increases in fm and decreases in modulation depth. When compared to NH subjects, rate discrimination thresholds for HI subjects were significantly worse for all rates and for all depths. At the fastest modulation rate with less than 100% modulation depth, most HI subjects could not discriminate any change in rate. When valid thresholds for rate discrimination were obtained for HI subjects, they ranged from 2.5 semitones (delta r = 12.7 Hz, fm = 80 Hz, m = 100%) to 8.7 semitones (delta r = 214.5 Hz, fm = 320 Hz, m = 100%). In contrast, average rate discrimination thresholds for NH subjects ranged from 0.9 semitones (delta r = 4.2 Hz, fm = 80 Hz, m = 100%) to 4.7 semitones (delta r = 103.5 Hz, fm = 320 Hz, m = 60%). Some of the differences in temporal processing between NH and HI subjects, especially those related to modulation detection, may be accounted for by differences in signal audibility, especially for high-frequency portions of the modulated noise. However, in many cases, HI subjects encountered great difficulty discriminating a change in modulation rate even though the modulation components of the standard and test stimuli were detectable.
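The rate-discrimination thresholds above are reported both in Hz and in semitones. Under the usual definition, a rate change of delta-f Hz at base rate f spans 12 * log2((f + delta-f) / f) semitones; the paper's exact computation may differ slightly in rounding.

```python
import math

def delta_in_semitones(base_rate_hz, delta_hz):
    """Size of a modulation-rate change in semitones."""
    return 12.0 * math.log2((base_rate_hz + delta_hz) / base_rate_hz)

# A 12.7-Hz change at fm = 80 Hz comes out near the reported
# 2.5-semitone threshold.
print(round(delta_in_semitones(80.0, 12.7), 2))  # ~2.55
```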


Journal of the Acoustical Society of America | 2002

Measures of auditory-visual integration for speech understanding: A theoretical perspective (L)

Ken W. Grant

Recent studies of auditory-visual integration have reached diametrically opposed conclusions as to whether individuals differ in their ability to integrate auditory and visual speech cues. A study by Massaro and Cohen [J. Acoust. Soc. Am. 108(2), 784–789 (2000)] reported that individuals are essentially equivalent in their ability to integrate auditory and visual speech information, whereas a study by Grant and Seitz [J. Acoust. Soc. Am. 104(4), 2438–2450 (1998)] reported substantial variability across subjects in auditory-visual integration for both sentences and nonsense syllables. This letter discusses issues related to the measurement of auditory-visual integration and modeling efforts employed to separate information extraction from information processing.
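For context, the fuzzy logical model of perception (FLMP) associated with Massaro's work combines unimodal support for a response multiplicatively and then normalizes across response alternatives. A minimal two-alternative sketch is shown below; the studies discussed here fit richer confusion-matrix data.

```python
def flmp_two_alternative(a_support, v_support):
    """FLMP-style integration for a two-alternative response:
    multiplicative combination of auditory and visual support,
    normalized over the two alternatives."""
    target = a_support * v_support
    other = (1.0 - a_support) * (1.0 - v_support)
    return target / (target + other)

# Example: moderate auditory support (0.6) combined with strong visual
# support (0.9) yields a high predicted AV response probability.
print(round(flmp_two_alternative(0.6, 0.9), 3))  # 0.931
```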

Collaboration


Ken W. Grant's top co-authors and their affiliations.

Top Co-Authors

Brian E. Walden | Walter Reed Army Medical Center

Van Summers | Walter Reed Army Institute of Research

Philip F. Seitz | Institut national de la recherche scientifique

Mary T. Cord | Walter Reed Army Medical Center

Rauna K. Surr | Walter Reed Army Medical Center

Steven Greenberg | International Computer Science Institute

Joshua G. Bernstein | Massachusetts Institute of Technology

Louis D. Braida | Massachusetts Institute of Technology