
Publication


Featured research published by Philip F. Seitz.


Journal of the Acoustical Society of America | 1997

The recognition of isolated words and words in sentences: Individual variability in the use of sentence context

Ken W. Grant; Philip F. Seitz

Auditory–Visual (AV) speech recognition is influenced by at least three primary factors: (1) the ability to extract auditory (A) and visual (V) cues; (2) the ability to integrate these cues into a single linguistic object; and (3) the ability to use semantic and syntactic constraints available within the context of a sentence. In this study, the ability of hearing‐impaired individuals to recognize bandpass filtered words presented in isolation and in meaningful sentences was evaluated. Sentence materials were constructed by concatenating digitized productions of isolated words to ensure physical equivalence among the test items in the two conditions. Formulae for calculating k factors [Boothroyd and Nittrouer, J. Acoust. Soc. Am. 84, 101–114 (1988)], which relate scores for words and sentences, were applied to individual subject data obtained at three levels of isolated word‐recognition performance approximating 30%, 50%, and 70% correct. In addition, A, V, and AV sentence recognition in noise was evaluated using natural productions of fluent speech. Two main issues are addressed: (1) the effects of intelligibility on estimates of k within individual subjects; and (2) the relations between individual estimates of k and sentence recognition in noise as a function of presentation modality. [Work supported by NIH Grant DC00792.]
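
The k-factor relation referenced above can be stated compactly: if p_i is the probability of recognizing a word in isolation and p_s the probability of recognizing the same word in sentence context, Boothroyd and Nittrouer's model gives p_s = 1 - (1 - p_i)^k. A minimal sketch of the computation, with made-up scores rather than data from this study:

import math

def k_factor(p_isolated, p_sentence):
    # Boothroyd & Nittrouer (1988): p_sentence = 1 - (1 - p_isolated) ** k
    # => k = log(1 - p_sentence) / log(1 - p_isolated); larger k means more benefit from context.
    return math.log(1.0 - p_sentence) / math.log(1.0 - p_isolated)

# Hypothetical subject scoring 50% on isolated words and 80% on the same words in sentences:
print(round(k_factor(0.50, 0.80), 2))  # about 2.3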


Journal of the Acoustical Society of America | 1998

The use of visible speech cues (speechreading) for directing auditory attention: Reducing temporal and spectral uncertainty in auditory detection of spoken sentences

Ken W. Grant; Philip F. Seitz

It is well established that auditory‐visual speech recognition is far superior to auditory‐only speech recognition. Classic accounts of the benefits of speechreading to speech recognition treat auditory and visual channels as independent sources of information that are integrated early in the speech perception process, most likely at a precategorical stage. The question addressed in this study was whether visible movements of the speech articulators could be used to improve the detection of speech in noise, thus demonstrating an influence of speechreading on the processing of low‐level auditory cues. Subjects were required to detect the presence of spoken sentences in noise under three conditions: auditory‐only, auditory‐visual with a visually matched sentence, and auditory‐visual with a visually unmatched sentence. The potential benefits of congruent visual speech cues to auditory detection will be discussed in terms of a reduction of signal uncertainty, in both temporal and spectral domains. In effect, ...


Computer Speech & Language | 1990

A dictionary for a very large vocabulary word recognition system

Philip F. Seitz; Vishwa Gupta; Matthew Lennig; Patrick Kenny; Li Deng; Douglas D. O'Shaughnessy; Paul Mermelstein

It is not too difficult to select a fairly small (on the order of 20 000 words) fixed recognition vocabulary that will cover over 99% of new input words when the task is limited to text in a specific knowledge domain and when one disregards names and acronyms. Achieving such a level of coverage is much more difficult when restrictions on knowledge domain and names are lifted, however. This report describes how we selected a 75 000-word English recognition vocabulary that covers over 98% of words in new newspaper text, including names and acronyms. Observations collected during the vocabulary selection process indicate the limiting factors for coverage of general knowledge domain text such as newspaper stories.
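
A minimal sketch of the coverage measurement described above, assuming a one-word-per-line vocabulary file and a plain-text corpus (the file names and tokenization are illustrative, not taken from the paper):

def coverage(vocab_path, corpus_path):
    # Fraction of running words in the corpus that appear in the fixed recognition vocabulary.
    with open(vocab_path) as f:
        vocab = {line.strip().lower() for line in f if line.strip()}
    total = covered = 0
    with open(corpus_path) as f:
        for line in f:
            for token in line.lower().split():
                word = token.strip(".,;:!?\"'()")
                if word:
                    total += 1
                    covered += word in vocab
    return covered / total if total else 0.0

# e.g. coverage("vocab_75k.txt", "new_newspaper_text.txt") would be about 0.98 per the abstract.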


Computer Speech & Language | 1992

Use of minimum duration and energy contour for phonemes to improve large vocabulary isolated-word recognition

Vishwa Gupta; Matthew Lennig; Paul Mermelstein; Patrick Kenny; Philip F. Seitz; Douglas D. O'Shaughnessy

Many acoustic misrecognitions in our 86 000-word speaker-trained isolated-word recognizer are due to phonemic hidden Markov models (phoneme models) mapping to short segments of speech. When we force these models to map to longer segments corresponding to the observed minimum durations for the phonemes, then the likelihood of the incorrect phoneme sequences drops dramatically. This drop in the likelihood of the incorrect words results in significant reduction in the acoustic recognition error rate. Even in cases where acoustic recognition performance is unchanged, the likelihood of the correct word choice improves relative to the incorrect word choices, resulting in significant reduction in recognition error rate with the language model. On nine speakers, the error rate for acoustic recognition reduces from 18.6 to 17.3%, while the error rate with the language model reduces from 9.2 to 7.2%. We have also improved the phoneme models by correcting the segmentation of the phonemes in the training set. During training, the boundaries between phonemes are not marked accurately. We use energy to correct these boundaries. Application of an energy threshold improves the segment boundaries between stops and sonorants (vowels, liquids and glides), between fricatives and sonorants, between affricates and sonorants and between breath noise and sonorants. Training the phoneme models with these segmented phonemes results in models which increase recognition accuracy significantly. On two speakers, the error rate for acoustic recognition reduces from 26.5 to 23.1%, while the error rate with the language model reduces from 11.3 to 8.8%. This reduction in error rate is in addition to the error rate reductions obtained by imposing minimum duration constraints. The overall reduction in errors for these two speakers using minimum durations and energy thresholds is from 27.3 to 23.1% for acoustic recognition, and from 14.3 to 8.8% with the language model.
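
A minimal sketch of the minimum-duration idea, expressed here as a post-hoc check on a phoneme-level alignment rather than the decoder-internal constraint the paper implements (the phoneme labels, frame counts, and minimum-duration table are invented for illustration):

def violates_min_duration(alignment, min_frames):
    # alignment: list of (phoneme, n_frames) pairs from a forced alignment of a word hypothesis.
    # min_frames: observed minimum duration, in frames, for each phoneme.
    # Hypotheses whose phoneme models map to implausibly short segments are penalized or rejected,
    # so their likelihoods no longer compete with the correct word.
    return any(n < min_frames.get(phoneme, 1) for phoneme, n in alignment)

# Hypothetical example: /t/ mapped to a single frame, below its observed minimum of 3 frames.
print(violates_min_duration([("k", 5), ("ae", 12), ("t", 1)], {"k": 3, "ae": 6, "t": 3}))  # True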


Journal of the Acoustical Society of America | 1997

Hearing‐impaired perceivers’ encoding and retrieval speeds for auditory, visual, and audiovisual spoken words

Philip F. Seitz

Perceptual encoding and memory retrieval processing speeds were assessed for spoken words in 26 subjects, mean age 66, with mild to moderate acquired sensorineural hearing loss. Subjects were trained to achieve error‐free recognition of a set of ten spoken words in auditory, visual (speech reading), and audiovisual conditions. They then performed the Sternberg item recognition task in each of the modality conditions using the same set of ten words. The task involved presenting memory sets of one to four words, followed by a probe word to which subjects made a speeded ‘‘YES’’ or ‘‘NO’’ button response to indicate whether the probe matched any of the memory set items. Least‐squares linear models provided good fits to subjects’ memory‐set size by reaction time functions (mean r2>0.90 for all three conditions). Using the models’ intercepts and slopes to represent encoding and retrieval times, respectively, Wilcoxon tests showed significant differences among the conditions with respect to both encoding and ret...
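
A minimal sketch of the least-squares decomposition used here, in which the intercept of the memory-set size by reaction time function estimates encoding time and the slope estimates retrieval time per item (the reaction times below are made up, not data from the study):

import numpy as np

set_sizes = np.array([1, 2, 3, 4])               # memory-set sizes in the Sternberg task
rt_ms = np.array([620.0, 660.0, 710.0, 750.0])   # hypothetical mean reaction times for one modality

slope, intercept = np.polyfit(set_sizes, rt_ms, 1)
r_squared = np.corrcoef(set_sizes, rt_ms)[0, 1] ** 2

print(f"encoding (intercept): {intercept:.0f} ms")    # processing outside the memory comparison
print(f"retrieval (slope): {slope:.1f} ms per item")  # time to compare the probe with each item
print(f"fit quality r^2: {r_squared:.2f}")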


Journal of the Acoustical Society of America | 1993

Stimulus materials for audio‐visual studies of attention and speech perception by the hearing impaired

Brad Rakerd; Philip F. Seitz

Individuals with sensorineural hearing impairment have been shown to devote a larger than normal share of their attention to speech processing when listening in situations that afford access to audio speech cues only. Attentional commitments may prove to be more nearly normal if both audio and visual speech cues are available. To test this possibility, a studio‐quality videotape has been developed for use in primary‐task and dual‐task studies of attention and speech perception. Markers on the tape make it possible to maintain coordination with a computer. The timing of speech events is specified to within a few milliseconds, potentiating subject reaction time measurements. There are three sets of stimulus materials: (i) 30 to 45s samples of connected discourse; (ii) triplets of phonetically balanced monosyllabic words, arranged in an ABX format for discrimination experiments; and (iii) randomized lists of words and phonetically matched nonwords for lexical decision experiments. Copies of the videotape, al...


Journal of the Acoustical Society of America | 1996

Hearing impairment and same–different reaction time.

Philip F. Seitz; Brad Rakerd

Reaction time (RT) studies provide information about perceptual and post‐perceptual information processing. A prior investigation found only small differences between the mean RTs of hearing‐impaired (HI) and normal‐hearing (NH) subjects performing a simple RT task with subjectively loud and soft tonal stimuli. The present study extended that group comparison to a choice RT task. Subjects with early‐onset, moderate‐to‐severe sensorineural hearing impairments (N=8) and NH controls (N=14) made same–different judgments about digit pairs. In an auditory condition spoken digits were presented at two levels of loudness, representing the endpoints of a subject’s dynamic range. In a visual condition digits were shown at the dimmest and brightest settings of a computer monitor. All subjects performed accurately in both modalities (≳90% correct). Notable findings regarding RT were as follows: (1) The intensity variations had no significant effect for either group; (2) the two groups had comparable RTs in the visual...


Journal of the Acoustical Society of America | 1993

Effects of hearing differences on encoding and comparison information‐processing stages

Philip F. Seitz; Brad Rakerd; Paula E. Tucker

Reaction times to spoken digits presented in a speeded memory scanning procedure were measured for groups of listeners with normal hearing (N=24) and with congenital or early‐onset sensorineural hearing losses (N=12). Separate groups of 12 normal‐hearing listeners were tested under conditions of good stimulus quality (no distortion, high SNR) versus poor stimulus quality (low‐pass filtered, low SNR). The speeded memory scanning procedure allows total reaction time to be decomposed into ‘‘encoding’’ and ‘‘comparison’’ components which correspond to separate stages in a model of human information processing. At issue here is whether long‐term effects of hearing loss, such as possible deficits in phonological and/or lexical representation of spoken language, lead to unusual processing costs at either the encoding or comparison stage. Experimental results suggest that impaired listeners incur the same encoding costs as normal‐hearing listeners presented with poor‐quality stimuli. However, comparison costs for...


Journal of the Acoustical Society of America | 1990

Phonological rule set complexity as a factor in the performance of a very large vocabulary automatic word recognition system

Philip F. Seitz; Vishwa Gupta; Matthew Lennig; Patrick Kenny; Li Deng; Douglas D. O'Shaughnessy; Paul Mermelstein

Generative phonological rules were used to derive surface forms of words for the 89 000‐word lexicon of the INRS‐Telecommunications speaker‐dependent, isolated word recognizer. The representation of the surface forms is in terms of the 41 phonemelike subword recognition units that are trained and recognized using 25‐component continuous mixture hidden Markov models. Three sets of surface forms were generated by three phonological rule sets that differ in the amount of phonetic and phonological variability they represent. The three dictionaries thus derived were of different sizes, having ratios of surface to base forms of 1.04, 1.23, and 2.40. Recognition experiments were run on the speech of one talker. Representing pronunciation variations explicitly in the dictionary affords a recognition performance advantage, but this advantage is partially offset by a higher ratio of surface to base forms, which leads to a larger and more densely populated search space and greater confusability of subword unit seque...
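
A minimal sketch of deriving surface forms from base forms with a small set of optional rewrite rules and counting the surface-to-base ratio (the rules and pronunciations are invented for illustration; the paper's rule sets and 41 subword units are not reproduced here):

def expand(base_form, rules):
    # base_form: space-separated phoneme string; rules: list of (pattern, alternatives) rewrites.
    # Each optional rule may or may not fire, so one base form can yield several surface forms.
    forms = {base_form}
    for pattern, alternatives in rules:
        expanded = set()
        for form in forms:
            expanded.add(form)  # keep the unrewritten form (rule is optional)
            if pattern in form:
                for alternative in alternatives:
                    expanded.add(form.replace(pattern, alternative))
        forms = expanded
    return forms

# Hypothetical flapping and schwa-deletion rules applied to one base form ("button"):
rules = [("t ax", ["dx ax"]), ("ax n", ["n"])]
surface_forms = expand("b ah t ax n", rules)
print(len(surface_forms))  # surface forms contributed by this single base form (4 here)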


Journal of the Acoustical Society of America | 1999

Benefits of nonlinear active gain for detection of brief probes in Schroeder‐phase complex maskers

Van Summers; Philip F. Seitz

Masking period patterns for positive and negative Schroeder‐phase complexes were determined for tonal probes at 1 and 4 kHz. Maskers included components from 200–5000 Hz of a 100‐Hz fundamental. The 5‐ms probes had onsets occurring 153, 155.5, 158, 160.5, or 163 ms following masker onset. For listeners with normal hearing, thresholds in the positive Schroeder‐phase masker varied by as much as 25 dB depending on the position of the probe within the masker period. Thresholds in the negative Schroeder‐phase maskers were nearly constant as probe onset varied. The maskers were more nearly equal in effectiveness for listeners with sensorineural hearing loss and in testing at high presentation levels. The findings support an interpretation involving differences in the shape of the basilar‐membrane waveform generated by each masker, and influences of nonlinear active gain on these internal responses. The positive Schroeder‐phase masker appears to produce a highly modulated basilar membrane response. Nonlinear act...
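
A minimal sketch of synthesizing the two maskers, assuming the standard Schroeder-phase construction with component phases of plus or minus pi*n*(n+1)/N for harmonic n out of N components (the sampling rate and duration are illustrative; the presentation levels and probe timing of the study are not modeled):

import numpy as np

def schroeder_masker(sign=1, f0=100, f_lo=200, f_hi=5000, fs=22050, dur=0.3):
    # Harmonic complex with components from f_lo to f_hi of a 100-Hz fundamental.
    t = np.arange(int(fs * dur)) / fs
    harmonics = np.arange(f_lo // f0, f_hi // f0 + 1)
    n_components = len(harmonics)
    x = np.zeros_like(t)
    for n in harmonics:
        phase = sign * np.pi * n * (n + 1) / n_components
        x += np.cos(2 * np.pi * n * f0 * t + phase)
    return x / np.max(np.abs(x))

positive = schroeder_masker(sign=+1)   # flat envelope, highly modulated basilar-membrane response
negative = schroeder_masker(sign=-1)   # same magnitude spectrum, opposite phase curvature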

Collaboration


Dive into Philip F. Seitz's collaborations.

Top Co-Authors

Brad Rakerd
Michigan State University

Ken W. Grant
Walter Reed Army Medical Center

Matthew Lennig
Institut national de la recherche scientifique

Douglas D. O'Shaughnessy
Institut national de la recherche scientifique

Patrick Kenny
Institut national de la recherche scientifique

Vishwa Gupta
Institut national de la recherche scientifique