Publication


Featured research published by Edward T. Auer.


Speech Communication | 2004

Auditory speech detection in noise enhanced by lipreading

Lynne E. Bernstein; Edward T. Auer; Sumiko Takayanagi

Abstract Audiovisual speech stimuli have been shown to produce a variety of perceptual phenomena. Enhanced detectability of acoustic speech in noise, when the talker can also be seen, is one of those phenomena. This study investigated whether this enhancement effect is specific to visual speech stimuli or can rely on more generic non-speech visual stimulus properties. Speech detection thresholds for an auditory /ba/ stimulus were obtained in a white noise masker. The auditory /ba/ was presented adaptively to obtain its 79.4% detection threshold under five conditions. In Experiment 1, the syllable was presented (1) auditory-only (AO) and (2) as audiovisual speech (AVS), using the original video recording. Three types of synthetic visual stimuli were also paired synchronously with the audio token: (3) A dynamic Lissajous (AVL) figure whose vertical extent was correlated with the acoustic speech envelope; (4) a dynamic rectangle (AVR) whose horizontal extent was correlated with the speech envelope; and (5) a static rectangle (AVSR) whose onset and offset were synchronous with the acoustic speech onset and offset. Ten adults with normal hearing and vision participated. The results, in terms of dB signal-to-noise ratio (SNR), were AVS
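A 79.4%-correct point is the convergence target of a three-down/one-up transformed up-down (staircase) procedure, the standard way such detection thresholds are tracked adaptively. The abstract does not give the exact tracking rule, so the following is only a minimal sketch of a generic three-down/one-up track run against a simulated observer; the observer model, step size, and reversal counts are assumptions for illustration, not the authors' protocol.

```python
import math
import random

def simulated_observer(snr_db, midpoint_db=-18.0, slope_db=1.5):
    """Hypothetical observer: detection probability rises smoothly with SNR."""
    p_detect = 1.0 / (1.0 + math.exp(-(snr_db - midpoint_db) / slope_db))
    return random.random() < p_detect

def three_down_one_up(start_snr_db=0.0, step_db=2.0, n_reversals=12):
    """Generic 3-down/1-up staircase; converges near the 79.4%-correct SNR."""
    snr = start_snr_db
    consecutive_correct = 0
    direction = None              # 'down' after correct runs, 'up' after misses
    reversals = []
    while len(reversals) < n_reversals:
        if simulated_observer(snr):
            consecutive_correct += 1
            if consecutive_correct == 3:   # three detections in a row -> harder
                consecutive_correct = 0
                if direction == 'up':
                    reversals.append(snr)
                direction = 'down'
                snr -= step_db
        else:
            consecutive_correct = 0
            if direction == 'down':        # miss after descending -> reversal
                reversals.append(snr)
            direction = 'up'
            snr += step_db                 # miss -> make the signal easier
    last = reversals[-8:]
    return sum(last) / len(last)           # threshold = mean of final reversals

print(f"Estimated 79.4% detection threshold: {three_down_one_up():.1f} dB SNR")
```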


Journal of the Acoustical Society of America | 1997

Speechreading and the structure of the lexicon: Computationally modeling the effects of reduced phonetic distinctiveness on lexical uniqueness

Edward T. Auer; Lynne E. Bernstein

A lexical modeling methodology was employed to examine how the distribution of phonemic patterns in the lexicon constrains lexical equivalence under conditions of reduced phonetic distinctiveness experienced by speech-readers. The technique involved (1) selection of a phonemically transcribed machine-readable lexical database, (2) definition of transcription rules based on measures of phonetic similarity, (3) application of the transcription rules to a lexical database and formation of lexical equivalence classes, and (4) computation of three metrics to examine the transcribed lexicon. The metric percent words unique demonstrated that the distribution of words in the language substantially preserves lexical uniqueness across a wide range in the number of potentially available phonemic distinctions. Expected class size demonstrated that if at least 12 phonemic equivalence classes were available, any given word would be highly similar to only a few other words. Percent information extracted (PIE) [D. Carter, Comput. Speech Lang. 2, 1-11 (1987)] provided evidence that high-frequency words tend not to reside in the same lexical equivalence classes as other high-frequency words. The steepness of the functions obtained for each metric shows that small increments in the number of visually perceptible phonemic distinctions can result in substantial changes in lexical uniqueness.
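The four-step methodology maps naturally onto a small script: apply a many-to-one phoneme-to-class transcription, group words into lexical equivalence classes, and compute summary metrics. The sketch below is a toy illustration, not the authors' implementation: the phoneme groupings and lexicon are invented, "expected class size" is read here as the expected size of the class containing a randomly drawn word, and the PIE metric from Carter (1987) is omitted.

```python
from collections import defaultdict

# Hypothetical transcription rules: collapse phonemes assumed to be visually
# confusable into one equivalence symbol (the paper derives such rules from
# measured phonetic similarity; these groupings are made up).
VISEME_MAP = {'p': 'P', 'b': 'P', 'm': 'P',
              'f': 'F', 'v': 'F',
              't': 'T', 'd': 'T', 'n': 'T', 's': 'T', 'z': 'T',
              'k': 'K', 'g': 'K',
              'a': 'A', 'e': 'A', 'i': 'I', 'o': 'O', 'u': 'O'}

def transcribe(word):
    """Step 3: rewrite a phonemic transcription with the reduced symbol set."""
    return ''.join(VISEME_MAP.get(ph, ph) for ph in word)

def lexical_metrics(lexicon):
    """Step 4: form equivalence classes and compute two of the paper's metrics."""
    classes = defaultdict(list)
    for word in lexicon:
        classes[transcribe(word)].append(word)
    n = len(lexicon)
    pct_unique = 100 * sum(1 for c in classes.values() if len(c) == 1) / n
    # Expected class size for a randomly drawn word (one plausible reading).
    expected_size = sum(len(c) ** 2 for c in classes.values()) / n
    return pct_unique, expected_size

# Toy phonemic lexicon (one character per phoneme, purely illustrative).
toy_lexicon = ['pat', 'bat', 'mat', 'fan', 'van', 'kit', 'git', 'dog', 'sun']
print(lexical_metrics(toy_lexicon))
```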


Journal of the Acoustical Society of America | 1998

Temporal and spatio-temporal vibrotactile displays for voice fundamental frequency: an initial evaluation of a new vibrotactile speech perception aid with normal-hearing and hearing-impaired individuals.

Edward T. Auer; Lynne E. Bernstein; David C. Coulter

Four experiments were performed to evaluate a new wearable vibrotactile speech perception aid that extracts fundamental frequency (F0) and displays the extracted F0 as a single-channel temporal or an eight-channel spatio-temporal stimulus. Specifically, we investigated the perception of intonation (i.e., question versus statement) and emphatic stress (i.e., stress on the first, second, or third word) under Visual-Alone (VA), Visual-Tactile (VT), and Tactile-Alone (TA) conditions and compared performance using the temporal and spatio-temporal vibrotactile display. Subjects were adults with normal hearing in experiments I-III and adults with severe to profound hearing impairments in experiment IV. Both versions of the vibrotactile speech perception aid successfully conveyed intonation. Vibrotactile stress information was successfully conveyed, but vibrotactile stress information did not enhance performance in VT conditions beyond performance in VA conditions. In experiment III, which involved only intonation identification, a reliable advantage for the spatio-temporal display was obtained. Differences between subject groups were obtained for intonation identification, with more accurate VT performance by those with normal hearing. Possible effects of long-term hearing status are discussed.
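The abstract does not specify how the extracted F0 is assigned to the eight tactors of the spatio-temporal display; the sketch below shows one plausible scheme, quantizing F0 on a logarithmic scale into eight channel positions so that a rising intonation contour sweeps across the array. The frequency range, scaling, and function names are assumptions for illustration only, not the device's actual mapping.

```python
import math

N_CHANNELS = 8
F0_MIN, F0_MAX = 80.0, 400.0   # assumed voice F0 range in Hz (illustrative)

def f0_to_channel(f0_hz):
    """Map an F0 estimate to one of eight tactor channels on a log scale.

    Returns None for unvoiced frames (f0_hz <= 0). This is an assumed scheme,
    not the mapping used in the paper's device.
    """
    if f0_hz <= 0:
        return None
    f0 = min(max(f0_hz, F0_MIN), F0_MAX)
    frac = math.log(f0 / F0_MIN) / math.log(F0_MAX / F0_MIN)
    return min(int(frac * N_CHANNELS), N_CHANNELS - 1)

# A rising contour (question-like intonation) sweeps across the tactor array.
contour = [0, 110, 120, 140, 170, 210, 260, 320, 0]   # Hz per frame
print([f0_to_channel(f) for f in contour])
```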


Journal of the Acoustical Society of America | 2000

Development of a facility for simultaneous recordings of acoustic, optical (3‐D motion and video), and physiological speech data

Lynne E. Bernstein; Edward T. Auer; Brian Chaney; Abeer Alwan; Patricia A. Keating

A multidisciplinary, multilaboratory project is underway whose focus is optical and acoustic phonetic signals and their relationships to each other in speech production and perception. The goals are to quantitatively characterize optical speech signals, examine how optical phonetic characteristics relate to acoustic and to physiologic speech production characteristics, study what affects the intelligibility of optical speech signals, and apply obtained knowledge to optical speech synthesis and automatic speech recognition. We describe the data acquisition facility, which includes recording of video, acoustic, electromagnetic midsagittal articulography, and 3‐D motion capture data simultaneously and in synchrony. We also outline the design of the database of recordings being obtained and show examples from the database. [Work supported by NSF 9996088.]


Archive | 1996

Word Recognition in Speechreading

Lynne E. Bernstein; Edward T. Auer

The customary approaches to research on human speechreading are to study phonetic perception with identification of phonemes in nonsense syllables, and to study perception of connected discourse with identification of words in isolated sentences. These two approaches imply a theory that accounts for speechreading in terms of bottom-up phoneme perception and top-down syntactic/semantic processes. However, in auditory spoken language understanding research, word recognition is widely accepted as the critical interface between phonetic input processing and semantic/syntactic processing. In this chapter, a general theoretical approach to word recognition based on auditory speech perception is described, and it is argued that the same model should hold for speechreading. Analyses of two speechreading databases (Bernstein, Demorest, & Tucker, 1995; Demorest, Bernstein, & DeHaven, 1995) were examined for evidence that word recognition does play a critical role in speechreading. The results show high associations between the ability to speechread isolated words and words in sentences, but only low-to-moderate associations between phoneme and word identification (i.e., for words in isolation or in sentences). One implication of these results is that the course of mental events in phoneme identification is at best a subset of those events that result in word recognition. More importantly, the results support the suggestion that word recognition deserves increased attention in efforts to understand speechreading.


Journal of the Acoustical Society of America | 1996

Relationships between word knowledge and visual speech perception. II. Subjective ratings of word familiarity

Robin S. Waldstein; Edward T. Auer; Paula E. Tucker; Lynne E. Bernstein

Word familiarity is an important factor in word recognition and lexical access for hearing individuals. Subjective word familiarity ratings are hypothesized to reflect experience with words irrespective of the modality (i.e., spoken or written) through which exposure has taken place, and to provide an estimate of the size of the mental lexicon. To investigate how word familiarity is related to lipreading proficiency, 450 printed words were presented for rating on a seven‐point scale to 50 deaf and 50 hearing participants. Preliminary results revealed that the deaf participants produced lower mean familiarity ratings than did the hearing participants, for high‐, medium‐, and low‐familiarity words (hearing means= 6.7, 4.8, 3.0; deaf means= 6.0, 3.8, 2.6). Among the deaf participants, correlations between established familiarity ratings and individuals’ ratings were reliably higher for excellent than for good lipreaders, a possible indication that perceptual experience influences the structure of the lexicon. At the same time, the performance of the excellent lipreaders provides support for the hypothesis that lexical organization does not depend on the perceptual input modality (i.e., vision versus hearing). [Work supported by NIH Grant No. DC00695.]


Journal of the Acoustical Society of America | 1996

Relationships between word knowledge and visual speech perception. I. Subjective estimates of word age of acquisition

Edward T. Auer; Robin S. Waldstein; Paula E. Tucker; Lynne E. Bernstein

In individuals with normal hearing, words estimated to be learned earlier are recognized more rapidly than words estimated to be learned later. To investigate how word knowledge is related to lipreading proficiency, word age‐of‐acquisition (AOA) estimates were obtained from 50 hearing (H) and 50 deaf (D) (80‐dB HL pure‐tone average or greater hearing losses acquired before the age of 48 months) adults. Participants judged AOA for the 175 words in Form M of the Peabody Picture Vocabulary Test‐Revised using an 11‐point scale, and responded whether the words were acquired through speech, sign language, or orthography. The two groups differed in when (mean AOA: H = 8.9 years, D = 10.6 years) and how (H=69% speech and 31% orthography; D=38% speech, 45% orthography, and 17% sign language) words were judged to be acquired. However, item analyses revealed that the relative acquisition order was essentially identical across groups (r=0.965). Interestingly, within the deaf group, better lipreaders estimated that mo...


Journal of the Acoustical Society of America | 2004

Optical phonetics and visual perception of lexical and phrasal boundaries in English

Edward T. Auer; Sahyang Kim; Patricia A. Keating; Rebecca Scarborough; Abeer Alwan; Lynne E. Bernstein

Detection of lexical and phrasal boundaries in the speech signal is crucial to successful comprehension. Several suprasegmental acoustic cues have been identified for boundary detection in speech (e.g., stress pattern, duration, and pitch). However, the corresponding optical cues have not been described in detail, and it is not known to which optical cues to boundaries speechreaders are attuned. In a production study, three male American English talkers spoke two repetitions each of eight pairs of sentences in two boundary conditions (one‐word versus two‐word sequences, one‐phrase versus two‐phrase sequences). Sentence pairs were constructed such that they differed minimally in the presence of a boundary. Audio, video, and 3D‐movements of the face were recorded. Sentence pairs in both boundary conditions differed on both optical and acoustic duration measurements. A subset of the sentence pairs was also presented in a visual perception study. Pairs were chosen a priori because they differed or did not dif...


Journal of the Acoustical Society of America | 1997

A comparison of perceptual word similarity metrics

Paul Iverson; Edward T. Auer; Lynne E. Bernstein

Contemporary theories of spoken‐word recognition rely on the notion that the stimulus word is mapped against, or selected from, words in long‐term memory in terms of its phonetic (form‐based) attributes. A few metrics have been proposed to model the form‐based similarity of words, including an abstract phonemic metric computed directly on the lexicon (i.e., Coltheart‐n), and perceptual metrics derived from the results of phoneme identification experiments. The results of applying several different metrics to phoneme and word identification data (open‐set and forced‐choice tasks) will be discussed, and these metrics across stimulus conditions with a range of intelligibility levels and similarity structures (visual‐only lipreading, audio‐only conditions processed by a vocoder, and audiovisual conditions pairing vocoded audio with lipreading) will be compared. Our results suggest that graded perceptual metrics may be most useful for understanding the results of word identification experiments across a wide range of stimulus conditions. [Work supported by NIH DC00695.]
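Coltheart's N is conventionally the count of lexical entries that differ from a target by a single segment substitution, so the phonemic metric mentioned above can be sketched in a few lines. The toy lexicon and one-character-per-phoneme coding below are invented for illustration; the perceptual (confusion-based) metrics compared in the paper are not reproduced here.

```python
def coltheart_n(target, lexicon):
    """Count lexicon entries differing from `target` by exactly one
    segment substitution (same length, one mismatched position)."""
    count = 0
    for word in lexicon:
        if word == target or len(word) != len(target):
            continue
        mismatches = sum(a != b for a, b in zip(word, target))
        if mismatches == 1:
            count += 1
    return count

# Toy lexicon: one character per phoneme (illustrative, not a real database).
lexicon = ['kat', 'bat', 'kot', 'kap', 'dog', 'rat', 'kit']
print(coltheart_n('kat', lexicon))   # -> 5 ('bat', 'kot', 'kap', 'rat', 'kit')
```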


Journal of the Acoustical Society of America | 1997

The scope of individual differences in cognitive models of spoken language understanding

Edward T. Auer

A model of spoken language understanding capable of accounting for data at the level of the individual should include: (1) a functional architecture and its processes; (2) specifications of how experience and development affect the architecture and its processes; and (3) knowledge about the range of variability that can be observed for the components of the system. Within this framework, individual differences are hypothesized to arise from the interaction of experience and biologically specified abilities. These individual differences result in systematic variation of performance in experimental tasks that can be identified with specific locations in the functional architecture and/or subprocesses. Taking into account variation due to individual differences, a model also should scale across normal and impaired populations. Evidence suggestive of the utility of this approach for understanding/predicting the performance of individuals will be presented from studies of speechreading, tactile‐aid use, and co...

Collaboration


Dive into Edward T. Auer's collaborations.

Top Co-Authors

Lynne E. Bernstein
George Washington University

Abeer Alwan
University of California

Paul Iverson
University College London

Jianxia Xue
University of California

Paul A. Luce
State University of New York System