Wayne A. Lea
Apple Inc.
Publications
Featured research published by Wayne A. Lea.
Journal of the Acoustical Society of America | 1998
John F. Holzrichter; Gregory C. Burnett; Lawrence C. Ng; Wayne A. Lea
Very low power electromagnetic (EM) wave sensors are being used to measure speech articulator motions as speech is produced. Glottal tissue oscillations, jaw, tongue, soft palate, and other organs have been measured. Microwave imaging (e.g., using radar sensors) appears not to have been previously considered for such monitoring. Glottal tissue movements detected by radar sensors correlate well with those obtained by established laboratory techniques, and have been used to estimate a voiced excitation function for speech processing applications. The noninvasive access, coupled with the small size, low power, and high resolution of these new sensors, permits promising research and development applications in speech production, communication disorders, speech recognition, and related topics.
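The abstract reports correlation between radar-derived and laboratory glottal measurements without giving the procedure. As a minimal sketch of such a check, the snippet below computes a Pearson correlation between two signals; the 120 Hz sinusoid, sampling rate, and noise level are invented stand-ins for the sensor traces, not the authors' data.

```python
import numpy as np

def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equal-length signals."""
    x = x - np.mean(x)
    y = y - np.mean(y)
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

# Synthetic stand-ins: a 120 Hz glottal oscillation sampled at 10 kHz, and a
# noisier copy playing the role of the radar-sensor trace (both hypothetical).
fs = 10_000
t = np.arange(0, 0.1, 1 / fs)
reference = np.sin(2 * np.pi * 120 * t)            # e.g., a laboratory trace
radar = reference + 0.2 * np.random.randn(t.size)  # EM-sensor trace plus noise

print(f"correlation = {pearson_correlation(radar, reference):.3f}")
```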
IEEE Transactions on Acoustics, Speech, and Signal Processing | 1975
Wayne A. Lea; Mark F. Medress; Toby E. Skinner
Our strategy for computer understanding of speech uses prosodic features to break up continuous speech into sentences and phrases and locate stressed syllables in those phrases. The most reliable phonetic data are obtained by performing a distinguishing features analysis within the stressed syllables and by locating sibilants and other robust information in unstressed syllables. The numbers and locations of syntactic boundaries and stressed syllables are used to select likely syntactic and semantic structures, within which words are hypothesized to correspond to the partial distinguishing features matrices obtained from the segmental analyses. Portions of this strategy have been implemented and tested with hundreds of seconds of speech, involving fifteen talkers. A program for detecting syntactic boundaries from fall-rise patterns in fundamental frequency contours correctly detected over 90 percent of all predicted boundaries. An algorithm for locating stressed syllables (from fundamental frequency contours and high-energy syllabic nuclei) correctly located the nuclei of over 85 percent of all those syllables perceived as stressed by a panel of listeners. A study of segmental analysis results obtained by several other research groups showed that phonetic recognition clearly is most successful in the stressed syllables. Procedures for classification of stressed vowels, location and classification of sibilants, and location of stops, nasals, and [r]-like sounds have been implemented. Prosodic aids to parsing and semantic analysis are being investigated.
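The stress-location algorithm is described only at a high level here. The following is a hypothetical sketch of the idea (stressed syllables as high-energy syllabic nuclei with elevated F0); the thresholds and peak-picking rule are invented for illustration and are not the paper's actual algorithm.

```python
import numpy as np

def find_stressed_nuclei(f0, energy, energy_thresh=0.5, f0_factor=1.05):
    """Toy stress locator: a frame is a candidate syllabic nucleus if it is a
    local energy peak above a threshold; it is marked stressed if its F0
    exceeds the utterance median by a given factor. All parameters are
    invented for illustration."""
    median_f0 = np.median(f0[f0 > 0])  # ignore unvoiced (F0 == 0) frames
    stressed = []
    for i in range(1, len(energy) - 1):
        is_peak = energy[i] >= energy[i - 1] and energy[i] >= energy[i + 1]
        if is_peak and energy[i] > energy_thresh and f0[i] > f0_factor * median_f0:
            stressed.append(i)
    return stressed

# Synthetic 10-frame contours (arbitrary units).
f0 = np.array([0, 110, 140, 150, 120, 0, 100, 105, 100, 0], dtype=float)
energy = np.array([0.1, 0.4, 0.9, 0.8, 0.3, 0.1, 0.6, 0.7, 0.5, 0.1])
print("stressed nuclei at frames:", find_stressed_nuclei(f0, energy))
```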
Journal of the Acoustical Society of America | 2000
Ingo R. Titze; Brad H. Story; Gregory C. Burnett; John F. Holzrichter; Lawrence C. Ng; Wayne A. Lea
Newly developed glottographic sensors, utilizing high-frequency propagating electromagnetic waves, were compared to a well-established electroglottographic device. The comparison was made on four male subjects under different phonation conditions, including three levels of vocal fold adduction (normal, breathy, and pressed), three different registers (falsetto, chest, and fry), and two different pitches. Agreement between the sensors was always found for the glottal closure event, but for the general wave shape the agreement was better for falsetto and breathy voice than for pressed voice and vocal fry. Differences are attributed to the field patterns of the devices. Whereas the electroglottographic device can operate only in a conduction mode, the electromagnetic device can operate in either the forward scattering (diffraction) mode or in the backward scattering (reflection) mode. Results of our tests favor the diffraction mode because a more favorable angle imposed on receiving the scattered (reflected) signal did not improve the signal strength. Several observations are made on the uses of the electromagnetic sensors for operation without skin contact and possibly in an array configuration for improved spatial resolution within the glottis.
international conference on acoustics, speech, and signal processing | 1977
Mark F. Medress; Toby E. Skinner; Dean R. Kloker; Timothy Diller; Wayne A. Lea
Sperry Univac is developing a linguistically oriented computer system which recognizes natural spoken phrases and sentences without requiring extensive adjustments for individual users and without the need for each user to repeat every vocabulary word for system training. The recognition system produces a linguistic description of the unknown utterance by determining its component sound segments. Words from a dictionary, or lexicon (represented in terms of the same linguistic segments), are then strung together and compared to the analysis segments to determine which syntactically allowed sequence of vocabulary words best matches the unknown utterance. Two data bases have been used recently for development and testing, including alphanumeric sequences constructed from a vocabulary of 36 words, and data management commands from a vocabulary of 64 words. For both data bases, average correct recognition scores were 95% for individual words, and 88% for complete sentences. Current plans for enhancing the recognition system include enlarging the vocabulary and number of recognizable sentence types, making additional use of prosodic information, and introducing phonetic verification rules and a word verification component. The ultimate goal of this development is to provide an effective sentence recognition system for natural and practical speech input to computers.
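The matching step, in which syntactically allowed word sequences are scored against the analyzer's segment string, might be sketched as follows; the toy lexicon, the list of allowed sentences, and the position-by-position scoring function are invented simplifications of whatever alignment the Sperry Univac system actually used.

```python
# Toy lexicon: each word is a sequence of broad phonetic segment labels.
LEXICON = {
    "one":  ["W", "AH", "N"],
    "two":  ["T", "UW"],
    "stop": ["S", "T", "AA", "P"],
}

# A toy stand-in for the grammar: the syntactically allowed word sequences.
ALLOWED = [("one", "stop"), ("two", "stop"), ("stop", "one")]

def score(hypothesis, observed):
    """Count position-by-position segment matches between a hypothesized word
    string and the observed segment sequence, penalizing length mismatch (a
    crude stand-in for the system's real alignment and scoring)."""
    segs = [s for w in hypothesis for s in LEXICON[w]]
    return sum(a == b for a, b in zip(segs, observed)) - abs(len(segs) - len(observed))

observed = ["T", "UW", "S", "T", "AA", "P"]  # segment string from the analyzer
best = max(ALLOWED, key=lambda h: score(h, observed))
print("best sentence:", " ".join(best))  # -> best sentence: two stop
```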
Journal of the Acoustical Society of America | 1974
Wayne A. Lea
Stressed syllables are presumed to be the most carefully articulated portions of speech, and thus the most likely to provide the reliably encoded information needed for automatic recognition of continuous speech. In conjunction with the Carnegie‐Mellon Speech Segmentation Workshop, nine research groups used different automatic techniques to segment continuous speech (31 sentences) and identify the phonetic categories or phonemes. These segmentation and classification results were evaluated according to whether major distinguishing features of each of the phones (such as high/mid/low, front/central/back, and rounded/unrounded for vowels, and manner of articulation for consonants) were correctly determined. Listeners were asked to classify all syllables in the speech as stressed, unstressed, or reduced, and an algorithm for the automatic location of stressed syllables also was used to delimit stressed nuclei. Vowels that were perceived as stressed and/or located by the algorithm were more accurately classified ...
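Scoring recognition at the level of distinguishing features, rather than whole phonemes, can be illustrated with a small sketch; the feature inventory and vowel symbols below are assumptions for illustration only, not the workshop's actual evaluation code.

```python
# Distinguishing-feature vectors for a few vowels (height, backness, rounding).
FEATURES = {
    "IY": ("high", "front", "unrounded"),
    "UW": ("high", "back", "rounded"),
    "AA": ("low", "back", "unrounded"),
}

def feature_accuracy(reference, recognized):
    """Fraction of distinguishing features the recognizer got right, averaged
    over vowel tokens; a wrong phoneme earns partial credit for each feature
    it shares with the reference."""
    scores = []
    for ref, hyp in zip(reference, recognized):
        r, h = FEATURES[ref], FEATURES[hyp]
        scores.append(sum(a == b for a, b in zip(r, h)) / len(r))
    return sum(scores) / len(scores)

print(feature_accuracy(["IY", "UW", "AA"], ["IY", "AA", "UW"]))  # ~0.556
```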
international conference on acoustics, speech, and signal processing | 1978
Wayne A. Lea; June E. Shoup
We have determined how the state of the art in speech recognition has advanced in recent years. We have surveyed recent accomplishments in research and system design, and have determined a number of gaps that exist in speech understanding technology. There is a limited but growing market acceptance for available isolated-word recognizers, and some uncertainty about the impact of recent advances in the understanding of spoken sentences. Future work must include new and improved system components and must make better use of language constraints and task requirements, to permit increasingly more natural man-computer interactions.
Journal of the Acoustical Society of America | 1971
Wayne A. Lea
The F0 contours of texts spoken by several talkers were studied and successfully associated with boundaries of major grammatical constituents. For one talker, a significant F0 rise (usually preceded by an F0 fall) occurred at every syntactically predicted major constituent boundary, except NP‐Verbal boundaries. Boundary positions were adjusted to account for the delay in F0 rise when constituents began with unstressed syllables. Falling (“Tune 1”) F0 contours and pauses clearly marked the boundaries of both embedded and matrix sentences in such spoken prose. Phonetically dictated pitch rises at the onsets of voicing [M. Haggard, S. Ambler, and M. Callow, J. Acoust. Soc. Amer. 47, 613–617 (1970)] had to be distinguished from rises due to constituent boundaries. A computer program was implemented to detect automatically the major constituent boundaries based on such F0 patterns. Results are reported, talker dependencies are discussed, and implications for speech recognition and perceptual modeling are outlined.
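A fall-rise F0 valley detector in the spirit of the program described here might look like the sketch below; the fractional fall and rise thresholds and the frame-level representation are invented, not the paper's criteria.

```python
def fall_rise_boundaries(f0, min_fall=0.1, min_rise=0.1):
    """Flag frame indices where an F0 valley occurs: F0 falls by at least
    `min_fall` (as a fraction of the preceding value) and then rises by at
    least `min_rise`. Thresholds are illustrative assumptions."""
    boundaries = []
    for i in range(1, len(f0) - 1):
        if f0[i] <= 0:  # skip unvoiced frames (F0 coded as 0)
            continue
        prev, nxt = f0[i - 1], f0[i + 1]
        if prev > 0 and nxt > 0:
            fell = (prev - f0[i]) / prev >= min_fall
            rose = (nxt - f0[i]) / f0[i] >= min_rise
            if fell and rose:
                boundaries.append(i)
    return boundaries

f0 = [120, 118, 100, 125, 122, 119, 95, 118, 115]  # Hz, one value per frame
print("boundary candidates at frames:", fall_rise_boundaries(f0))  # [2, 6]
```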
international conference on acoustics, speech, and signal processing | 1984
Wayne A. Lea; Frantz Clermont
Algorithms have been implemented for finding voiced regions, estimating F0 on a perceptually useful tone scale, detecting syllabic nuclei and delimiting the syllabic nucleus, smoothing F0 in either straightforward left-to-right form or by island-driven analysis centered on syllabic nuclei, detecting phrase boundaries from F0 valleys, and assigning stresses based on context-dependent combinations of acoustic cues. Experiments showed that island-driven smoothing of F0 contours yields more corrections and smoother, flatter contours than left-to-right smoothing yields, especially at onsets of voiced regions and in fast speech.
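The contrast between left-to-right and island-driven smoothing can be sketched as follows; the running-median filter, the outward clipping rule, and all parameters are invented stand-ins for the implemented algorithms, meant only to show why anchoring on syllabic nuclei helps at voicing onsets.

```python
import numpy as np

def smooth_left_to_right(f0, width=3):
    """Plain running-median smoothing over the whole F0 contour."""
    out = f0.copy()
    half = width // 2
    for i in range(half, len(f0) - half):
        out[i] = np.median(f0[i - half : i + half + 1])
    return out

def smooth_island_driven(f0, nuclei, max_jump=0.2):
    """Toy island-driven smoothing: starting from each syllabic nucleus
    (a reliable 'island'), work outward and clip any frame that jumps more
    than `max_jump` (fractionally) from its already-smoothed neighbor."""
    out = f0.copy()
    for n in nuclei:
        for i in range(n + 1, len(out)):   # rightward from the nucleus
            if abs(out[i] - out[i - 1]) > max_jump * out[i - 1]:
                out[i] = out[i - 1]
        for i in range(n - 1, -1, -1):     # leftward from the nucleus
            if abs(out[i] - out[i + 1]) > max_jump * out[i + 1]:
                out[i] = out[i + 1]
    return out

f0 = np.array([180.0, 118, 120, 122, 121, 60, 119, 117])  # onset/octave errors
print("left-to-right:", smooth_left_to_right(f0))
print("island-driven:", smooth_island_driven(f0, nuclei=[3]))
```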
Journal of the Acoustical Society of America | 1973
Wayne A. Lea; Mark F. Medress; Toby E. Skinner
Automatic speech recognition is expected to be more successful when syntactically related information is incorporated into early stages of recognition. Phonemic decisions, in particular, are expected to be more accurate and less ambiguous when contextual information is considered. A computer program has been developed for detecting boundaries between syntactic constituents from fall‐rise fundamental frequency (F0) contours in connected speech. Another program detects some stressed syllables from F0 contours, intensity contours, and timing information. Resulting boundary detections and stressed syllable locations are used by a preliminary method for estimating distinctive features of some phonemes in connected speech. Vowel and consonant recognition is attempted first in the stressed syllables. Other readily detected segments, such as coronal unvoiced strident fricatives, are also found. Detected constituent boundary positions are compared with stored information about boundary positions, and are used both...
Journal of the Acoustical Society of America | 1973
Wayne A. Lea
At vowel onset following unvoiced consonants in /həCVC/ utterances spoken by two talkers, F0 began high (30% higher than in /ə/), and fell about 7% in the first 5 csec. At closure of voiced oral obstruents, F0 suddenly dipped about 10%, remained flat, suddenly rose about 25% at opening of closure, and, after vowel onset, gradually rose (an average of 8% in the first 10 csec). The high/low feature of the vowel and the manner and place of prevocalic consonant articulation had progressively less effect on vowel F0 values. The final consonant had no apparent effect on F0 contours in the vowel. As previous synthesis work has suggested, the fall or rise of F0 in the initial portion of a vowel appears to be a cue to the state of voicing of previous consonants. Initial and peak F0 values in the vowel also can indicate state of consonant voicing. However, F0 contours in bisyllabic words with contrasting stress patterns and similar phonemic sequences (e.g., PERmit vs. perMIT) showed that (1) an initially falling F0 in...