John J. Godfrey
Texas Instruments
Publications
Featured research published by John J. Godfrey.
international conference on acoustics, speech, and signal processing | 1992
John J. Godfrey; Edward Holliman; Jane McDaniel
SWITCHBOARD is a large multispeaker corpus of conversational speech and text which should be of interest to researchers in speaker authentication and large vocabulary speech recognition. About 2500 conversations by 500 speakers from around the US were collected automatically over T1 lines at Texas Instruments. Designed for training and testing of a variety of speech processing algorithms, especially in speaker verification, it has over an hour of speech from each of 50 speakers, and several minutes each from hundreds of others. A time-aligned word-for-word transcription accompanies each recording.
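As an illustration of how such a time-aligned transcription might be consumed, the sketch below parses a hypothetical three-column "start end word" listing and pulls out the words falling inside a time window; the file layout, names, and timings are assumptions for illustration, not the corpus's actual format.

```python
# Sketch: reading a time-aligned, word-for-word transcription and extracting
# the words inside a time window. The three-column "start end word" layout is
# an illustrative assumption, not the corpus's actual file format.
from dataclasses import dataclass

@dataclass
class AlignedWord:
    start: float  # seconds from the beginning of the recording
    end: float
    word: str

def parse_alignment(lines):
    words = []
    for line in lines:
        start, end, word = line.split(maxsplit=2)
        words.append(AlignedWord(float(start), float(end), word))
    return words

def words_between(words, t0, t1):
    """Words whose aligned interval lies entirely within [t0, t1]."""
    return [w.word for w in words if w.start >= t0 and w.end <= t1]

example = ["0.00 0.31 hello", "0.31 0.62 how", "0.62 0.80 are", "0.80 1.10 you"]
print(words_between(parse_alignment(example), 0.0, 0.8))  # ['hello', 'how', 'are']
```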
human language technology | 1990
Charles T. Hemphill; John J. Godfrey; George R. Doddington
Speech research has made tremendous progress in the past using the following paradigm:
• define the research problem,
• collect a corpus to objectively measure progress, and
• solve the research problem.
Natural language research, on the other hand, has typically progressed without the benefit of any corpus of data with which to test research hypotheses. We describe the Air Travel Information System (ATIS) pilot corpus, a corpus designed to measure progress in Spoken Language Systems that include both a speech and natural language component. This pilot marks the first full-scale attempt to collect such a corpus and provides guidelines for future efforts.
international conference on acoustics, speech, and signal processing | 1992
Barbara Wheatley; George R. Doddington; Charles T. Hemphill; John J. Godfrey; Edward Holliman; Jane McDaniel; Drew Fisher
A method for automatic time alignment of orthographically transcribed speech using supervised speaker-independent automatic speech recognition based on the orthographic transcription, an online dictionary, and HMM phone models is presented. This method successfully aligns transcriptions with speech in unconstrained 5 to 10 min conversations collected over long-distance telephone lines. It requires minimal manual processing and generally produces correct alignments despite the challenging nature of the data. The robustness and efficiency of the method make it a practical tool for very large speech corpora.
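One core ingredient of such a system is the constrained (forced) alignment step itself. The sketch below Viterbi-aligns frames to a strict left-to-right state sequence, assuming the transcription has already been expanded into phone states via a pronunciation dictionary and that per-frame log-likelihoods for those states are available; the shapes, names, and random scores are illustrative assumptions, not the paper's implementation.

```python
# Sketch of the alignment step: align frames to a fixed left-to-right state
# sequence (the phones predicted by the transcription), given per-frame
# log-likelihoods for those states. Assumes T >= S so every state is visited.
import numpy as np

def viterbi_align(log_likes):
    """log_likes: (T, S) array of log p(frame_t | state_s), states in order.
       Returns a length-T array mapping each frame to a state index,
       constrained to start in state 0, end in state S-1, and move
       monotonically (self-loop or advance by one)."""
    T, S = log_likes.shape
    NEG = -np.inf
    delta = np.full((T, S), NEG)       # best partial path score
    back = np.zeros((T, S), dtype=int) # predecessor state
    delta[0, 0] = log_likes[0, 0]
    for t in range(1, T):
        for s in range(S):
            stay = delta[t - 1, s]
            move = delta[t - 1, s - 1] if s > 0 else NEG
            if stay >= move:
                delta[t, s], back[t, s] = stay + log_likes[t, s], s
            else:
                delta[t, s], back[t, s] = move + log_likes[t, s], s - 1
    # Backtrack from the final state at the last frame
    path = np.empty(T, dtype=int)
    path[-1] = S - 1
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path

# Example: 3 "phone" states, 8 frames of fabricated scores
rng = np.random.default_rng(0)
print(viterbi_align(rng.normal(size=(8, 3))))
```

Turning the frame-to-state path into word start and end times is then a matter of noting where the path crosses the state boundaries belonging to each word.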
Journal of the Acoustical Society of America | 2004
Yifan Gong; John J. Godfrey
An improved transformation method uses an initial set of Hidden Markov Models (HMMs) trained on a large amount of speech recorded in a low-noise environment R, which provides rich information on co-articulation and speaker variation, together with a smaller database from a noisier target environment T. A set H of HMMs is trained with data from the low-noise environment R, and the utterances in the noisy environment T are transcribed phonetically using the set H of HMMs. The transcribed segments are grouped into a set of classes C. For each subclass c of C, a transformation Φc is found that maximizes the likelihood of the utterances in T, given H. The HMMs are then transformed and the steps repeated until the likelihood stabilizes.
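A compact way to write the per-class objective and stopping criterion described above, in notation assumed here rather than taken from the text (O_u is the observation sequence of utterance u, T_c the utterances of T assigned to class c, and k the iteration index):

```latex
% Sketch of the per-class maximum-likelihood transform and the iterated total likelihood
\hat{\Phi}_c = \arg\max_{\Phi_c} \sum_{u \in T_c} \log p\bigl(O_u \mid \Phi_c(H)\bigr),
\qquad
L^{(k)} = \sum_{c \in C} \sum_{u \in T_c} \log p\bigl(O_u \mid \hat{\Phi}_c^{(k)}(H)\bigr)
```

Each iteration re-transcribes or re-aligns the utterances in T with the transformed models, re-estimates each transform, and stops when L^{(k)} no longer increases appreciably.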
international conference on acoustics, speech, and signal processing | 1999
Yifan Gong; John J. Godfrey
In the absence of HMMs trained with speech collected in the target environment, one may use HMMs trained with a large amount of speech collected in another recording condition (e.g., quiet office, with high quality microphone). However, this may result in poor performance because of the mismatch between the two acoustic conditions. We propose a linear regression-based model adaptation procedure to reduce such a mismatch. With some adaptation utterances collected for the target environment, the procedure transforms the HMMs trained in a quiet condition to maximize the likelihood of observing the adaptation utterances. The transformation must be designed to maintain speaker-independence of the HMM. Our speaker-independent test results show that with this procedure about 1% digit error rate can be achieved for hands-free recognition, using target environment speech from only 20 speakers.
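The transformation can be sketched as linear-regression (MLLR-style) mean adaptation. The numpy code below estimates one global transform W = [A b] so that each adapted mean becomes W·[μ; 1], assuming per-frame Gaussian posteriors for the adaptation utterances are already available (for example from a forced-alignment pass with the unadapted models); all array names are illustrative, and this is a sketch of the general technique, not the paper's exact procedure.

```python
# Sketch of estimating a single global linear-regression mean transform for
# diagonal-covariance Gaussians, given frame-level occupation probabilities.
import numpy as np

def estimate_mllr_mean_transform(means, variances, frames, posteriors):
    """means:      (M, D) Gaussian means of the clean-speech HMMs
       variances:  (M, D) diagonal covariances
       frames:     (T, D) adaptation observations from the target environment
       posteriors: (T, M) gamma_m(t), occupation probability of Gaussian m at t
       returns W:  (D, D+1) so that the adapted mean is W @ [mu; 1]"""
    M, D = means.shape
    xi = np.hstack([means, np.ones((M, 1))])        # extended means [mu_m, 1]
    gamma = posteriors.sum(axis=0)                  # (M,)  total occupancy
    first = posteriors.T @ frames                   # (M, D) sum_t gamma_m(t) o_t
    W = np.zeros((D, D + 1))
    for i in range(D):                              # one row of W per feature dim
        inv_var = 1.0 / variances[:, i]             # (M,)
        # G_i = sum_m (gamma_m / sigma^2_{m,i}) xi_m xi_m^T
        G = (xi * (gamma * inv_var)[:, None]).T @ xi
        # k_i = sum_m (1 / sigma^2_{m,i}) (sum_t gamma_m(t) o_t(i)) xi_m
        k = (inv_var * first[:, i]) @ xi
        W[i] = np.linalg.solve(G, k)
    return W

def adapt_means(means, W):
    """Apply the transform: adapted mu_m = W @ [mu_m; 1]."""
    xi = np.hstack([means, np.ones((means.shape[0], 1))])
    return xi @ W.T
```

Because the transform is estimated from statistics pooled over all speakers' adaptation data rather than per speaker, the adapted models stay speaker independent; extending the sketch to several regression classes just means accumulating G and k separately per class.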
international conference on acoustics, speech, and signal processing | 1997
John J. Godfrey; Aravind Ganapathiraju; Coimbatore S. Ramalingam; Joseph Picone
By building acoustic phonetic models which explicitly represent as much knowledge of pronunciation in a small domain (the digits) as possible, we can create a recognition system which not only performs well but allows for meaningful error analysis and improvement. An HMM-based recognizer for the digits and a few associated words was constructed in accord with these principles. About 65 phonetic models were trained on 140 carefully labeled utterances, then iteratively trained on unlabeled data under orthographic supervision. The basic system achieved less than 3% word error rate on digit strings of unknown length from unseen test speakers, and 1.4% on 7-digit strings of known length. This is competitive with word-based models using the same HMM engine and similar parameter settings. As an R&D system, it allows meaningful analysis of errors and relatively straightforward means of improvement.
Journal of the Acoustical Society of America | 1973
John J. Godfrey
An experiment was designed to test the hypothesis that the right‐ear advantage for speech in dichotic listening is correlated directly with perceptual difficulty, whether due to an inherent property of the stimulus (such as the “encodedness” of consonants) or to other loading factors in the perceptual situation. Brief vowel excerpts were presented dichotically, and perceptual difficulty allowed to vary along three dimensions: duration of stimuli, S/N ratio, and phonetic discriminability among vowels. Significant right‐ear advantages were indeed found, although not entirely in accord with the original hypothesis. Results are discussed in terms of both the degree of perceptual difficulty and the speech‐processing mode of perception.
Journal of the Acoustical Society of America | 1974
John J. Godfrey
In a previous paper presented to the Society [J. Godfrey, J. Acoust. Soc. Am. 54, 285(A) (1973)], short vowel excerpts were shown to produce a right ear advantage (REA) of 4% to 8%, when the task was made sufficiently difficult. One parameter of “perceptual difficulty” explored was duration: whereas previous experiments with 300‐msec vowels [Shankweiler and Studdert‐Kennedy, Quart. J. Exp. Psychol. 19, 59–69 (1967)] produced no REA, the much shorter stimuli (10–60 msec) of my experiment did. To investigate further the relationship between stimulus duration and REA, a dichotic listening experiment was conducted using vowels of 50‐, 100‐, and 150‐msec duration. Stimuli were sets of tense and lax vowels, presented with an S/N ratio of 0 dB. Results are discussed in terms of the “perceptual difficulty” hypothesis, i.e., that the REA is produced by the perceptual difficulty of the speech perception task, and that duration, within a certain range, constitutes one parameter of such difficulty.
human language technology | 1989
John J. Godfrey; R. Rajasekaran
The Speech Data Base program is in the second phase of a two-phase R&D program. In the first phase, we developed speech data bases that fueled and supported the DARPA SCP continuous speech recognition efforts. In cooperation with MIT, an acoustic-phonetic data base [1] was created consisting of 10 spoken English sentences by each of 630 speakers (dialectally balanced). This data base provided seeds for the development of several speaker-independent recognition systems. A resource management data base [2], reflecting query, command, and control tasks in naval battle management, was established and provided the evaluation and demonstration infrastructure for the various speech recognizers developed under the DARPA SCP program.
Journal of the Acoustical Society of America | 1972
John J. Godfrey
The Analog Cochlea, an electronic model of inner ear function, produces surfaces representing the input to the auditory nervous system for a given audio stimulus. Thus, for speech, surface features which remain constant for a single phoneme or distinctive feature across speakers may be called the auditory correlates of that feature or phoneme. Some of these features have been reported previously to this Society [J. R. Mundie and T. J. Moore, “Speech Analysis Using an Analog Cochlea,” J. Acoust. Soc. Amer. 48, 131(A) (1970)]. A study of these auditory features of English vowels shows that they are often, but not always, isomorphic with the formants indicated by spectral analysis. This paper will compare the intra- and interspeaker variances in spectrographic formant measurements with those from Analog Cochlea surface features, using the same set of vowels pronounced by the same speakers of a single dialect.
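A minimal sketch of the kind of variance comparison described, assuming repeated measurements of one feature (say, F1 of a single vowel, in Hz) for each of several speakers; the numbers below are fabricated placeholders, not data from the paper, and the same function could equally be applied to Analog Cochlea surface features.

```python
# Sketch: split measurements of one feature into within-speaker (intra) and
# between-speaker (inter) variance components. Values are fabricated.
import numpy as np

def variance_components(values_by_speaker):
    """values_by_speaker: list of 1-D arrays, one per speaker, each holding
       repeated measurements of the same feature (e.g. F1 in Hz)."""
    within = np.mean([np.var(v) for v in values_by_speaker])   # avg. intra-speaker variance
    between = np.var([v.mean() for v in values_by_speaker])    # variance of speaker means
    return within, between

f1_by_speaker = [np.array([310., 325., 318.]),   # speaker 1, three tokens of one vowel
                 np.array([290., 300., 295.]),   # speaker 2
                 np.array([340., 332., 336.])]   # speaker 3
within, between = variance_components(f1_by_speaker)
print(f"intra-speaker variance {within:.1f}, inter-speaker variance {between:.1f}")
```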