Publication


Featured research published by George R. Doddington.


Human Language Technology | 1990

The ATIS spoken language systems pilot corpus

Charles T. Hemphill; John J. Godfrey; George R. Doddington

Speech research has made tremendous progress in the past using the following paradigm:
• define the research problem,
• collect a corpus to objectively measure progress, and
• solve the research problem.
Natural language research, on the other hand, has typically progressed without the benefit of any corpus of data with which to test research hypotheses. We describe the Air Travel Information System (ATIS) pilot corpus, a corpus designed to measure progress in Spoken Language Systems that include both a speech and natural language component. This pilot marks the first full-scale attempt to collect such a corpus and provides guidelines for future efforts.


IEEE Spectrum | 1981

Computers: Speech recognition: Turning theory to practice: New ICs have brought the requisite computer power to speech technology; an evaluation of equipment shows where it stands today

George R. Doddington; Thomas B. Schalk

Presents an evaluation of the equipment now available for turning the theory of electronic speech recognition into practice. The fulfilment of this goal seems much closer than it did because of the pace of advance in IC technology.


IEEE Transactions on Speech and Audio Processing | 2001

Syllable-based large vocabulary continuous speech recognition

Aravind Ganapathiraju; Jonathan Hamaker; Joseph Picone; Mark Ordowski; George R. Doddington

Most large vocabulary continuous speech recognition (LVCSR) systems in the past decade have used a context-dependent (CD) phone as the fundamental acoustic unit. We present one of the first robust LVCSR systems that uses a syllable-level acoustic unit for LVCSR on telephone-bandwidth speech. This effort is motivated by the inherent limitations in phone-based approaches, namely the lack of an easy and efficient way of modeling long-term temporal dependencies. A syllable unit spans a longer time frame, typically three phones, thereby offering a more parsimonious framework for modeling pronunciation variation in spontaneous speech. We present encouraging results which show that a syllable-based system exceeds the performance of a comparable triphone system both in terms of word error rate (WER) and complexity. The WER of the best syllabic system reported here is 49.1% on a standard Switchboard evaluation, a small improvement over the triphone system. We also report results on a much smaller recognition task, OGI Alphadigits, which was used to validate some of the benefits syllables offer over triphones. The syllable-based system exceeds the performance of the triphone system by nearly 20%, an impressive accomplishment since the alphadigits application consists mostly of phone-level minimal pair distinctions.
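
The word error rate (WER) quoted in this abstract is conventionally the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The following is a minimal sketch of that metric only; the function name `word_error_rate` is hypothetical, whitespace tokenization is an assumption, and it is not the scoring tool used in the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count.

    Illustrative sketch: words are split on whitespace (an assumption).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sat on mat")` counts one deletion against six reference words.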


Topic detection and tracking | 2002

Topic detection and tracking evaluation overview

Jonathan G. Fiscus; George R. Doddington

The objective of the Topic Detection and Tracking (TDT) program is to develop technologies that search, organize and structure multilingual, news-oriented textual materials from a variety of broadcast news media. This research program uses controlled laboratory simulations of hypothetical systems to test the efficacy of potential technologies, to gauge research progress, and to provide a forum for the exchange of research information. This chapter introduces TDT's evaluation methodology, including the Linguistic Data Consortium's TDT corpora, the evaluation metrics used in TDT, and the five TDT research tasks: Topic Tracking, Link Detection, Topic Detection, First Story Detection, and Story Segmentation.


International Conference on Acoustics, Speech, and Signal Processing | 1983

An integrated pitch tracking algorithm for speech systems

Bruce G. Secrest; George R. Doddington

A pitch tracking algorithm is described which operates in the time domain from a conditioned linear prediction residual and applies dynamic programming to optimally determine both pitch and voicing. A set of candidate pitch values is derived from a correlation function applied to an LPC prediction residual which has been low-pass filtered in voiced speech and high-pass filtered in unvoiced speech by using a single-pole filter based on the first reflection coefficient of LPC. A post-processing technique using dynamic programming is used to obtain a smooth pitch contour. By incorporating the correlation values of the candidate pitch values, voicing state information and spectral change information into the penalty function of the dynamic programming, a voicing decision is obtained along with an optimum pitch value. This integrated pitch tracking algorithm is compared to three standard pitch tracking algorithms over a database of 58 male and female speakers ranging from 6 to 87 years of age and is shown to exhibit superior performance.
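
The dynamic-programming step in this abstract can be illustrated with a small sketch. It assumes each frame already comes with a list of `(pitch_hz, correlation)` candidates and adds one unvoiced candidate per frame, so that pitch and voicing are decided together along the cheapest path. The function name `track_pitch`, the cost terms, and all weights are assumptions made for illustration; the spectral change term mentioned in the abstract is omitted, and this is not the authors' penalty function.

```python
import math

UNVOICED = 0.0  # marker value for "no pitch" (an assumption of this sketch)


def track_pitch(frames, jump_weight=1.0, voicing_switch_cost=0.5, unvoiced_cost=0.6):
    """Pick one candidate per frame by minimizing a total path cost.

    frames: list of per-frame candidate lists, each item (pitch_hz, correlation).
    Returns one pitch per frame, with UNVOICED marking unvoiced frames.
    All cost terms and default weights are hypothetical.
    """
    # Every frame also gets an unvoiced candidate.
    states = [[(UNVOICED, None)] + list(cands) for cands in frames]
    if not states:
        return []

    def local_cost(cand):
        pitch, corr = cand
        # A strong correlation makes a voiced candidate cheap.
        return unvoiced_cost if pitch == UNVOICED else 1.0 - corr

    def transition_cost(p, q):
        if p == UNVOICED and q == UNVOICED:
            return 0.0
        if p == UNVOICED or q == UNVOICED:
            return voicing_switch_cost
        return jump_weight * abs(math.log(q / p))  # penalize pitch jumps

    best = [local_cost(c) for c in states[0]]
    backpointers = []
    for t in range(1, len(states)):
        new_best, pointers = [], []
        for cand in states[t]:
            costs = [
                best[k] + transition_cost(prev[0], cand[0])
                for k, prev in enumerate(states[t - 1])
            ]
            k_min = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[k_min] + local_cost(cand))
            pointers.append(k_min)
        best = new_best
        backpointers.append(pointers)

    # Trace the cheapest path back from the last frame.
    k = min(range(len(best)), key=best.__getitem__)
    path = [k]
    for pointers in reversed(backpointers):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return [states[t][k][0] for t, k in enumerate(path)]
```

In a frame where an octave-doubled candidate has a slightly higher correlation than the true pitch, the transition penalty lets the neighboring frames pull the decision back to the continuous contour, which is the point of deferring the choice to a post-processing pass.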


International Conference on Acoustics, Speech, and Signal Processing | 1989

Speaker verification over long distance telephone lines

Jayant M. Naik; Lorin Netsch; George R. Doddington

The authors present the results of speaker-verification technology development for use over long-distance telephone lines. A description is given of two large speech databases that were collected to support the development of new speaker verification algorithms. Also discussed are the results of discriminant analysis techniques which improve the discrimination between true speakers and imposters. A comparison is made of the performance of two speaker-verification algorithms, one using template-based dynamic time warping, and the other, hidden Markov modeling.
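
Template-based dynamic time warping, one of the two approaches compared in this abstract, aligns a test utterance to an enrolled reference template while allowing local stretching and compression in time. The sketch below shows the textbook form of that alignment cost on sequences of feature vectors; the function name `dtw_distance`, the Euclidean frame distance, and the unconstrained step pattern are assumptions, not details taken from the paper.

```python
import math


def dtw_distance(template, utterance):
    """Cumulative dynamic-time-warping cost between two feature sequences.

    Each sequence is a list of equal-length numeric vectors (one per frame).
    Frame distance is Euclidean; the step pattern is the basic
    match / stretch-template / stretch-utterance form (assumptions).
    """
    n, m = len(template), len(utterance)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")

    # cost[i][j] = cheapest alignment of template[:i] with utterance[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            frame_dist = math.dist(template[i - 1], utterance[j - 1])
            cost[i][j] = frame_dist + min(
                cost[i - 1][j],      # utterance frame held, template advances
                cost[i][j - 1],      # template frame held, utterance advances
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

A verification decision would then compare this cost against a threshold chosen on development data; that threshold, and any length normalization of the cost, are left out here.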


Journal of the Acoustical Society of America | 1997

Fixed text speaker verification method and apparatus

Jayant M. Naik; George R. Doddington

Speaker verification is performed by computing principal components of a fixed text statement comprising a speaker identification code and a two-word phrase, and principal spectral components of a random word phrase. A multi-phrase strategy is utilized in access control to allow successive verification attempts in a single session, if the speaker fails initial attempts. Based upon a verification attempt, the system produces a verification score which is compared with a threshold value. On successive attempts, the criterion for acceptance is changed, and one of a number of criteria must be satisfied for acceptance in subsequent attempts. A speaker normalization function can also be invoked to modify the verification score of persons enrolled with the system who inherently produce scores which result in denial of access. Accuracy of the verification system is enhanced by updating the reference template which then more accurately symbolizes the person's speech signature.


International Conference on Acoustics, Speech, and Signal Processing | 1989

Phonetically sensitive discriminants for improved speech recognition

George R. Doddington

A phonetically sensitive transformation of speech features has yielded significant improvement in speech-recognition performance. This (linear) transformation of the speech feature vector is designed to discriminate against out-of-class confusion data and is a function of phonetic state. Evaluation of the technique on the TI/NBS connected digit database demonstrates word (sentence) error rates of 0.5% (1.5%) for unknown-length strings and 0.2% (0.6%) for known-length strings. These error rates are two to three times lower than the best previously reported results and suggest that significant improvements in speech-recognition system performance can be achieved by better acoustic-phonetic modeling.


Journal of the Acoustical Society of America | 1994

Method for utilizing formant frequencies in speech recognition

George R. Doddington; Yeunung Chen; R. Gary Leonard

A speech recognizer is disclosed which utilizes hypothesis testing to determine formant frequencies for use in speech recognition. A pre-processor (36) receives speech signal frames and utilizes linear predictive coding to generate all formant frequency candidates. An optimum formant selector (38) operates with a comparator (40) to select from the formant candidates those formants which best match stored reference formants. A dynamic time warper (42) and high level recognition logic (44) operate to determine whether or not to declare a recognized word.


Journal of the Acoustical Society of America | 1991

Very low rate speech encoder and decoder

Joseph Picone; George R. Doddington

A speech encoder is disclosed that quantizes speech information with respect to energy, voicing and pitch parameters to provide a fixed number of bits per block of frames. Coding of the parameters takes place for each block of N frames, irrespective of phonemic boundaries. Certain frames of speech information are discarded during transmission if such information is substantially duplicated in an adjacent frame. A very low data rate transmission system is thus provided which exhibits a high degree of fidelity and throughput.

Collaboration


Top Co-Authors

Alvin F. Martin

National Institute of Standards and Technology

Craig S. Greenberg

National Institute of Standards and Technology

Panos E. Papamichalis

Southern Methodist University

Mark A. Przybocki

National Institute of Standards and Technology
