
Publication


Featured research published by Barbara Wheatley.


Journal of the Acoustical Society of America | 1995

Voice log-in using spoken name input

Joseph Picone; Barbara Wheatley

A voice log-in system is based on a person's spoken name input only, using speaker-dependent acoustic name recognition models in performing speaker-independent name recognition. In an enrollment phase, a dual-pass endpointing procedure defines both the person's full name (broad endpoints) and the component names separated by pauses (precise endpoints). An HMM (hidden Markov model) recognition model generator generates a corresponding HMM name recognition model, modified by the insertion of additional skip transitions for the pauses between component names. In a recognition/update phase, a spoken-name speech signal is input to an HMM name recognition engine that performs speaker-independent name recognition; the modified HMM name recognition model permits the recognition operation to accommodate pauses of variable duration between component names.
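
The skip-transition construction lends itself to a compact illustration. The Python sketch below concatenates left-to-right HMMs for two component names with an optional pause model between them; it is a minimal sketch of the idea only, and the state counts, probabilities, and function names are assumptions, not the patent's implementation.

# Minimal sketch of skip transitions over optional inter-name pauses.
# States are integer indices; transitions are (src, dst, prob) triples.
# Self-loops and emission parameters are omitted for brevity.

def chain_states(n_states, start):
    """Left-to-right transitions for a sub-model on states start..start+n-1."""
    return [(s, s + 1, 1.0) for s in range(start, start + n_states - 1)]

def build_name_hmm(component_lengths, pause_length=2):
    """component_lengths: hypothetical states per component-name HMM."""
    transitions, state = [], 0
    for i, n in enumerate(component_lengths):
        transitions += chain_states(n, state)
        word_end = state + n - 1
        state += n
        if i < len(component_lengths) - 1:
            pause_start = state                      # optional pause model
            transitions.append((word_end, pause_start, 0.5))
            transitions += chain_states(pause_length, pause_start)
            state += pause_length
            next_word = state
            transitions.append((pause_start + pause_length - 1, next_word, 1.0))
            # Skip transition: bypass the pause entirely.
            transitions.append((word_end, next_word, 0.5))
    return transitions

print(build_name_hmm([3, 4]))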


Journal of the Acoustical Society of America | 1994

Voice recognition of proper names using text-derived recognition models

Barbara Wheatley; Joseph Picone

A name recognition system (FIG. 1) provides access to a database based on the voice recognition of a proper name spoken by a person who may not know the correct pronunciation of the name. During an enrollment phase (10), for each name-text entered (11) into a text database (12), text-derived recognition models (22) are created for a selected number of pronunciations of the name-text, each recognition model being constructed from a respective sequence of phonetic features (15) generated by a Boltzmann machine (13). During a name recognition phase (20), the spoken input (24, 25) of a name (by a person who may not know the correct pronunciation) is compared (26) with the recognition models (22) to find a pattern match; selection of the corresponding name-text is made according to a decision rule (28).
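
To make the enrollment/recognition flow concrete, here is a deliberately toy Python sketch. Every piece is a stand-in: the variant rule plays the role of the text-derived pronunciation step (the patent derives phonetic features with a trained generator) and the overlap score stands in for model likelihood.

# Toy illustration only; rules, scores, and names are hypothetical stand-ins.

def pronunciation_variants(name_text):
    # Stand-in for text-derived pronunciation generation.
    name = name_text.lower()
    variants = [name]
    if "ch" in name:
        variants.append(name.replace("ch", "k"))   # e.g. "Koch" said with /k/
    return variants

def score(model, spoken):
    # Stand-in for recognition-model likelihood: symbol overlap ratio.
    return sum(a == b for a, b in zip(model, spoken)) / max(len(model), len(spoken))

def recognize(spoken, name_texts):
    # Decision rule: choose the name-text whose best variant scores highest.
    return max(name_texts,
               key=lambda n: max(score(v, spoken) for v in pronunciation_variants(n)))

print(recognize("kok", ["koch", "smith"]))   # -> 'koch'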


International Conference on Acoustics, Speech, and Signal Processing | 1994

An evaluation of cross-language adaptation for rapid HMM development in a new language

Barbara Wheatley; Kazuhiro Kondo; Wallace Anderson; Yeshwant K. Muthusamy

The feasibility of cross-language transfer of speech technology is of increasing concern as the demand for recognition systems in multiple languages grows. The paper presents a systematic study of the relative effectiveness of different methods for seeding and training HMMs in a new language, using transfer from English to Japanese for small-vocabulary, speaker-independent continuous speech recognition as a test case. Effects of limited training data are also explored. The study found that cross-language adaptation produced better models than alternative approaches with relatively little effort, and that the number of speakers is more critical than the number of utterances for small training data sets.
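
The seeding step the paper evaluates can be pictured in a few lines of Python. This is only a sketch under assumed names: the phone mapping below is illustrative, not the paper's actual English-to-Japanese table, and model parameters are reduced to a placeholder dictionary.

# Seed each Japanese phone model from a nearby English phone model, then
# re-estimate on Japanese data (re-estimation, e.g. Baum-Welch, not shown).

JPN_TO_ENG = {"a": "aa", "i": "iy", "u": "uw", "e": "eh", "o": "ow",
              "k": "k", "s": "s", "t": "t", "n": "n", "r": "dx"}   # illustrative

def seed_japanese_models(english_models, mapping=JPN_TO_ENG):
    """Copy the mapped English model's parameters as the Japanese seed."""
    return {jpn: dict(english_models[eng]) for jpn, eng in mapping.items()}

english = {p: {"mean": 0.0, "var": 1.0} for p in set(JPN_TO_ENG.values())}
japanese_seeds = seed_japanese_models(english)
print(sorted(japanese_seeds))   # ['a', 'e', 'i', 'k', 'n', 'o', 'r', 's', 't', 'u']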


Journal of the Acoustical Society of America | 2003

Enrollment and modeling method and apparatus for robust speaker dependent speech models

Lorin Netsch; Barbara Wheatley

Speech recognition and the generation of speech recognition models are provided, including the generation of unique phonotactic garbage models (15) that identify speech by, for example, English-language constraints, in addition to noise, silence, and other non-speech models (11), as well as specific word models for speech recognition.
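
One way to picture a phonotactic constraint is as a filter over decoded phone strings. The Python sketch below is an illustrative assumption, not the patented garbage model: it accepts a string as English-like only if it starts with a legal onset and contains a vowel, the kind of constraint a phonotactic garbage model can exploit.

# Coarse phonotactic plausibility check (illustrative, not the patented model).
VOWELS = {"aa", "ae", "ah", "ao", "eh", "ih", "iy", "ow", "uw"}

def is_english_like(phones):
    """Reject phone strings that violate simple English phonotactics."""
    if not phones or phones[0] == "ng":   # English words never begin with /ng/
        return False
    return any(p in VOWELS for p in phones)   # must contain at least one vowel

print(is_english_like(["k", "ae", "t"]))   # True
print(is_english_like(["ng", "ah"]))       # False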


Journal of the Acoustical Society of America | 1998

Speaker-dependent speech recognition using speaker independent models

Jeffrey L. Scruggs; Barbara Wheatley; Abraham P. Ittycheriah

The memory and data management requirements for text-independent, speaker-dependent recognition are drastically reduced by a novel approach that eliminates the need for separate acoustic recognition models for each speaker. This is achieved by using speaker-independent recognition models at the acoustic level. The speaker-dependent data stored for each item to be recognized consists only of the information needed to determine the speaker-independent recognition model sequence for that item.
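
The storage idea is simple enough to sketch directly. In the Python below, the model inventory, names, and lookup shape are assumptions for illustration; the point is that each enrolled item is kept only as a sequence of shared speaker-independent model identifiers.

# Shared speaker-independent (SI) acoustic model pool; indices are the only
# per-speaker data stored for each enrolled item.
SI_MODELS = {"k": 0, "ae": 1, "t": 2, "d": 3, "ao": 4, "g": 5}

def enroll(item_phones):
    """Store an item as SI model indices only: a few bytes per item."""
    return [SI_MODELS[p] for p in item_phones]

speaker_store = {
    "alice": {"cat": enroll(["k", "ae", "t"]),
              "dog": enroll(["d", "ao", "g"])},
}
# Recognition decodes input with the shared SI models and compares the
# result against each stored index sequence for the claimed speaker.
print(speaker_store["alice"]["cat"])   # [0, 1, 2]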


IEEE Automatic Speech Recognition and Understanding Workshop | 1997

Syllable: a promising recognition unit for LVCSR

Aravind Ganapathiraju; Vaibhava Goel; Joseph Picone; Andres Corrada; George R. Doddington; Katrin Kirchhoff; Mark Ordowski; Barbara Wheatley

We present an attempt to model syllable-level acoustic information as a viable alternative to the conventional phone-level acoustic unit for large-vocabulary continuous speech recognition. The motivation for this work was the inherent limitations of the phone-based approach, primarily its decompositional nature and its lack of larger-scale temporal dependencies. We present preliminary but encouraging results on a syllable-based recognition system that exceeds the performance of a comparable triphone system in terms of both word error rate (WER) and complexity. The WER of the best syllable system reported here was 49.1% on a standard SWITCHBOARD evaluation.
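
Word error rate, the metric quoted above, is the standard edit-distance measure: substitutions plus insertions plus deletions, divided by the number of reference words. A small dynamic-programming routine in Python computes it:

# Word error rate via Levenshtein distance over word strings.

def wer(ref, hyp):
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(r)][len(h)] / len(r)

print(wer("the cat sat", "the cat sat down"))   # 1 insertion / 3 words = 0.33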


International Conference on Acoustics, Speech, and Signal Processing | 1992

Robust automatic time alignment of orthographic transcriptions with unconstrained speech

Barbara Wheatley; George R. Doddington; Charles T. Hemphill; John J. Godfrey; Edward Holliman; Jane McDaniel; Drew Fisher

A method for automatic time alignment of orthographically transcribed speech is presented, using supervised speaker-independent automatic speech recognition based on the orthographic transcription, an online dictionary, and HMM phone models. This method successfully aligns transcriptions with speech in unconstrained 5- to 10-minute conversations collected over long-distance telephone lines. It requires minimal manual processing and generally produces correct alignments despite the challenging nature of the data. The robustness and efficiency of the method make it a practical tool for very large speech corpora.
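
The supervision step can be sketched briefly: the transcription is expanded through the dictionary into a phone sequence, which constrains the recognizer so that decoding yields time marks. The dictionary entries and the optional-silence marker in the Python below are illustrative assumptions, not the paper's actual lexicon.

# Expand an orthographic transcription into an alignment network of phones,
# with an optional silence the aligner may skip between words.
DICTIONARY = {"hello": ["hh", "ah", "l", "ow"],
              "world": ["w", "er", "l", "d"]}   # illustrative entries

def supervision_network(transcription):
    """Phone sequence with optional inter-word silences for forced alignment."""
    phones = []
    for word in transcription.lower().split():
        phones += DICTIONARY[word]
        phones.append("sil?")    # '?' marks a silence the aligner may skip
    return phones[:-1]           # drop the trailing optional silence

print(supervision_network("hello world"))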


Digital Signal Processing | 1991

Voice Across America: Toward robust speaker-independent speech recognition for telecommunications applications

Barbara Wheatley; Joseph Picone

The demand for telecommunications applications of automatic speech recognition has exploded in recent years. This area seems a natural candidate for speech recognition systems, since it embraces a tremendous variety of applications that rely entirely on audio signals and serial interfaces. However, the telecommunications environment strains the capabilities of current technology, given its broad range of uncontrollable variables, from speaker characteristics to telephone handsets and line quality. Current recognition systems have attained impressive performance levels on relatively controlled tasks, such as speaker-independent continuous digit recognition on laboratory databases comprising a few hundred speakers [1-3]. To comprehend the additional challenges of the telecommunications environment, we must study the effects on recognition of handset and channel characteristics, speaker accent, speaking style, and lexicon, as well as the interactions among these factors. No small amount of data will suffice to model these conditions.

Simultaneous with the explosion of telecommunications applications has been the introduction of powerful statistical modeling techniques, known as hidden Markov models (HMMs), to speech recognition [4,5]. These computationally intensive algorithms introduce a large number of degrees of freedom into the speech recognition problem and hence exhibit slow convergence properties. As a consequence, they require orders of magnitude more training data than the previous generation of deterministic techniques. Many databases collected in the mid-1980s, such as the DARPA Resource Management database [6] and the TIMIT Acoustic Phonetic database [7], while ambitious programs in their own right, have proven to consistently underrepresent important dimensions in HMM recognition systems due to their limited coverage.

The Voice Across America (VAA) database being collected at Texas Instruments is designed to satisfy the data requirements of this next generation of speech recognition systems. Our goal is to collect data over standard long-distance telephone lines from 100,000 speakers representing a demographically and geographically balanced sample of the contiguous United States. This database will provide the foundation for a thorough investigation of factors affecting speaker-independent continuous speech recognition for American English. Similar projects are being planned for other countries, and will form the basis for research into recognition of Japanese, British English, and European languages.

As of now, we have completed two phases of the VAA project for a total of 50,000 utterances from nearly 3700 speakers. This paper describes the methods and motivation for VAA data collection and validation procedures, the current contents of the database, and the results of exploratory research on a 1088-speaker subset of the database. Our initial results underscore the need for an extensive database: even 1088 speakers, a large database by traditional standards, are insufficient to adequately represent the many dimensions of interest. One of our purposes here is to share the insights we have gained into telephone-based data collection, in the belief that the VAA model is likely to become the standard method of collecting data over the telephone.


Journal of the Acoustical Society of America | 2000

Speech recognition using clustered between word and/or phrase coarticulation

Kazuhiro Kondo; Ikuo Kudo; Yu-Hung Kao; Barbara Wheatley

Improved speech recognition is achieved according to the present invention by the use of between-word and/or between-phrase coarticulation. The increase in the number of phonetic models required to model this additional vocabulary is reduced by clustering (19, 20) the inter-word/phrase models and grammar into only a few classes. By using one class for consonant inter-word contexts and two classes for vowel contexts, the accuracy for Japanese was almost as good as for unclustered models while the number of models was reduced by more than half.
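
The clustering can be pictured as a mapping from boundary phone contexts to a handful of classes: one for consonants and two for vowels, as the abstract says. In the Python sketch below, the phone sets and the front/back vowel split are illustrative assumptions, not the actual clusters used in the work.

# Collapse inter-word boundary contexts to three classes instead of keeping
# one model per phone pair.
FRONT_VOWELS = {"i", "e"}            # illustrative split
BACK_VOWELS = {"a", "o", "u"}

def context_class(phone):
    if phone in FRONT_VOWELS:
        return "V_front"
    if phone in BACK_VOWELS:
        return "V_back"
    return "C"                        # one class for all consonants

def interword_model_name(left_final, right_initial):
    """Clustered model key for the boundary between two words."""
    return context_class(left_final) + "+" + context_class(right_initial)

print(interword_model_name("a", "k"))   # 'V_back+C'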


International Conference on Acoustics, Speech, and Signal Processing | 1994

Toward vocabulary independent telephone speech recognition

Yu-Hung Kao; Charles T. Hemphill; Barbara Wheatley; Periagaram K. Rajasekaran

Vocabulary independence of speech recognition systems has become an important issue because of the need for flexible vocabularies and the high cost of speech corpus collection. We outline the necessary steps to achieve the goal of vocabulary-independent speech recognition and relate our experimental experience with telephone speech recognition. Two sets of experiments were conducted: (1) 34-command recognition, in which we compared vocabulary-independent (VI) and vocabulary-dependent (VD) systems as well as phonetic and word-based systems, and (2) 42-city-name recognition, in which our vocabulary-independent recognition performance (8.5% word error) was much better than the VI performance (18%) reported by the Oregon Graduate Institute (OGI) and very close to OGI's VD performance (8%). We conclude that we have made some strides toward vocabulary independence, but much remains to be done; we identify the areas of improvement that are likely to lead to the goal.
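
The core of vocabulary independence is that word models are composed on demand from a fixed inventory of phonetic models plus a dictionary, so adding vocabulary requires no new speech data. A minimal Python sketch, with illustrative entries:

# Compose a word model by concatenating phone models named by the dictionary.
PHONE_MODELS = {"d": "hmm_d", "ae": "hmm_ae", "l": "hmm_l",
                "ah": "hmm_ah", "s": "hmm_s"}          # trained once, reused
DICTIONARY = {"dallas": ["d", "ae", "l", "ah", "s"]}   # illustrative entry

def word_model(word):
    """No word-specific training: just look up and concatenate phone models."""
    return [PHONE_MODELS[p] for p in DICTIONARY[word]]

print(word_model("dallas"))   # ['hmm_d', 'hmm_ae', 'hmm_l', 'hmm_ah', 'hmm_s']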
