Publication


Featured research published by Janet M. Baker.


human language technology | 1992

The design for the wall street journal-based CSR corpus

Douglas B. Paul; Janet M. Baker

The DARPA Spoken Language System (SLS) community has long taken a leadership position in designing, implementing, and globally distributing significant speech corpora widely used for advancing speech recognition research. The Wall Street Journal (WSJ) CSR Corpus described here is the newest addition to this valuable set of resources. In contrast to previous corpora, the WSJ corpus will provide DARPA its first general-purpose English, large-vocabulary, natural-language, high-perplexity corpus containing significant quantities of both speech data (400 hrs.) and text data (47M words), thereby providing a means to integrate speech recognition and natural language processing in application domains with high potential practical value. This paper presents the motivating goals, acoustic data design, text processing steps, lexicons, and testing paradigms incorporated into the multi-faceted WSJ CSR Corpus.
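The "high perplexity" above refers to a standard language-modeling measure: the exponentiated average negative log-probability a model assigns per word. A minimal sketch of the computation (the function name and numbers are illustrative, not from the paper):

```python
import math

def perplexity(word_probs):
    """Perplexity of a test set, given the probability a language model
    assigned to each word: 2 ** (average negative log2 probability)."""
    cross_entropy = -sum(math.log2(p) for p in word_probs) / len(word_probs)
    return 2.0 ** cross_entropy

# A model that is certain (p = 1) of every word has perplexity 1;
# a uniform model over a 20,000-word vocabulary has perplexity 20,000.
```

Higher perplexity means the recognizer gets less help from the language model, which is what makes a general-purpose newspaper corpus like WSJ a harder test than earlier constrained-domain corpora.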


Journal of the Acoustical Society of America | 1992

Method for representing word models for use in speech recognition

Laurence S. Gillick; Dean Sturtevant; Robert Roth; James K. Baker; Janet M. Baker

A method is provided for deriving acoustic word representations for use in speech recognition. Initial word models are created, each formed of a sequence of acoustic sub-models. The acoustic sub-models from a plurality of word models are clustered, so as to group acoustically similar sub-models from different words, using, for example, the Kullback-Leibler information as a metric of similarity. Then each word is represented by a cluster spelling, which lists the clusters into which its acoustic sub-models were placed by the clustering. Speech recognition is performed by comparing sequences of frames from speech to be recognized against sequences of acoustic models associated with the clusters of the cluster spelling of individual word models. The invention also provides a method for deriving a word representation which involves receiving a first set of frame sequences for a word, using dynamic programming to derive a corresponding initial sequence of probabilistic acoustic sub-models for the word independently of any previously derived acoustic model particular to the word, using dynamic programming to time-align each of a second set of frame sequences for the word into a succession of new sub-sequences corresponding to the initial sequence of models, and using these new sub-sequences to calculate new probabilistic sub-models.
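The abstract names Kullback-Leibler information as one possible similarity metric for clustering sub-models. A minimal sketch for univariate Gaussian sub-models (a simplification for illustration; real acoustic sub-models would be multivariate, and the function names are not from the patent):

```python
import math

def kl_gaussian(p, q):
    """KL divergence D(p || q) between two univariate Gaussians,
    each given as a (mean, variance) pair."""
    (mu_p, var_p), (mu_q, var_q) = p, q
    return 0.5 * (math.log(var_q / var_p)
                  + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

def symmetric_kl(p, q):
    """Symmetrized KL information, usable as a clustering distance:
    zero only when the two sub-models are identical."""
    return kl_gaussian(p, q) + kl_gaussian(q, p)
```

Sub-models from different words whose symmetrized KL falls below a threshold would be merged into one cluster, and each word's cluster spelling then records which cluster each of its sub-models landed in.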


Journal of the Acoustical Society of America | 1987

Parallel pattern verifier with dynamic time warping

James K. Baker; Janet M. Baker

A speech recognition system is disclosed which employs a network of elementary local decision modules for matching an observed time-varying speech pattern against all possible time warpings of the stored prototype patterns. For each elementary speech segment, an elementary recognizer provides a score indicating the degree of correlation of the input speech segment with stored spectral patterns. Each local decision module receives the results of the elementary recognizer and, at the same time, receives an input from selected ones of the other local decision modules. Each local decision module specializes in a particular node in the network, wherein each node scores how well the input segment of speech matches the particular sound segments in the words spoken. Each local decision module takes the prior decisions of all preceding sound segments, which are input from the other local decision modules, and selects the locally optimum time warping to be permitted. By this selection technique, each speech segment is stretched or compressed by an arbitrary, nonlinear function based on the control of the interconnections of the other local decision modules to a particular local decision module. Each local decision module includes an accumulator memory which stores the logarithmic probability of the current observation, conditional upon the internal event specified by the word to be matched or the identifier of the particular pattern that corresponds to the subject node. For each observation, these probabilities are computed and loaded into the accumulator memory of all the modules, and the result of the locally optimum time warping, representing the accumulated score or network path to a node for the word with the highest probability, is chosen.
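The network of local decision modules described above is, in dynamic-programming terms, the classic time-warping recursion: each node accumulates the best score reachable from its predecessors. A minimal serial sketch of that recursion (the patent describes a parallel hardware network; the names here are illustrative only):

```python
import math

def dtw_score(observed, prototype, dist=lambda a, b: abs(a - b)):
    """Best accumulated match score between an observed sequence and a
    stored prototype under arbitrary nonlinear stretching/compression.
    acc[i][j] plays the role of one local decision module's accumulator."""
    n, m = len(observed), len(prototype)
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Each node keeps only the locally optimal warping into it.
            acc[i][j] = dist(observed[i - 1], prototype[j - 1]) + min(
                acc[i - 1][j],       # observed stretched: prototype frame repeats
                acc[i][j - 1],       # observed compressed: prototype advances
                acc[i - 1][j - 1])   # both sequences advance together
    return acc[n][m]
```

Running this once per stored word model and picking the lowest accumulated score corresponds to choosing the word with the highest path probability in the network.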


international conference on acoustics, speech, and signal processing | 1993

Application of large vocabulary continuous speech recognition to topic and speaker identification using telephone speech

Larry Gillick; Janet M. Baker; John S. Bridle; Melvyn J. Hunt; Yoshiko Ito; S. Lowe; Jeremy Orloff; Barbara Peskin; R. Roth; F. Scattone

The authors describe a novel approach to the problems of topic and speaker identification that makes use of large-vocabulary continuous speech recognition. A theoretical framework for dealing with these problems in a symmetric way is provided. Some empirical results on topic and speaker identification that have been obtained on the extensive Switchboard corpus of telephone conversations are presented.


international conference on acoustics, speech, and signal processing | 1991

On the interaction between true source, training, and testing language models

Douglas B. Paul; J. K. Baker; Janet M. Baker

An interaction has been found between the true source language model, the training language model, and the testing language model. This interaction has implications for vocabulary-independent modeling, testing methodologies, discriminative training, and the adequacy of many of the current databases for continuous speech recognition development.


international conference on acoustics, speech, and signal processing | 1993

Large vocabulary continuous speech recognition of Wall Street Journal data

R. Roth; Janet M. Baker; Larry Gillick; Melvyn J. Hunt; Yoshiko Ito; S. Lowe; Jeremy Orloff; Barbara Peskin; F. Scattone

The authors report on the progress that has been made at Dragon Systems in speaker-independent large-vocabulary speech recognition using speech from DARPA's Wall Street Journal corpus. First they present an overview of the recognition and training algorithms. Then they describe experiments involving two improvements to these algorithms: moving to higher-dimensional streams and using an IMELDA transformation. They also present results showing the reduction in error rates.


Journal of the Acoustical Society of America | 1981

Issues and answers in evaluation of automatic speech recognizers

Janet M. Baker

An extensive comparison of commercially available automatic speech recognition systems revealed a wide range of performance characteristics. On an isolated-word, speaker-dependent database, the percentage of errors (i.e., 100% − correct recognition %) ranged over two orders of magnitude for the 10,210 utterances tested. The vocabulary consisted of 11 words (the 10 digits plus “oh”) collected from five males and five females in a moderate office-noise environment. Simultaneous recordings were made with headset and table microphones. Every attempt was made to assure that the systems tested were operating optimally, often with vendor cooperation. Although all systems performed better with data recorded from the headset microphone than with the table microphone, the amount of degradation was not uniform across systems. Special problems and solutions for performing these tests will be discussed.


international conference on acoustics, speech, and signal processing | 1984

Cost-effective speech processing

Janet M. Baker; R. Roth; Paul G. Bamberg

Recent developments have made it possible to implement high-performance speech recognition with much less computation than traditional techniques, thereby enabling real-time computation on standard microprocessors. Concepts such as time-domain acoustic-phonetic speech signal processing, as well as efficient adaptations of hidden Markov models, can provide this type of capability. The Mark II system provides both speaker-dependent and multiple-speaker recognition of up to a 32 isolated-word active vocabulary in real time on a 2 MHz 6502, with no custom hardware except an inexpensive microphone, pre-amp, and 8-bit A/D converter. On an initial test of 5120 test utterances (Texas Instruments isolated word database, Spectrum, Sept., 1981), the Mark II achieved an error rate of only 0.67% (34 errors).


human language technology | 1993

Research in large vocabulary continuous speech recognition

Janet M. Baker; Larry Gillick; Robert Roth

The primary long-term goal of speech research at Dragon Systems is to develop algorithms that are capable of achieving very high performance in large-vocabulary continuous speech recognition. At the same time, we aim to keep those algorithms' demands for computational power and memory as modest as possible, so that the results of our research can be incorporated into products that will run on moderately priced personal computers.


Journal of the Acoustical Society of America | 1972

More Visible Speech

Janet M. Baker; James K. Baker; Jerome Y. Lettvin

This paper presents a method for visual display of acoustical waveforms with which different phonemes are more readily distinguished than with spectrograms. The visual display is based on the interval between successive up crossings of the zero axis in the waveform. Although zero crossings and up crossings have been used by several investigators since Licklider's studies demonstrating that zero-crossing information is sufficient for intelligibility of speech, most investigators have ignored an essential property of up-crossing analysis. Analysis of up crossings is a time-domain technique and, as such, allows a perfect resolution of events in time, a resolution which is lost if the up-crossing data is averaged over intervals of time. Such precise time resolution is critical for the recognition of certain distinguishing features in various consonants. These features permit a visual display in which the phonemes can easily be distinguished, even in connected speech. Furthermore, the important features for d...
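Up-crossing analysis as described is a pure time-domain measurement: record the sample index of every upward crossing of the zero axis and take successive differences, with no averaging. A minimal sketch (the function name is illustrative, not from the paper):

```python
def up_crossing_intervals(samples):
    """Find indices where the waveform crosses zero going upward and
    return the intervals between successive crossings, preserving the
    exact timing of events (no averaging over time windows)."""
    crossings = [i for i in range(1, len(samples))
                 if samples[i - 1] < 0 <= samples[i]]
    return [b - a for a, b in zip(crossings, crossings[1:])]
```

For a periodic waveform the intervals equal the period in samples; because nothing is averaged, a transient consonant feature keeps its exact position in the interval sequence, which is the property the paper argues most investigators discarded.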

Collaboration


Dive into Janet M. Baker's collaborations.

Top Co-Authors

Douglas B. Paul (Massachusetts Institute of Technology)
Barbara Peskin (University of California)
Lori Lamel (Centre national de la recherche scientifique)
Alan R. Gibson (St. Joseph's Hospital and Medical Center)