
Publication


Featured research published by Mari Ostendorf.


IEEE Transactions on Speech and Audio Processing | 1996

From HMM's to segment models: a unified view of stochastic modeling for speech recognition

Mari Ostendorf; Vassilios Digalakis; Owen Kimball

Many alternative models have been proposed to address some of the shortcomings of the hidden Markov model (HMM), which is currently the most popular approach to speech recognition. In particular, a variety of models that could be broadly classified as segment models have been described for representing a variable-length sequence of observation vectors in speech recognition applications. Since there are many aspects in common between these approaches, including the general recognition and training problems, it is useful to consider them in a unified framework. The paper describes a general stochastic model that encompasses most of the models proposed in the literature, pointing out similarities of the models in terms of correlation and parameter tying assumptions, and drawing analogies between segment models and HMMs. In addition, we summarize experimental results assessing different modeling assumptions and point out remaining open questions.


Journal of the Acoustical Society of America | 1992

Segmental durations in the vicinity of prosodic phrase boundaries

Colin W. Wightman; Stefanie Shattuck-Hufnagel; Mari Ostendorf; Patti Price

Numerous studies have indicated that prosodic phrase boundaries may be marked by a variety of acoustic phenomena including segmental lengthening. It has not been established, however, whether this lengthening is restricted to the immediate vicinity of the boundary, or if it extends over some larger region. In this study, segmental lengthening in the vicinity of prosodic boundaries is examined and found to be restricted to the rhyme of the syllable preceding the boundary. By using a normalized measure of segmental lengthening, and by compensating for differences in speaking rate, it is also shown that at least four distinct types of boundaries can be distinguished on the basis of this lengthening.


Journal of the Acoustical Society of America | 1991

The use of prosody in syntactic disambiguation

Patti Price; Mari Ostendorf; Stefanie Shattuck-Hufnagel; Cynthia Fong

Prosodic structure and syntactic structure are not identical; neither are they unrelated. Knowing when and how the two correspond could yield better quality speech synthesis, could aid in the disambiguation of competing syntactic hypotheses in speech understanding, and could lead to a more comprehensive view of human speech processing. In a set of experiments involving 35 pairs of phonetically similar sentences representing seven types of structural contrasts, the perceptual evidence shows that some, but not all, of the pairs can be disambiguated on the basis of prosodic differences. The phonological evidence relates the disambiguation primarily to boundary phenomena, although prominences sometimes play a role. Finally, phonetic analyses describing the attributes of these phonological markers indicate the importance of both absolute and relative measures.


IEEE Transactions on Speech and Audio Processing | 1994

Automatic labeling of prosodic patterns

Colin W. Wightman; Mari Ostendorf

This paper describes a general algorithm for labeling prosodic patterns in speech, which provides a mechanism for mapping sequences of observations (vectors of acoustic correlates) to prosodic labels using decision trees and a Markov sequence model. Important and novel features of the approach are that it allows many dissimilar correlates to be treated in a unified manner to provide more robust labeling, and that it is designed to be a post-word-recognition processing step. Application of the algorithm is illustrated with experimental results for labeling prosodic phrasing and phrasal prominence in two corpora of professionally read speech. The labels produced by the automatic algorithm exhibit agreement with hand-labeled prominence and phrasing that is close to the agreement between different human labelers.
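Combining per-token posteriors (e.g. from a decision tree over acoustic correlates) with a Markov model over labels amounts to Viterbi decoding of the label sequence. A minimal sketch under simplifying assumptions (known label priors and bigram transition probabilities; all names here are illustrative, not from the paper):

```python
import numpy as np

def viterbi_labels(posteriors, trans, priors):
    """Decode the best prosodic-label sequence by combining per-token
    label posteriors with a Markov (label-bigram) sequence model.
    posteriors: (T, K) classifier outputs P(label | acoustic correlates)
    trans:      (K, K) label-bigram transition probabilities
    priors:     (K,)   label priors, used to scale posteriors to likelihoods
    """
    T, K = posteriors.shape
    loglik = np.log(posteriors) - np.log(priors)   # scaled likelihoods
    delta = np.log(priors) + loglik[0]             # best score ending in each label
    back = np.zeros((T, K), dtype=int)             # backpointers
    for t in range(1, T):
        scores = delta[:, None] + np.log(trans)    # scores[i, j]: label i -> j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + loglik[t]
    path = [int(delta.argmax())]                   # backtrace the best path
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

With sticky transitions, an isolated low-confidence token is smoothed toward its neighbors' labels, which is the point of adding the sequence model on top of the per-token classifier.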


IEEE Transactions on Speech and Audio Processing | 1993

ML estimation of a stochastic linear system with the EM algorithm and its application to speech recognition

Vassilios Digalakis; Jan Robin Rohlicek; Mari Ostendorf

A nontraditional approach to the problem of estimating the parameters of a stochastic linear system is presented. The method is based on the expectation-maximization algorithm and can be considered as the continuous analog of the Baum-Welch estimation algorithm for hidden Markov models. The algorithm is used for training the parameters of a dynamical system model that is proposed for better representing the spectral dynamics of speech for recognition. It is assumed that the observed feature vectors of a phone segment are the output of a stochastic linear dynamical system, and it is shown how the evolution of the dynamics as a function of the segment length can be modeled using alternative assumptions. A phoneme classification task using the TIMIT database demonstrates that the approach is the first effective use of an explicit model for statistical dependence between frames of speech.
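For intuition, the scalar analog of this estimation procedure can be sketched as follows. This is an illustrative toy (one-dimensional state, noise variances held fixed), not the paper's multivariate formulation: the E-step runs a Kalman filter and RTS smoother, and the M-step applies closed-form updates to the state and observation coefficients.

```python
import numpy as np

def em_lds_scalar(y, a=0.5, h=1.0, q=0.1, r=0.1, n_iter=20):
    """EM for a scalar stochastic linear system
         x_t = a * x_{t-1} + w_t,   w_t ~ N(0, q)
         y_t = h * x_t     + v_t,   v_t ~ N(0, r)
    re-estimating a and h (noise variances q, r held fixed for brevity)."""
    T = len(y)
    for _ in range(n_iter):
        # E-step, forward pass: Kalman filter
        mu_p = np.zeros(T)   # predicted mean  E[x_t | y_1..t-1]
        P_p = np.zeros(T)    # predicted variance
        mu_f = np.zeros(T)   # filtered mean   E[x_t | y_1..t]
        P_f = np.zeros(T)    # filtered variance
        mu, P = 0.0, 1.0     # prior on the initial state
        for t in range(T):
            if t > 0:
                mu, P = a * mu_f[t - 1], a * a * P_f[t - 1] + q
            mu_p[t], P_p[t] = mu, P
            K = P * h / (h * h * P + r)          # Kalman gain
            mu_f[t] = mu + K * (y[t] - h * mu)
            P_f[t] = (1 - K * h) * P
        # E-step, backward pass: RTS smoother
        mu_s, P_s = mu_f.copy(), P_f.copy()
        xx_lag = np.zeros(T)                     # E[x_t * x_{t-1} | y]
        for t in range(T - 2, -1, -1):
            J = P_f[t] * a / P_p[t + 1]          # smoother gain
            mu_s[t] = mu_f[t] + J * (mu_s[t + 1] - mu_p[t + 1])
            P_s[t] = P_f[t] + J * J * (P_s[t + 1] - P_p[t + 1])
            xx_lag[t + 1] = J * P_s[t + 1] + mu_s[t + 1] * mu_s[t]
        Ex2 = P_s + mu_s ** 2                    # E[x_t^2 | y]
        # M-step: closed-form coefficient updates
        a = xx_lag[1:].sum() / Ex2[:-1].sum()
        h = (y * mu_s).sum() / Ex2.sum()
    return a, h
```

On data simulated from a known system, repeated E/M passes move `a` and `h` toward the generating values, mirroring how Baum-Welch improves HMM parameters iteration by iteration.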


Meeting of the Association for Computational Linguistics | 2005

Reading Level Assessment Using Support Vector Machines and Statistical Language Models

Sarah E. Schwarm; Mari Ostendorf

Reading proficiency is a fundamental component of language competency. However, finding topical texts at an appropriate reading level for foreign and second language learners is a challenge for teachers. This task can be addressed with natural language processing technology to assess reading level. Existing measures of reading level are not well suited to this task, but previous work and our own pilot experiments have shown the benefit of using statistical language models. In this paper, we also use support vector machines to combine features from traditional reading level measures, statistical language models, and other language processing tools to produce a better method of assessing reading level.
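The combination of traditional measures with statistical language model scores can be sketched as a feature extractor whose outputs would feed an SVM. The specific features and smoothing below are illustrative assumptions, not the paper's exact setup:

```python
import math
import re
from collections import Counter

def reading_features(text, grade_lms):
    """Features for a reading-level classifier (illustrative set):
      - average sentence length in words (a traditional measure)
      - average word length in characters (a traditional measure)
      - length-normalized unigram log-probability under each grade-level LM
    grade_lms: list of Counter objects, one word-frequency LM per grade."""
    sents = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    feats = [len(words) / max(len(sents), 1),
             sum(map(len, words)) / max(len(words), 1)]
    for lm in grade_lms:
        total = sum(lm.values())
        vocab = len(lm)
        logp = sum(math.log((lm.get(w, 0) + 1) / (total + vocab))  # add-one smoothing
                   for w in words)
        feats.append(logp / max(len(words), 1))
    return feats
```

In practice, vectors like these would be passed to an off-the-shelf SVM trained on texts labeled by grade level; the LM features let the classifier capture vocabulary usage beyond what sentence- and word-length measures see.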


IEEE Transactions on Acoustics, Speech, and Signal Processing | 1989

A stochastic segment model for phoneme-based continuous speech recognition

Mari Ostendorf; Salim Roukos

The authors introduce a novel approach to modeling variable-duration phonemes, called the stochastic segment model. A phoneme X is observed as a variable-length sequence of frames, where each frame is represented by a parameter vector and the length of the sequence is random. The stochastic segment model consists of (1) a time warping of the variable-length segment X into a fixed-length segment Y called a resampled segment and (2) a joint density function of the parameters of X, which in this study is a Gaussian density. The segment model represents spectral/temporal structure over the entire phoneme. The model also allows the incorporation in Y of acoustic-phonetic features derived from X, in addition to the usual spectral features that have been used in hidden Markov modeling and dynamic time warping approaches to speech recognition. The authors describe the stochastic segment model, the recognition algorithm, and an iterative training algorithm for estimating segment models from continuous speech. They present several results using segment models in two speaker-dependent recognition tasks and compare the performance of the stochastic segment model to that of hidden Markov models.
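The core of the segment model, warping a variable-length segment X to a fixed-length resampled segment Y and scoring Y with a Gaussian, can be sketched as below. Linear-interpolation warping and a diagonal covariance are simplifying assumptions for illustration, not necessarily the paper's exact choices:

```python
import numpy as np

def resample_segment(X, M):
    """Time-warp a variable-length segment X (T frames x d features) to a
    fixed-length resampled segment of M frames, by linear interpolation
    along the time axis."""
    T = len(X)
    idx = np.linspace(0, T - 1, M)          # fractional frame positions
    lo = np.floor(idx).astype(int)
    hi = np.minimum(lo + 1, T - 1)
    w = (idx - lo)[:, None]                 # interpolation weights
    return (1 - w) * X[lo] + w * X[hi]

def segment_log_likelihood(X, mean, var, M):
    """Score a segment under a Gaussian on the resampled frames, treating
    the M*d resampled values as one long feature vector (diagonal cov)."""
    y = resample_segment(X, M).ravel()
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (y - mean) ** 2 / var)
```

Recognition then reduces to evaluating each candidate phoneme's Gaussian on the resampled segment and picking the highest-scoring model, with the resampling step absorbing the duration variability.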


IEEE Transactions on Audio, Speech, and Language Processing | 2006

Enriching speech recognition with automatic detection of sentence boundaries and disfluencies

Yang Liu; Elizabeth Shriberg; Andreas Stolcke; Dustin Hillard; Mari Ostendorf; Mary P. Harper

Effective human and automatic processing of speech requires recovery of more than just the words. It also involves recovering phenomena such as sentence boundaries, filler words, and disfluencies, referred to as structural metadata. We describe a metadata detection system that combines information from different types of textual knowledge sources with information from a prosodic classifier. We investigate maximum entropy and conditional random field models, as well as the predominant hidden Markov model (HMM) approach, and find that discriminative models generally outperform generative models. We report system performance on both broadcast news and conversational telephone speech tasks, illustrating significant performance differences across tasks and as a function of recognizer performance. The results represent the state of the art, as assessed in the NIST RT-04F evaluation.


IEEE Transactions on Speech and Audio Processing | 1995

The challenge of spoken language systems: Research directions for the nineties

Ron Cole; L. Hirschman; L. Atlas; M. Beckman; Alan W. Biermann; M. Bush; Mark A. Clements; L. Cohen; Oscar N. Garcia; B. Hanson; Hynek Hermansky; S. Levinson; Kathleen R. McKeown; Nelson Morgan; David G. Novick; Mari Ostendorf; Sharon L. Oviatt; Patti Price; Harvey F. Silverman; J. Spitz; Alex Waibel; Cliff Weinstein; Stephen A. Zahorian; Victor W. Zue

A spoken language system combines speech recognition, natural language processing and human interface technology. It functions by recognizing the person's words, interpreting the sequence of words to obtain a meaning in terms of the application, and providing an appropriate response back to the user. Potential applications of spoken language systems range from simple tasks, such as retrieving information from an existing database (traffic reports, airline schedules), to interactive problem solving tasks involving complex planning and reasoning (travel planning, traffic routing), to support for multilingual interactions. We examine eight key areas in which basic research is needed to produce spoken language systems: (1) robust speech recognition; (2) automatic training and adaptation; (3) spontaneous speech; (4) dialogue models; (5) natural language response generation; (6) speech synthesis and speech generation; (7) multilingual systems; and (8) interactive multimodal systems. In each area, we identify key research challenges, the infrastructure needed to support research, and the expected benefits. We conclude by reviewing the need for multidisciplinary research, for development of shared corpora and related resources, for computational support, and for rapid communication among researchers. The successful development of this technology will increase accessibility of computers to a wide range of users, will facilitate multinational communication and trade, and will create new research specialties and jobs in this rapidly expanding area.


North American Chapter of the Association for Computational Linguistics | 2003

Detection of agreement vs. disagreement in meetings: training with unlabeled data

Dustin Hillard; Mari Ostendorf; Elizabeth Shriberg

To support summarization of automatically transcribed meetings, we introduce a classifier to recognize agreement or disagreement utterances, utilizing both word-based and prosodic cues. We show that hand-labeling efforts can be minimized by using unsupervised training on a large unlabeled data set combined with supervised training on a small amount of data. For ASR transcripts with over 45% WER, the system recovers nearly 80% of agree/disagree utterances with a confusion rate of only 3%.

Collaboration

Top co-authors of Mari Ostendorf:

- Ivan Bulyko, University of Washington
- Dustin Hillard, University of Washington
- Stefanie Shattuck-Hufnagel, Massachusetts Institute of Technology
- Aaron Jaech, University of Washington