Publications


Featured research published by Klaus Ries.


Computational Linguistics | 2000

Dialogue act modeling for automatic tagging and recognition of conversational speech

Andreas Stolcke; Noah Coccaro; Rebecca Bates; Paul Taylor; Carol Van Ess-Dykema; Klaus Ries; Elizabeth Shriberg; Daniel Jurafsky; Rachel Martin; Marie Meteer

We describe a statistical approach for modeling dialogue acts in conversational speech, i.e., speech-act-like units such as STATEMENT, QUESTION, BACKCHANNEL, AGREEMENT, DISAGREEMENT, and APOLOGY. Our model detects and predicts dialogue acts based on lexical, collocational, and prosodic cues, as well as on the discourse coherence of the dialogue act sequence. The dialogue model is based on treating the discourse structure of a conversation as a hidden Markov model and the individual dialogue acts as observations emanating from the model states. Constraints on the likely sequence of dialogue acts are modeled via a dialogue act n-gram. The statistical dialogue grammar is combined with word n-grams, decision trees, and neural networks modeling the idiosyncratic lexical and prosodic manifestations of each dialogue act. We develop a probabilistic integration of speech recognition with dialogue modeling, to improve both speech recognition and dialogue act classification accuracy. Models are trained and evaluated using a large hand-labeled database of 1,155 conversations from the Switchboard corpus of spontaneous human-to-human telephone speech. We achieved good dialogue act labeling accuracy (65% based on errorful, automatically recognized words and prosody, and 71% based on word transcripts, compared to a chance baseline accuracy of 35% and human accuracy of 84%) and a small reduction in word recognition error.
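
The HMM treatment described here can be made concrete with a small sketch: dialogue acts are the hidden states, the dialogue act n-gram supplies the transition scores, and the lexical/prosodic likelihood of each utterance under each dialogue act plays the role of the emission score. A minimal Viterbi decoder under these assumptions, in Python; the model scores are hypothetical inputs, not the paper's actual models:

    def viterbi_dialogue_acts(emission, da_bigram, das):
        """Most likely dialogue-act sequence for a conversation.
        emission[t][d]    : log P(words_t, prosody_t | DA d)
        da_bigram[(p, d)] : log P(d | p), the statistical discourse grammar
        das               : list of dialogue-act labels
        """
        T = len(emission)
        best = [dict(emission[0])]          # best log score ending in each DA
        back = [{}]                         # backpointers
        for t in range(1, T):
            best.append({}); back.append({})
            for d in das:
                p_best, s_best = max(
                    ((p, best[t - 1][p] + da_bigram[(p, d)]) for p in das),
                    key=lambda ps: ps[1])
                best[t][d] = s_best + emission[t][d]
                back[t][d] = p_best
        d = max(best[-1], key=best[-1].get)  # best final state
        path = [d]
        for t in range(T - 1, 0, -1):        # follow backpointers
            d = back[t][d]
            path.append(d)
        return path[::-1]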


Language and Speech | 1998

Can Prosody Aid the Automatic Classification of Dialog Acts in Conversational Speech?

Elizabeth Shriberg; Rebecca A. Bates; Andreas Stolcke; Paul Taylor; Daniel Jurafsky; Klaus Ries; Noah Coccaro; Rachel Martin; Marie Meteer; Carol Van Ess-Dykema

Identifying whether an utterance is a statement, question, greeting, and so forth is integral to effective automatic understanding of natural dialog. Little is known, however, about how such dialog acts (DAs) can be automatically classified in truly natural conversation. This study asks whether current approaches, which use mainly word information, could be improved by adding prosodic information. The study is based on more than 1000 conversations from the Switchboard corpus. DAs were hand-annotated, and prosodic features (duration, pause, F0, energy, and speaking rate) were automatically extracted for each DA. In training, decision trees based on these features were inferred; trees were then applied to unseen test data to evaluate performance. Performance was evaluated for prosody models alone, and after combining the prosody models with word information—either from true words or from the output of an automatic speech recognizer. For an overall classification task, as well as three subtasks, prosody made significant contributions to classification. Feature-specific analyses further revealed that although canonical features (such as F0 for questions) were important, less obvious features could compensate if canonical features were removed. Finally, in each task, integrating the prosodic model with a DA-specific statistical language model improved performance over that of the language model alone, especially for the case of recognized words. Results suggest that DAs are redundantly marked in natural conversation, and that a variety of automatically extractable prosodic features could aid dialog processing in speech applications.
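
As a rough sketch of the tree-based setup, one can train a decision-tree classifier on per-DA prosodic feature vectors. The feature values and labels below are invented toy data, and scikit-learn stands in for the paper's CART-style tree learner:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    # One row per dialog act: duration (s), pause (s), mean F0 (Hz),
    # energy (dB), speaking rate (syllables/s) -- toy values only.
    X = np.array([
        [1.40, 0.30, 110.0, 62.0, 4.1],   # statement-like
        [0.90, 0.05, 185.0, 70.0, 5.0],   # question-like (high F0)
        [0.25, 0.90,  95.0, 50.0, 3.0],   # backchannel-like
        [1.10, 0.20, 120.0, 60.0, 4.3],   # statement-like
    ])
    y = ["statement", "question", "backchannel", "statement"]

    tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
    print(tree.predict([[0.8, 0.10, 180.0, 68.0, 4.9]]))  # expect ['question']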


International Conference on Acoustics, Speech, and Signal Processing | 2001

Advances in automatic meeting record creation and access

Alex Waibel; Michael Bett; Florian Metze; Klaus Ries; Thomas Schaaf; Tanja Schultz; Hagen Soltau; Hua Yu; Klaus Zechner

Oral communication is transient, but many important decisions, social contracts and fact findings are first carried out in an oral setup, documented in written form and later retrieved. At Carnegie Mellon University's Interactive Systems Laboratories we have been experimenting with the documentation of meetings. The paper summarizes part of the progress that we have made in this test bed, specifically on the questions of automatic transcription using large vocabulary continuous speech recognition, information access using non-keyword based methods, summarization, and user interfaces. The system is capable of automatically constructing a searchable and browsable audio-visual database of meetings and provides access to these records.
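
A toy illustration of the "searchable and browsable" idea, not the actual system: index time-stamped transcript segments so a word query can jump straight to the right positions in the recordings. The meeting IDs, times, and transcripts are invented:

    from collections import defaultdict

    # (meeting_id, start_time_in_seconds, recognizer transcript) -- toy data.
    segments = [
        ("mtg1",  0.0, "let us review the budget proposal"),
        ("mtg1", 42.5, "the deadline moves to next friday"),
        ("mtg2",  3.1, "budget cuts affect the new hires"),
    ]

    index = defaultdict(list)               # word -> playback positions
    for meeting, start, text in segments:
        for word in set(text.split()):
            index[word].append((meeting, start))

    print(index["budget"])                  # [('mtg1', 0.0), ('mtg2', 3.1)]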


International Conference on Acoustics, Speech, and Signal Processing | 1997

The Karlsruhe-Verbmobil speech recognition engine

Michael Finke; Petra Geutner; Hermann Hild; Thomas Kemp; Klaus Ries; Martin Westphal

Verbmobil, a German research project, aims at machine translation of spontaneous speech input. The ultimate goal is the development of a portable machine translator that will allow people to negotiate in their native language. Within this project the University of Karlsruhe has developed a speech recognition engine that has been evaluated on a yearly basis during the project and shows very promising word accuracy results on large-vocabulary spontaneous speech. We introduce the Janus Speech Recognition Toolkit underlying the speech recognizer. The main new contributions to the acoustic modeling part of our 1996 evaluation system (speaker normalization, channel normalization, and polyphonic clustering) are discussed and evaluated. Besides the acoustic models we delineate the different language models used in our evaluation system: word trigram models interpolated with class-based models and a separate spelling language model were applied. As a result of using the toolkit and integrating all these parts into the recognition engine, the word error rate on the German spontaneous scheduling task (GSST) could be decreased from 30% in 1995 to 13.8% in 1996.
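
The language-model combination named above is plain linear interpolation. A minimal sketch follows; the component models, the class map, and the weight 0.7 are hypothetical placeholders (in practice the weight would be tuned on held-out data):

    def class_trigram(w, history, word2class, p_class_seq, p_word_in_class):
        """Class-based trigram: P(c(w) | c(h1), c(h2)) * P(w | c(w))."""
        c = word2class[w]
        return p_class_seq(c, tuple(word2class[h] for h in history)) \
               * p_word_in_class(w, c)

    def interpolated(w, history, p_word_trigram, p_class_trigram, lam=0.7):
        """Word trigram linearly interpolated with a class-based trigram."""
        return lam * p_word_trigram(w, history) \
               + (1.0 - lam) * p_class_trigram(w, history)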


IEEE Automatic Speech Recognition and Understanding Workshop | 1997

Automatic detection of discourse structure for speech recognition and understanding

Daniel Jurafsky; Rebecca A. Bates; Noah Coccaro; Rachel Martin; Marie Meteer; Klaus Ries; Elizabeth Shriberg; Andreas Stolcke; Paul Taylor; C. Van Ess-Dykema

We describe a new approach for statistical modeling and detection of discourse structure for natural conversational speech. Our model is based on 42 dialog acts (DAs), such as question, answer, backchannel, agreement, disagreement, and apology. We labeled 1155 conversations from the Switchboard (SWBD) database (Godfrey et al., 1992) of human-to-human telephone conversations with these 42 types and trained a dialog act detector based on three distinct knowledge sources: sequences of words which characterize a dialog act; prosodic features which characterize a dialog act; and a statistical discourse grammar. Our combined detector, although still in preliminary stages, already achieves a 65% dialog act detection rate based on acoustic waveforms, and 72% accuracy based on word transcripts. Using this detector to switch among the 42 dialog-act-specific trigram LMs also gave us an encouraging but not statistically significant reduction in SWBD word error.
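
Switching among DA-specific language models can also be done softly, weighting each dialog act's trigram LM by the detector's posterior. A sketch under that assumption; the per-DA LM scorers are placeholders, and hard switching would simply use the argmax DA:

    import math

    def mixture_lm_logprob(words, da_posterior, da_lms):
        """log P(words) under a posterior-weighted mixture of DA-specific LMs.
        da_lms[da](words) -> log P(words | that DA's trigram LM)."""
        terms = [math.log(p) + da_lms[da](words)
                 for da, p in da_posterior.items() if p > 0.0]
        m = max(terms)                       # log-sum-exp for stability
        return m + math.log(sum(math.exp(t - m) for t in terms))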


International Conference on Acoustics, Speech, and Signal Processing | 1997

Recognition of conversational telephone speech using the JANUS speech engine

Torsten Zeppenfeld; Michael Finke; Klaus Ries; Martin Westphal; Alex Waibel

Recognition of conversational speech is one of the most challenging speech recognition tasks to date. While recognition error rates of 10% or lower can now be reached on speech dictation tasks over vocabularies in excess of 60,000 words, recognition of conversational speech has persistently resisted most attempts at improvement by way of the proven techniques to date. Difficulties arise from shorter words, telephone channel degradation, and highly disfluent and coarticulated speech. In this paper, we describe the application, adaptation, and performance evaluation of our JANUS speech recognition engine to the Switchboard conversational speech recognition task. Through a number of algorithmic improvements, we have been able to reduce error rates from more than 50% word error to 38%, measured on the official 1996 NIST evaluation test set. Improvements include vocal tract length normalization, polyphonic modeling, label boosting, speaker adaptation with and without confidence measures, and speaking-mode dependent pronunciation modeling.
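
Vocal tract length normalization, one of the improvements listed, warps each speaker's frequency axis by a per-speaker factor. A common piecewise-linear form is sketched below; the cutoff, warp range, and search grid are illustrative, not the actual JANUS settings:

    def vtln_warp(freq, alpha, f_cut=4800.0, f_max=8000.0):
        """Warp frequency by factor alpha below f_cut, then continue
        linearly so that f_max still maps to f_max."""
        if freq <= f_cut:
            return alpha * freq
        slope = (f_max - alpha * f_cut) / (f_max - f_cut)
        return alpha * f_cut + slope * (freq - f_cut)

    # Per speaker, alpha is typically chosen by maximizing acoustic
    # likelihood over a small grid, e.g.:
    candidate_alphas = [0.88 + 0.02 * i for i in range(13)]  # 0.88 .. 1.12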


International Conference on Spoken Language Processing | 1996

Class phrase models for language modeling

Klaus Ries; Finn Dag Buø; Alex Waibel

Previous attempts to automatically determine multi-words as the basic unit for language modeling have been successful in extending bigram models to improve the perplexity of the language model and/or the word accuracy of the speech decoder. However, none of these techniques has so far given improvements over the trigram model, except on the rather controlled ATIS task (McCandless & Glass, 1994). We therefore propose an algorithm that minimizes the perplexity of a bigram model directly. The new algorithm is able to reduce the trigram perplexity and also achieves word accuracy improvements on the Verbmobil task. It is the natural counterpart of successful word classification algorithms for language modeling that minimize the leaving-one-out bigram perplexity. We also give some details on the use of class-finding techniques and m-gram models, which can be crucial to successful applications of this technique.
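
The flavor of the objective can be shown with a brute-force toy version: greedily merge the adjacent word pair that lowers corpus bigram perplexity the most. The paper's algorithm uses efficient leaving-one-out count updates rather than full recomputation; this sketch only illustrates the criterion:

    import math
    from collections import Counter

    def bigram_perplexity(sents):
        """Unsmoothed bigram perplexity of a tokenized corpus."""
        hist, bi, n = Counter(), Counter(), 0
        for s in sents:
            toks = ["<s>"] + s
            for a, b in zip(toks, toks[1:]):
                hist[a] += 1; bi[(a, b)] += 1; n += 1
        logprob = sum(c * math.log(c / hist[a]) for (a, b), c in bi.items())
        return math.exp(-logprob / n)

    def merge_pair(sents, pair):
        """Rewrite the corpus with the pair joined into one phrase unit."""
        out = []
        for s in sents:
            t, i = [], 0
            while i < len(s):
                if i + 1 < len(s) and (s[i], s[i + 1]) == pair:
                    t.append(s[i] + "_" + s[i + 1]); i += 2
                else:
                    t.append(s[i]); i += 1
            out.append(t)
        return out

    def best_merge(sents, candidates):
        """Candidate pair whose merge gives the lowest bigram perplexity."""
        return min(candidates,
                   key=lambda p: bigram_perplexity(merge_pair(sents, p)))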


International Conference on Acoustics, Speech, and Signal Processing | 1999

HMM and neural network based speech act detection

Klaus Ries

We present an incremental lattice generation approach to speech act detection for spontaneous and overlapping speech in telephone conversations (CallHome Spanish). At each stage of the process it is therefore possible to use different models after the initial HMM models have generated a reasonable set of hypotheses; these lattices can be processed further by more complex models. This study shows how neural networks can be used very effectively in the classification of speech acts. We find that speech acts can be classified better using the neural net based approach than using the more classical n-gram backoff model approach. The best resulting neural network operates only on unigrams, and integrating the n-gram backoff model as a prior reduces the performance of the model. The neural network is therefore more likely to be robust against errors from an LVCSR system and can potentially be trained from a smaller database.
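
A minimal stand-in for the unigram neural classifier: bag-of-words features feeding a small feed-forward network. scikit-learn substitutes for the paper's network, the utterances and labels are invented, and the real system operates on lattices of hypotheses rather than single transcripts:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.neural_network import MLPClassifier

    utts = ["si claro", "que hora es", "no lo se", "hola buenos dias"]
    acts = ["agreement", "question", "statement", "greeting"]

    vec = CountVectorizer()                  # unigram features only
    X = vec.fit_transform(utts)
    net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                        random_state=0).fit(X, acts)
    print(net.predict(vec.transform(["si si claro"])))  # expect ['agreement']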


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2001

Segmenting Conversations by Topic, Initiative, and Style

Klaus Ries

Topical segmentation is a basic tool for information access to audio records of meetings and other types of speech documents, which may be fairly long and contain multiple topics. Standard segmentation algorithms are typically based on keywords, pitch contours or pauses. This work demonstrates that speaker initiative and style may be used as segmentation criteria as well. A probabilistic segmentation procedure is presented which allows the integration and modeling of these features in a clean framework with good results. Keyword-based segmentation methods degrade significantly on our meeting database when speech recognizer transcripts are used instead of manual transcripts. Speaker initiative is an interesting feature since it delivers good segmentations and should be easy to obtain from the audio. Speech style variation at the beginning, middle, and end of topics may also be exploited for topical segmentation and would not require the detection of rare keywords.
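
One way to read such a probabilistic segmentation procedure is as dynamic programming over per-segment scores from any of the feature models (topic, initiative, or style). The sketch below assumes those log-likelihoods are given and is not the paper's exact procedure:

    def segment(seg_score, n, penalty=2.0):
        """Best segmentation of utterances 0..n-1.
        seg_score(i, j) -> log-likelihood that utterances i..j-1 form one
        coherent segment; penalty discourages spurious boundaries.
        Returns the list of boundary positions."""
        best = [0.0] + [float("-inf")] * n
        back = [0] * (n + 1)
        for j in range(1, n + 1):
            for i in range(j):
                v = best[i] + seg_score(i, j) - penalty
                if v > best[j]:
                    best[j], back[j] = v, i
        cuts, j = [], n
        while j > 0:
            cuts.append(j)
            j = back[j]
        return cuts[::-1]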


International Conference on Spoken Language Processing | 1996

JANUS-II: towards spontaneous Spanish speech recognition

Puming Zhan; Klaus Ries; Marsal Gavaldà; Donna Gates; Alon Lavie; Alex Waibel

JANUS-II is a research system for investigating various issues in speech-to-speech translation and has been implemented for translation in many languages. In this paper, we address the Spanish speech recognition part of JANUS-II. First, we report the bootstrapping and optimization of the recognition system. Then we investigate the difference between push-to-talk and cross-talk dialogs, which are two different kinds of data in our database. We give a detailed noise analysis for the push-to-talk and cross-talk dialogs and present some recognition results for comparison. We have observed that the cross-talk dialogs are harder for speech recognition than the push-to-talk dialogs because they are noisier. Currently, the error rate of our Spanish recognizer is 27% for the push-to-talk test set and 32% for the cross-talk test set.

Collaboration


Top co-authors of Klaus Ries:

Alex Waibel (Karlsruhe Institute of Technology)
Alon Lavie (Carnegie Mellon University)
Lori S. Levin (Carnegie Mellon University)
Michael Finke (Carnegie Mellon University)
Noah Coccaro (University of Colorado Boulder)
Paul Taylor (University of Edinburgh)
Rachel Martin (Johns Hopkins University)