
Publications


Featured research published by Dietrich Klakow.


Speech Communication | 2002

Testing the correlation of word error rate and perplexity

Dietrich Klakow; Jochen Peters

Many groups have investigated the relationship between word error rate and perplexity of language models. This issue is of central interest because perplexity optimization can be done independently of a recognizer, and in most cases it is possible to find simple perplexity optimization procedures. Moreover, many tasks in language model training, such as the optimization of word classes, may use perplexity as the target function, resulting in explicit optimization formulas which are not available if error rates are used as the target. This paper first presents some theoretical arguments for a close relationship between perplexity and word error rate. Thereafter, the notion of the uncertainty of a measurement is introduced and used to test the hypothesis that word error rate and perplexity are correlated by a power law. There is no evidence to reject this hypothesis.
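The power-law hypothesis amounts to WER ≈ a · PP^b, which becomes a straight line in log-log space and can be fitted by ordinary least squares. The following is a minimal sketch of such a fit in pure Python; the function name and the synthetic data points are illustrative assumptions, not values from the paper:

```python
import math

def fit_power_law(perplexities, error_rates):
    """Fit WER = a * PP**b by least squares in log-log space."""
    xs = [math.log(p) for p in perplexities]
    ys = [math.log(w) for w in error_rates]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Slope of the log-log regression line is the power-law exponent b.
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = math.exp(my - b * mx)
    return a, b

# Synthetic data generated from WER = 2 * PP**0.3 (illustrative only).
pps = [50.0, 100.0, 200.0, 400.0]
wers = [2 * pp ** 0.3 for pp in pps]
a, b = fit_power_law(pps, wers)
```

On real measurements the fit residuals would be compared against the measurement uncertainty discussed in the paper, rather than recovered exactly as in this noise-free example.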


International Conference on Acoustics, Speech, and Signal Processing | 2000

Selecting articles from the language model training corpus

Dietrich Klakow

This paper suggests a log-likelihood-based criterion to select, from a training corpus, articles that are suitable for reducing perplexity on a specific task defined by a small target corpus. The method is effective not only as an adaptation technique, reducing perplexity by 32% and the OOV rate from 4.2% to 2.7%, but also as a pruning technique, decreasing the language model size by a factor of 3 at the same time.
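The selection idea can be sketched as: score each candidate article by its per-word log-likelihood under a language model estimated from the small target corpus, and keep only the best-scoring articles. The sketch below uses a unigram model with add-one smoothing as a simplified stand-in for the paper's criterion; all names and the keep-fraction parameter are illustrative assumptions:

```python
import math
from collections import Counter

def select_articles(articles, target_corpus, keep_fraction=0.5):
    """Rank candidate articles (lists of words) by per-word log-likelihood
    under a unigram model of the target corpus; keep the top fraction."""
    counts = Counter(target_corpus)
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 mass class for unseen words

    def loglik(article):
        # Add-one smoothed unigram log-likelihood, normalized per word.
        return sum(math.log((counts[w] + 1) / (total + vocab))
                   for w in article) / max(len(article), 1)

    ranked = sorted(articles, key=loglik, reverse=True)
    return ranked[:max(1, int(len(ranked) * keep_fraction))]
```

An article sharing vocabulary with the target corpus scores higher than an off-topic one, so the retained subset is both smaller and better matched to the task.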


Speech Communication | 2002

Large vocabulary continuous speech recognition of Broadcast News - The Philips/RWTH approach

Peter Beyerlein; Xavier L. Aubert; Matthew Harris; Dietrich Klakow; Andreas Wendemuth; Sirko Molau; Hermann Ney; Michael Pitz; Achim Sixtus

Automatic speech recognition of real-life broadcast news (BN) data (Hub-4) has become a challenging research topic in recent years. This paper summarizes our key efforts to build a large vocabulary continuous speech recognition system for the heterogeneous BN task without inducing undesired complexity and computational resources. These key efforts included:

• automatic segmentation of the audio signal into speech utterances;
• efficient one-pass trigram decoding using look-ahead techniques;
• optimal log-linear interpolation of a variety of acoustic and language models using discriminative model combination (DMC);
• handling short-range and weak longer-range correlations in natural speech and language by the use of phrases and of distance-language models;
• improving the acoustic modeling by robust feature extraction, channel normalization, adaptation techniques, as well as automatic script selection and verification.

The starting point of the system development was the Philips 64k-NAB word-internal triphone trigram system. On the speaker-independent but microphone-dependent NAB task (transcription of read newspaper texts) we obtained a word error rate of about 10%. Now, at the conclusion of the system development, we have arrived at a DMC-interpolated phrase-based crossword-pentaphone 4-gram system. This system transcribes BN data with an overall word error rate of about 17%.
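The log-linear interpolation at the heart of DMC combines several model distributions as p(w) ∝ Π_i p_i(w)^λ_i, renormalized over the vocabulary. The sketch below shows only this combination step; the discriminative training of the weights λ_i, which is DMC proper, is not shown, and all names are illustrative:

```python
def log_linear_combine(models, weights, vocab):
    """Log-linear interpolation: p(w) proportional to the product of
    p_i(w)**lambda_i over all component models, renormalized over vocab.
    models: list of dicts mapping word -> probability."""
    scores = {w: 1.0 for w in vocab}
    for model, lam in zip(models, weights):
        for w in vocab:
            scores[w] *= model[w] ** lam
    z = sum(scores.values())  # normalization constant
    return {w: s / z for w, s in scores.items()}
```

With a weight of zero a component drops out entirely, and with a single weight of one the combined model reduces to that component, which is why the weights can be tuned discriminatively against error rate.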


International Conference on Acoustics, Speech, and Signal Processing | 1998

Language-model optimization by mapping of corpora

Dietrich Klakow

It is questionable whether words are really the best basic units for the estimation of stochastic language models; grouping frequent word sequences into phrases can improve language models. More generally, we have investigated various coding schemes for a corpus. In this paper, this approach is applied to optimize the perplexity of n-gram language models. In tests on two large corpora (WSJ and BNA), the bigram perplexity was reduced by up to 29%. Furthermore, this approach allows us to tackle the problem of an open vocabulary with no unknown word.
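One simple way to map a corpus onto phrase units is to repeatedly fuse the most frequent adjacent word pair into a single token. The sketch below implements one such greedy merge step; it is an illustrative scheme under assumed names, not the paper's exact coding criterion:

```python
from collections import Counter

def merge_top_bigram(corpus, sep="_"):
    """One mapping step: find the most frequent adjacent word pair in the
    corpus (a list of tokens) and fuse every occurrence into one token."""
    pairs = Counter(zip(corpus, corpus[1:]))
    if not pairs:
        return corpus
    (w1, w2), _ = pairs.most_common(1)[0]
    merged, i = [], 0
    while i < len(corpus):
        if i + 1 < len(corpus) and corpus[i] == w1 and corpus[i + 1] == w2:
            merged.append(w1 + sep + w2)  # fuse the pair into a phrase token
            i += 2
        else:
            merged.append(corpus[i])
            i += 1
    return merged
```

Iterating this step grows a phrase inventory; an n-gram model estimated over the remapped corpus can then capture what were previously longer-span dependencies within a single unit.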


International Conference on Pattern Recognition | 2002

Robustness of linear discriminant analysis in automatic speech recognition

Marcel Katz; Hans-Günter Meier; Hans J. G. A. Dolfing; Dietrich Klakow

This paper focuses on the robust estimation of transformation matrices based on linear discriminant analysis (LDA) as used in automatic speech recognition systems. We investigate the effect of class distributions with artificial features and compare the resulting Fisher criteria. The paper shows that the Fisher criterion alone is not very helpful for assessing class separability. Furthermore, we address the problem of dealing with too many additional dimensions in the estimation. Experiments performed on subsets of the Wall Street Journal (WSJ) database indicate that a minimum of about 2000 feature vectors per class is needed for robust estimation with monophones. Finally, we make predictions for future experiments on LDA matrix estimation with more classes.
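The Fisher criterion the paper examines measures class separability as between-class scatter over within-class scatter. A toy one-dimensional, two-class version (function name assumed; real LDA works with scatter matrices in many dimensions) is:

```python
def fisher_criterion(class_a, class_b):
    """Two-class, one-dimensional Fisher criterion:
    squared mean difference divided by the sum of within-class variances."""
    def mean(xs):
        return sum(xs) / len(xs)

    def var(xs):
        m = mean(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    return ((mean(class_a) - mean(class_b)) ** 2
            / (var(class_a) + var(class_b)))
```

A larger value means the class means are farther apart relative to the spread within each class; the paper's point is that a high value of this ratio alone does not guarantee that the estimated transform is robust.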


Archive | 2000

Capturing Long Range Correlations Using Log-Linear Language Models

Jochen Peters; Dietrich Klakow

Written and spoken texts show long-range correlations which are valuable for speech recognition systems. Unfortunately, these dependencies cannot be properly described by the widespread backing-off language models (LMs). This paper introduces basic concepts for exploiting long-range correlations for the task of language modeling. Several approaches to obtaining suitable LM structures are discussed and compared. The theoretical findings are fully confirmed by experiments performed on spontaneous speech from the Verbmobil II domain and on written text from the Wall Street Journal corpus.
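A simple instance of a long-range effect that backing-off n-grams miss is word repetition: a word used recently is likely to recur. One log-linear way to capture it is to blend a static model with a "cache" model of the recent history. The sketch below is illustrative only; the paper's feature set is richer, and all names and the blending weight are assumptions:

```python
from collections import Counter

def cache_lm_prob(word, history, base_probs, lam=0.5):
    """Log-linear blend of a static unigram model (base_probs, a dict
    word -> probability) with an add-one-smoothed cache model of the
    recent history, renormalized over the vocabulary."""
    cache = Counter(history)
    denom = len(history) + len(base_probs)

    def score(w):
        cache_prob = (cache[w] + 1) / denom
        return base_probs[w] ** (1 - lam) * cache_prob ** lam

    z = sum(score(w) for w in base_probs)  # normalization constant
    return score(word) / z
```

Under this model a word that has just occurred several times receives a boosted probability, which is exactly the kind of long-span dependency a plain backing-off trigram cannot represent.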


Archive | 2004

Three Issues in Modern Language Modeling

Dietrich Klakow

In this paper we discuss three issues in modern language modeling: the first is the question of a quality measure for language models, the second is language model smoothing, and the third is how to build good long-range language models. In all three cases, results are given that indicate possible directions for further research.
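On the smoothing issue, one standard scheme in this line of work is absolute discounting: subtract a fixed discount d from every seen n-gram count and redistribute the freed probability mass over a lower-order backoff distribution. The sketch below applies it to a bigram model; names and the discount value are illustrative, and it is a simplification (e.g. it uses raw unigram probabilities as the backoff distribution):

```python
from collections import Counter

def absolute_discount_bigram(corpus, d=0.5):
    """Bigram model with absolute discounting: each seen bigram count is
    discounted by d, and the freed mass is spread over a unigram backoff.
    Returns a function prob(w2, w1) approximating P(w2 | w1)."""
    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))
    total = sum(unigrams.values())

    def prob(w2, w1):
        c1 = unigrams[w1]
        # Number of distinct successors of w1 (how much mass was freed).
        seen = sum(1 for (a, _) in bigrams if a == w1)
        backoff = unigrams[w2] / total
        discounted = max(bigrams[(w1, w2)] - d, 0) / c1 if c1 else 0.0
        weight = d * seen / c1 if c1 else 1.0
        return discounted + weight * backoff

    return prob
```

The discounted part keeps most of the observed evidence, while the backoff term guarantees that unseen bigrams still receive nonzero probability, which is what makes perplexity well-defined on new text.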


Journal of the Acoustical Society of America | 1998

Speech recognition method with language model adaptation

Reinhard Kneser; Jochen Peters; Dietrich Klakow


Archive | 1999

Language model based on the speech recognition history

Volker Steinbiss; Dietrich Klakow


Archive | 2004

Text Segmentation and Topic Annotation for Document Structuring

Jochen Peters; Carsten Meyer; Dietrich Klakow
