Publications


Featured research published by Hermann Ney.


Computational Linguistics | 2003

A systematic comparison of various statistical alignment models

Franz Josef Och; Hermann Ney

We present and compare various methods for computing word alignments using statistical or heuristic models. We consider the five alignment models presented in Brown, Della Pietra, Della Pietra, and Mercer (1993), the hidden Markov alignment model, smoothing techniques, and refinements. These statistical models are compared with two heuristic models based on the Dice coefficient. We present different methods for combining word alignments to perform a symmetrization of directed statistical alignment models. As evaluation criterion, we use the quality of the resulting Viterbi alignment compared to a manually produced reference alignment. We evaluate the models on the German-English Verbmobil task and the French-English Hansards task. We perform a detailed analysis of various design decisions of our statistical alignment system and evaluate these on training corpora of various sizes. An important result is that refined alignment models with a first-order dependence and a fertility model yield significantly better results than simple heuristic models. In the Appendix, we present an efficient training algorithm for the alignment models presented.
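
The symmetrization step can be made concrete with a small sketch. The following Python snippet is a simplified illustration rather than the paper's exact heuristic: it combines two directed alignments by starting from their intersection and growing it with neighboring links from the union.

```python
# A minimal sketch of alignment symmetrization, assuming two directed
# word alignments given as sets of (source_pos, target_pos) links.
# The "grow with neighbors" rule is a simplified stand-in for the
# refined combination methods described in the paper.

def symmetrize(src_to_tgt, tgt_to_src):
    intersection = src_to_tgt & tgt_to_src   # high-precision links
    union = src_to_tgt | tgt_to_src          # high-recall links
    alignment = set(intersection)
    added = True
    while added:
        added = False
        # Add union links that are adjacent to an already accepted link.
        for (i, j) in union - alignment:
            if any((i + di, j + dj) in alignment
                   for di in (-1, 0, 1) for dj in (-1, 0, 1)):
                alignment.add((i, j))
                added = True
    return alignment

a1 = {(0, 0), (1, 1), (2, 2)}
a2 = {(0, 0), (1, 2), (2, 2)}
print(sorted(symmetrize(a1, a2)))
```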


International Conference on Acoustics, Speech, and Signal Processing | 1995

Improved backing-off for M-gram language modeling

Reinhard Kneser; Hermann Ney

In stochastic language modeling, backing-off is a widely used method to cope with the sparse data problem. In case of unseen events this method backs off to a less specific distribution. In this paper we propose to use distributions which are especially optimized for the task of backing-off. Two different theoretical derivations lead to distributions which are quite different from the probability distributions that are usually used for backing-off. Experiments show an improvement of about 10% in terms of perplexity and 5% in terms of word error rate.
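
The core idea, choosing lower-order distributions optimized specifically for backing-off, can be illustrated with a toy bigram model. The sketch below uses the interpolated variant with continuation counts (in how many distinct contexts a word appears); the discount value and the treatment of unseen words are simplifications, not the paper's exact formulation.

```python
# A minimal sketch of a bigram model with a Kneser-Ney style
# continuation-count lower-order distribution, interpolated variant.
from collections import Counter, defaultdict

def train_bigram_kn(corpus, discount=0.75):
    bigrams = Counter(zip(corpus, corpus[1:]))
    context_counts = Counter(corpus[:-1])   # occurrences of w1 as context
    histories = defaultdict(set)            # w2 -> distinct preceding words
    followers = defaultdict(set)            # w1 -> distinct following words
    for (w1, w2) in bigrams:
        histories[w2].add(w1)
        followers[w1].add(w2)
    total_types = len(bigrams)

    def prob(w2, w1):
        # Lower-order term: number of distinct contexts preceding w2,
        # normalized by the number of bigram types.
        p_cont = len(histories[w2]) / total_types
        n = context_counts[w1]
        if n == 0:
            return p_cont
        lam = discount * len(followers[w1]) / n   # back-off weight
        return max(bigrams[(w1, w2)] - discount, 0) / n + lam * p_cont

    return prob

corpus = "the cat sat on the mat the cat ran".split()
p = train_bigram_kn(corpus)
print(p("cat", "the"), p("sat", "the"))
```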


Meeting of the Association for Computational Linguistics | 2002

Discriminative Training and Maximum Entropy Models for Statistical Machine Translation

Franz Josef Och; Hermann Ney

We present a framework for statistical machine translation of natural languages based on direct maximum entropy models, which contains the widely used source-channel approach as a special case. All knowledge sources are treated as feature functions, which depend on the source language sentence, the target language sentence and possible hidden variables. This approach allows a baseline machine translation system to be extended easily by adding new feature functions. We show that a baseline statistical machine translation system is significantly improved using this approach.
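
The direct maximum entropy formulation amounts to ranking candidates by a weighted sum of feature functions. A minimal sketch follows; the toy feature functions, weights, and lexicon are illustrative assumptions standing in for the trained knowledge sources of a real system.

```python
# A minimal sketch of scoring translation candidates with a direct
# log-linear (maximum entropy) model: log p(e | f) is, up to
# normalization, sum_m lambda_m * h_m(f, e).

# Toy bilingual lexicon (assumption for illustration only).
LEXICON = {("das", "the"), ("haus", "house"), ("ist", "is"),
           ("klein", "small")}

def lex_matches(f, e):
    # h_1: number of lexicon entries linking source and target words
    return sum((fw, ew) in LEXICON
               for fw in f.split() for ew in e.split())

def length_diff(f, e):
    # h_2: penalize length mismatch between source and target
    return -abs(len(e.split()) - len(f.split()))

def score(f, e, features, weights):
    return sum(w * h(f, e) for h, w in zip(features, weights))

def best_translation(f, candidates, features, weights):
    return max(candidates, key=lambda e: score(f, e, features, weights))

src = "das haus ist klein"
candidates = ["the house is small", "house small", "the small house"]
print(best_translation(src, candidates,
                       [lex_matches, length_diff], [1.0, 0.5]))
```

New knowledge sources are added by appending another feature function and weight, which is the extensibility argument the abstract makes.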


Meeting of the Association for Computational Linguistics | 2000

Improved statistical alignment models

Franz Josef Och; Hermann Ney

In this paper, we present and compare various single-word based alignment models for statistical machine translation. We discuss the five IBM alignment models, the Hidden-Markov alignment model, smoothing techniques and various modifications. We present different methods to combine alignments. As evaluation criterion we use the quality of the resulting Viterbi alignment compared to a manually produced reference alignment. We show that models with a first-order dependence and a fertility model lead to significantly better results than the simple models IBM-1 or IBM-2, which are not able to go beyond zero-order dependencies.
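
IBM Model 1, the zero-order baseline in this comparison, can be trained with a few lines of EM. A minimal sketch on a toy parallel corpus, ignoring the NULL word and other refinements:

```python
# A minimal sketch of EM training for IBM Model 1 translation
# probabilities t(f | e), assuming a tiny sentence-aligned corpus.
from collections import defaultdict

def train_ibm1(corpus, iterations=10):
    src_vocab = {w for s, _ in corpus for w in s}
    uniform = 1.0 / len(src_vocab)
    t = defaultdict(lambda: uniform)           # t[(f, e)], uniform init
    for _ in range(iterations):
        count = defaultdict(float)             # expected counts c(f, e)
        total = defaultdict(float)             # expected counts c(e)
        for src, tgt in corpus:                # E-step
            for f in src:
                norm = sum(t[(f, e)] for e in tgt)
                for e in tgt:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        for (f, e), c in count.items():        # M-step
            t[(f, e)] = c / total[e]
    return t

corpus = [("das haus".split(), "the house".split()),
          ("das buch".split(), "the book".split())]
t = train_ibm1(corpus)
print(round(t[("haus", "house")], 3), round(t[("das", "the")], 3))
```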


Computational Linguistics | 2004

The Alignment Template Approach to Statistical Machine Translation

Franz Josef Och; Hermann Ney

A phrase-based statistical machine translation approach, the alignment template approach, is described. This translation approach allows for general many-to-many relations between words. Thereby, the context of words is taken into account in the translation model, and local changes in word order from source to target language can be learned explicitly. The model is described using a log-linear modeling approach, which is a generalization of the often used source-channel approach. Thereby, the model is easier to extend than classical statistical machine translation systems. We describe in detail the process for learning phrasal translations, the feature functions used, and the search algorithm. The evaluation of this approach is performed on three different tasks. For the German-English speech Verbmobil task, we analyze the effect of various system components. On the French-English Canadian Hansards task, the alignment template system obtains significantly better results than a single-word-based translation model. In the Chinese-English 2002 National Institute of Standards and Technology (NIST) machine translation evaluation it yields statistically significantly better NIST scores than all competing research and commercial translation systems.
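
A central step is learning phrasal translations from a word alignment. The sketch below extracts all phrase pairs consistent with an alignment, following standard phrase-based practice; the paper's generalization to alignment templates over word classes is omitted.

```python
# A minimal sketch of phrase-pair extraction: a pair is kept only if no
# alignment link connects a word inside the pair to a word outside it.

def extract_phrases(src, tgt, alignment, max_len=4):
    phrases = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target span covered by links from src[i1..i2]
            tgt_pos = [j for (i, j) in alignment if i1 <= i <= i2]
            if not tgt_pos:
                continue
            j1, j2 = min(tgt_pos), max(tgt_pos)
            # Consistent only if no link inside the target span points
            # outside the source span.
            if all(i1 <= i <= i2
                   for (i, j) in alignment if j1 <= j <= j2):
                phrases.add((" ".join(src[i1:i2 + 1]),
                             " ".join(tgt[j1:j2 + 1])))
    return phrases

src = "das haus ist klein".split()
tgt = "the house is small".split()
alignment = {(0, 0), (1, 1), (2, 2), (3, 3)}
for pair in sorted(extract_phrases(src, tgt, alignment)):
    print(pair)
```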


International Conference on Computational Linguistics | 1996

HMM-based word alignment in statistical translation

Stephan Vogel; Hermann Ney; Christoph Tillmann

In this paper, we describe a new model for word alignment in statistical translation and present experimental results. The idea of the model is to make the alignment probabilities dependent on the differences in the alignment positions rather than on the absolute positions. To achieve this goal, the approach uses a first-order Hidden Markov model (HMM) for the word alignment problem, as HMMs are used successfully in speech recognition for the time alignment problem. The difference from the time alignment HMM is that there is no monotonicity constraint for the possible word orderings. We describe the details of the model and test the model on several bilingual corpora.
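
The jump-width idea can be made concrete with a Viterbi alignment sketch in which transition probabilities depend only on the difference of alignment positions. The emission and jump tables below are toy assumptions; in practice both are estimated with EM.

```python
# A minimal sketch of Viterbi word alignment under a first-order HMM
# whose transitions depend only on the jump width a_j - a_{j-1}.
import math

def viterbi_align(src, tgt, emit, jump):
    """Best alignment a_1..a_J of target positions to source positions.

    emit[(f, e)]: p(target word f | source word e)
    jump[d]:      p(jump of width d between successive alignments)
    """
    J, I = len(tgt), len(src)
    NEG = float("-inf")
    delta = [[NEG] * I for _ in range(J)]
    back = [[0] * I for _ in range(J)]
    for i in range(I):
        delta[0][i] = math.log(emit.get((tgt[0], src[i]), 1e-6))
    for j in range(1, J):
        for i in range(I):
            e = math.log(emit.get((tgt[j], src[i]), 1e-6))
            best_prev = max(range(I),
                            key=lambda k: delta[j - 1][k]
                            + math.log(jump.get(i - k, 1e-6)))
            delta[j][i] = (delta[j - 1][best_prev]
                           + math.log(jump.get(i - best_prev, 1e-6)) + e)
            back[j][i] = best_prev
    # Trace back the best path.
    a = [max(range(I), key=lambda i: delta[J - 1][i])]
    for j in range(J - 1, 0, -1):
        a.append(back[j][a[-1]])
    return list(reversed(a))

emit = {("the", "das"): 0.9, ("house", "haus"): 0.9,
        ("is", "ist"): 0.9, ("small", "klein"): 0.9}
jump = {0: 0.1, 1: 0.7, -1: 0.1, 2: 0.1}
print(viterbi_align("das haus ist klein".split(),
                    "the house is small".split(), emit, jump))
```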


Information Retrieval | 2008

Features for image retrieval: an experimental comparison

Thomas Deselaers; Daniel Keysers; Hermann Ney

An experimental comparison of a large number of different image descriptors for content-based image retrieval is presented. Many of the papers describing new techniques and descriptors for content-based image retrieval describe their newly proposed methods as most appropriate without giving an in-depth comparison with all methods that were proposed earlier. In this paper, we first give an overview of a large variety of features for content-based image retrieval and compare them quantitatively on four different tasks: stock photo retrieval, personal photo collection retrieval, building retrieval, and medical image retrieval. For the experiments, five different, publicly available image databases are used and the retrieval performance of the features is analyzed in detail. This allows for a direct comparison of all features considered in this work and furthermore will allow a comparison of newly proposed features to these in the future. Additionally, the correlation of the features is analyzed, which opens the way for a simple and intuitive method to find an initial set of suitable features for a new task. The article concludes with recommendations on which features perform well for which types of data. Interestingly, the often used, but very simple, color histogram performs well in the comparison and thus can be recommended as a simple baseline for many applications.
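
The recommended color-histogram baseline is easy to reproduce. A minimal sketch, assuming images as numpy RGB arrays with values in [0, 255]; the bin count and the L1 distance are arbitrary choices among those used in the literature.

```python
# A minimal sketch of a joint RGB color histogram feature and an L1
# distance for comparing two images.
import numpy as np

def color_histogram(image, bins=8):
    # Quantize each channel into `bins` levels, then count joint colors.
    quantized = (image.reshape(-1, 3) * bins // 256).astype(int)
    index = (quantized[:, 0] * bins + quantized[:, 1]) * bins + quantized[:, 2]
    hist = np.bincount(index, minlength=bins ** 3).astype(float)
    return hist / hist.sum()            # normalize to a distribution

def l1_distance(h1, h2):
    return np.abs(h1 - h2).sum()

rng = np.random.default_rng(0)
img1 = rng.integers(0, 256, size=(32, 32, 3))
img2 = rng.integers(0, 256, size=(32, 32, 3))
h1, h2 = color_histogram(img1), color_histogram(img2)
print(l1_distance(h1, h2), l1_distance(h1, h1))
```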


Speech Communication | 2008

Joint-sequence models for grapheme-to-phoneme conversion

Maximilian Bisani; Hermann Ney

Grapheme-to-phoneme conversion is the task of finding the pronunciation of a word given its written form. It has important applications in text-to-speech and speech recognition. Joint-sequence models are a simple and theoretically stringent probabilistic framework that is applicable to this problem. This article provides a self-contained and detailed description of this method. We present a novel estimation algorithm and demonstrate high accuracy on a variety of databases. Moreover, we study the impact of the maximum approximation in training and transcription, the interaction of model size parameters, n-best list generation, confidence measures, and phoneme-to-grapheme conversion. Our software implementation of the method proposed in this work is available under an Open Source license.
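
The joint-sequence idea can be illustrated with a unigram "graphone" model under the maximum approximation: segment the spelling into grapheme-phoneme chunks and pick the most probable segmentation by dynamic programming. The inventory and probabilities below are toy assumptions; the paper estimates them from data and uses n-gram context over the joint units.

```python
# A minimal sketch of grapheme-to-phoneme conversion with a unigram
# joint-sequence ("graphone") model and the maximum approximation.
import math

# (grapheme chunk, phoneme chunk) -> probability (toy values)
GRAPHONES = {
    ("ph", "f"): 0.9, ("p", "p"): 0.8, ("h", "h"): 0.7,
    ("o", "oU"): 0.6, ("o", "A"): 0.3, ("n", "n"): 0.9, ("e", ""): 0.8,
}

def g2p(word):
    """Best phoneme sequence over all graphone segmentations (by DP)."""
    n = len(word)
    best = [(float("-inf"), "")] * (n + 1)
    best[0] = (0.0, "")
    for end in range(1, n + 1):
        for (g, p), prob in GRAPHONES.items():
            start = end - len(g)
            if start >= 0 and word[start:end] == g:
                score = best[start][0] + math.log(prob)
                if score > best[end][0]:
                    best[end] = (score, best[start][1] + p)
    return best[n][1] or None

print(g2p("phone"))   # -> "foUn" with the toy inventory
```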


Lecture Notes in Computer Science | 2002

Phrase-Based Statistical Machine Translation

Richard Zens; Franz Josef Och; Hermann Ney

This paper is based on the work carried out in the framework of the VERBMOBIL project, which is a limited-domain speech translation task (German-English). In the final evaluation, the statistical approach was found to perform best among five competing approaches. In this paper, we will further investigate the statistical translation models used. A shortcoming of the single-word based model is that it does not take contextual information into account for the translation decisions. We will present a translation model that is based on bilingual phrases to explicitly model the local context. We will show that this model performs better than the single-word based model. We will compare monotone and non-monotone search for this model and we will investigate the benefit of using the sum criterion instead of the maximum approximation.
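
Monotone search with a phrase table reduces to dynamic programming over source segmentations. A minimal sketch with a toy phrase table; the paper additionally studies non-monotone search and the sum criterion over alignments.

```python
# A minimal sketch of monotone phrase-based decoding: segment the
# source left to right and translate each segment with the best phrase.
import math

PHRASE_TABLE = {   # source phrase -> (target phrase, log probability)
    "das haus": ("the house", math.log(0.8)),
    "das": ("the", math.log(0.6)),
    "haus": ("house", math.log(0.7)),
    "ist klein": ("is small", math.log(0.9)),
    "ist": ("is", math.log(0.6)),
    "klein": ("small", math.log(0.7)),
}

def decode_monotone(src_tokens, max_phrase_len=3):
    n = len(src_tokens)
    best = [(float("-inf"), "")] * (n + 1)   # best score, partial output
    best[0] = (0.0, "")
    for end in range(1, n + 1):
        for start in range(max(0, end - max_phrase_len), end):
            phrase = " ".join(src_tokens[start:end])
            if phrase in PHRASE_TABLE and best[start][0] > float("-inf"):
                tgt, logp = PHRASE_TABLE[phrase]
                score = best[start][0] + logp
                if score > best[end][0]:
                    best[end] = (score,
                                 (best[start][1] + " " + tgt).strip())
    return best[n][1]

print(decode_monotone("das haus ist klein".split()))
```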


IEEE Transactions on Acoustics, Speech, and Signal Processing | 1984

The use of a one-stage dynamic programming algorithm for connected word recognition

Hermann Ney

This paper is of tutorial nature and describes a one-stage dynamic programming algorithm for the problem of connected word recognition. The algorithm to be developed is essentially identical to one presented by Vintsyuk [1] and later by Bridle and Brown [2]; but the notation and the presentation have been clarified. The derivation used for optimally time synchronizing a test pattern, consisting of a sequence of connected words, is straightforward and simple in comparison with other approaches decomposing the pattern matching problem into several levels. The approach presented relies basically on parameterizing the time warping path by a single index and on exploiting certain path constraints both in the word interior and at the word boundaries. The resulting algorithm turns out to be significantly more efficient than those proposed by Sakoe [3] as well as Myers and Rabiner [4], while providing the same accuracy in estimating the best possible matching string. Its most important feature is that the computational expenditure per word is independent of the number of words in the input string. Thus, it is well suited for recognizing comparatively long word sequences and for real-time operation. Furthermore, there is no need to specify the maximum number of words in the input string. The practical implementation of the algorithm is discussed; it requires no heuristic rules and no overhead. The algorithm can be modified to deal with syntactic constraints in terms of a finite state syntax.
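
The one-stage idea, a single time-synchronous dynamic programming pass with within-word and between-word transitions, can be sketched as follows. The 1-D features, whole-word templates, and local path constraints (stay in a state or advance by one) are simplifications of the paper's setting.

```python
# A minimal sketch of one-stage dynamic programming for connected word
# recognition over whole-word templates of 1-D reference frames.

def one_stage_dp(test, templates):
    """Decode a frame sequence into a word string.

    test:      list of observed feature values (one per time frame)
    templates: dict mapping word -> list of reference frames
    """
    words = list(templates)
    INF = float("inf")
    # D[w][s]: best accumulated distance reaching state s of word w;
    # back[w][s]: the word history (tuple) on that best path.
    D = {w: [INF] * len(templates[w]) for w in words}
    back = {w: [()] * len(templates[w]) for w in words}
    for t, x in enumerate(test):
        newD = {w: [INF] * len(templates[w]) for w in words}
        newback = {w: [()] * len(templates[w]) for w in words}
        # Between-word transition: the cheapest word-final hypothesis
        # may continue into the first state of any word.
        finals = [(D[w][-1], back[w][-1] + (w,))
                  for w in words if D[w][-1] < INF]
        best_end, best_hist = min(finals) if finals else (INF, ())
        for w in words:
            ref = templates[w]
            for s in range(len(ref)):
                cands = []
                if t == 0:
                    if s == 0:
                        cands.append((0.0, ()))
                else:
                    if D[w][s] < INF:                  # stay in state
                        cands.append((D[w][s], back[w][s]))
                    if s > 0 and D[w][s - 1] < INF:    # advance a state
                        cands.append((D[w][s - 1], back[w][s - 1]))
                    if s == 0 and best_end < INF:      # start new word
                        cands.append((best_end, best_hist))
                if cands:
                    cost, hist = min(cands)
                    newD[w][s] = cost + abs(x - ref[s])
                    newback[w][s] = hist
        D, back = newD, newback
    w_best = min(words, key=lambda w: D[w][-1])
    return back[w_best][-1] + (w_best,)

templates = {"one": [1.0, 2.0, 1.0], "two": [3.0, 4.0]}
print(one_stage_dp([1.0, 2.0, 1.0, 3.0, 4.0], templates))  # ('one', 'two')
```

The per-frame work grows with the total number of template states, not with the number of words already hypothesized, which is the efficiency property the abstract emphasizes.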

Collaboration


Dive into Hermann Ney's collaborations.

Top Co-Authors

David Vilar

RWTH Aachen University

Stefan Hahn

RWTH Aachen University