Publications


Featured research published by Wolfgang Macherey.


Speech Communication | 2001

Comparison of discriminative training criteria and optimization methods for speech recognition

Ralf Schlüter; Wolfgang Macherey; Boris Müller; Hermann Ney

The aim of this work is to establish a common framework for a class of discriminative training criteria and optimization methods for continuous speech recognition. A unified discriminative criterion based on likelihood ratios of correct and competing models with optional smoothing is presented. The unified criterion leads to particular criteria through the choice of competing word sequences and the choice of smoothing. Analytic and experimental comparisons are presented for both the maximum mutual information (MMI) and the minimum classification error (MCE) criterion, together with the optimization methods gradient descent (GD) and the extended Baum (EB) algorithm. A tree-search-based restricted recognition method using word graphs is presented to reduce the computational complexity of large-vocabulary discriminative training. Moreover, for MCE training, a method using word graphs for efficient calculation of discriminative statistics is introduced. Experiments were performed for continuous speech recognition using the ARPA Wall Street Journal (WSJ) corpus with a vocabulary of 5k words, and for the recognition of continuously spoken digit strings using both the TI digit string corpus for American English digits and the SieTill corpus for telephone-line-recorded German digits. For the MMI criterion, neither analytical nor experimental results indicate significant differences between EB and GD optimization. For acoustic models of low complexity, MCE training gave significantly better results than MMI training. The recognition results for large-vocabulary MMI training on the WSJ corpus show a significant dependence on the context length of the language model used for training; the best results were obtained using a unigram language model. No significant correlation was observed between the language models chosen for training and recognition.
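The unified view of MMI and MCE can be illustrated with a small sketch. Assuming per-hypothesis log-likelihoods are already available, MMI is the log posterior of the correct hypothesis against all hypotheses, while smoothed MCE passes a likelihood-ratio misclassification measure through a sigmoid. The scalar setting, function names, and parameters below are illustrative, not the paper's implementation:

```python
import math

def logsumexp(xs):
    # Numerically stable log(sum(exp(x))) over a list of log-scores.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def mmi_criterion(logp_correct, logp_competing):
    """MMI objective: log posterior of the correct hypothesis, where the
    denominator sums over the correct and all competing hypotheses."""
    return logp_correct - logsumexp([logp_correct] + logp_competing)

def mce_criterion(logp_correct, logp_competing, eta=1.0, alpha=1.0):
    """Smoothed MCE loss: sigmoid of the misclassification measure d,
    with d the eta-smoothed log ratio of competing to correct scores."""
    d = (1.0 / eta) * logsumexp([eta * lp for lp in logp_competing]) - logp_correct
    return 1.0 / (1.0 + math.exp(-alpha * d))  # loss in (0, 1)
```

With equally scored correct and competing hypotheses, the MMI objective is log 0.5 and the MCE loss sits at 0.5; as the correct hypothesis dominates, MMI approaches 0 and the MCE loss approaches 0.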


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2004

Adaptation in statistical pattern recognition using tangent vectors

Daniel Keysers; Wolfgang Macherey; Hermann Ney; Jörg Dahmen

We integrate the tangent method into a statistical framework for classification, both analytically and practically. The resulting consistent framework for adaptation allows us to efficiently estimate the tangent vectors representing the variability. The framework improves classification results on two real-world pattern recognition tasks from the domains of handwritten character recognition and automatic speech recognition.
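The core geometric operation behind the tangent method is computing the minimal distance between an observation and the affine subspace spanned by a prototype's tangent vectors. A minimal one-sided sketch, assuming the tangent vectors are given as matrix columns (names and the least-squares formulation are illustrative, not the paper's derivation):

```python
import numpy as np

def tangent_distance(x, mu, V):
    """One-sided tangent distance: minimal squared Euclidean distance
    between observation x and the affine subspace {mu + V @ a} spanned
    by the tangent vectors (columns of V) at prototype mu."""
    a, *_ = np.linalg.lstsq(V, x - mu, rcond=None)  # best tangent coefficients
    r = x - mu - V @ a                              # residual off the subspace
    return float(r @ r)
```

Points lying on the subspace get distance zero, so variation along a tangent direction (e.g. a small image transformation) no longer penalizes the match.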


Cross-Language Evaluation Forum | 2005

FIRE in ImageCLEF 2005: combining content-based image retrieval with textual information retrieval

Thomas Deselaers; Tobias Weyand; Daniel Keysers; Wolfgang Macherey; Hermann Ney

In this paper, the methods we used in the 2005 ImageCLEF content-based image retrieval evaluation are described. For the medical retrieval task, we combined several low-level image features with textual information retrieval. Combining these two information sources yields clear improvements over using either source alone. Additionally, we participated in the automatic annotation task, where our content-based image retrieval system, FIRE, was used, as well as a second subimage-based method for object classification. The results we achieved are very convincing: our submissions ranked first and third in the automatic annotation task out of a total of 44 submissions from 12 groups.
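Combining an image-based and a text-based source typically comes down to late fusion of per-document scores. A minimal sketch of one common scheme (min-max normalization plus a weighted sum); the function, weighting, and normalization choice are assumptions for illustration, not the paper's exact combination rule:

```python
def fuse_scores(image_scores, text_scores, w_image=0.5):
    """Late fusion of two retrieval score dicts (doc id -> score) over the
    same document set: min-max normalise each source, then combine with a
    weighted sum controlled by w_image."""
    def norm(scores):
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid division by zero on constant scores
        return {d: (s - lo) / span for d, s in scores.items()}
    ni, nt = norm(image_scores), norm(text_scores)
    return {d: w_image * ni[d] + (1 - w_image) * nt.get(d, 0.0) for d in ni}
```

Setting `w_image` to 0 or 1 recovers the single-source rankings, which makes the gain from combining the sources easy to measure.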


International Conference on Acoustics, Speech, and Signal Processing | 1998

Comparison of discriminative training criteria

Ralf Schlüter; Wolfgang Macherey

A formally unifying approach for a class of discriminative training criteria, including the maximum mutual information (MMI) and minimum classification error (MCE) criterion, is presented, together with the optimization methods of gradient descent (GD) and the extended Baum-Welch (EB) algorithm. Comparisons are discussed for the MMI and the MCE criterion, including the determination of the sets of word sequence hypotheses for discrimination using word graphs. Experiments have been carried out on the SieTill corpus of telephone-line-recorded German continuous digit strings. Using several approaches for acoustic modeling, the word error rates obtained by MMI training using single densities were always better than those for maximum likelihood (ML) training using mixture densities. Finally, the results obtained with corrective training (CT), i.e. using only the best recognized word sequence in addition to the spoken word sequence, could not be improved upon by the word-graph-based discriminative training.
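The EB algorithm mentioned here re-estimates model parameters from two sets of accumulated statistics: one from the spoken ("numerator") word sequence and one from the competing ("denominator") hypotheses. A minimal sketch of the well-known EB update for a single scalar Gaussian mean; the variable names and scalar setting are illustrative, not taken from the paper:

```python
def eb_mean_update(num_stats, den_stats, mu_old, D):
    """Extended Baum-Welch re-estimation of a Gaussian mean.
    num_stats and den_stats are (occupancy_sum, weighted_observation_sum)
    pairs accumulated from the numerator and denominator statistics;
    D is the smoothing constant that keeps the update stable."""
    g_num, x_num = num_stats
    g_den, x_den = den_stats
    return (x_num - x_den + D * mu_old) / (g_num - g_den + D)
```

As `D` grows, the update shrinks toward the old mean; with zero denominator statistics and `D = 0`, it reduces to the ordinary ML mean estimate.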


IEEE Automatic Speech Recognition and Understanding Workshop | 2005

Minimum exact word error training

Georg Heigold; Wolfgang Macherey; Ralf Schlüter; Hermann Ney

In this paper, we present the minimum exact word error (exactMWE) training criterion for optimising the parameters of large-scale speech recognition systems. The exactMWE criterion is similar to the minimum word error (MWE) criterion, which minimises the expected word error, but uses the exact word error instead of the time-alignment-based approximation used in the MWE criterion. It is shown that the exact word error for all word sequence hypotheses can be represented on a word lattice. This can be accomplished using transducer-based methods, resulting in a word lattice of slightly refined topology. The accumulated weights of each path through such a lattice then represent the exact number of word errors for the corresponding word sequence hypothesis. Using this compressed representation of the word error of all word sequences represented in the original lattice, exactMWE can be performed using the same lattice-based re-estimation process as for MWE training. First experiments on the Wall Street Journal dictation task do not show significant differences in recognition performance between exactMWE and MWE at comparable computational complexity and convergence behaviour of the training.
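The "exact word error" the criterion targets is the Levenshtein distance computed over words. A minimal dynamic-programming sketch for a single hypothesis-reference pair; the paper's transducer-based method computes this quantity for all hypotheses in a lattice at once, which this sketch does not attempt:

```python
def word_errors(hyp, ref):
    """Exact word error (word-level Levenshtein distance) between a
    hypothesis string and a reference string, counting substitutions,
    insertions, and deletions."""
    h, r = hyp.split(), ref.split()
    prev = list(range(len(r) + 1))          # DP row for the empty hypothesis
    for i, hw in enumerate(h, 1):
        cur = [i]
        for j, rw in enumerate(r, 1):
            cur.append(min(prev[j] + 1,      # insertion
                           cur[j - 1] + 1,   # deletion
                           prev[j - 1] + (hw != rw)))  # (mis)match
        prev = cur
    return prev[-1]
```

Unlike a time-alignment approximation, this count is invariant to where word boundaries fall in the signal.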


European Conference on Machine Learning | 2001

Learning of Variability for Invariant Statistical Pattern Recognition

Daniel Keysers; Wolfgang Macherey; Jörg Dahmen; Hermann Ney

In many applications, modelling techniques are necessary that take into account the inherent variability of the given data. In this paper, we present an approach to modelling class-specific pattern variation based on tangent distance within a statistical framework for classification. The model is an effective means to explicitly incorporate invariance with respect to transformations that do not change class membership, e.g. small affine transformations in the case of image objects. If no prior knowledge about the type of variability is available, it is desirable to learn the model parameters from the data. The probabilistic interpretation presented here allows us to view learning of the variational derivatives as a maximum likelihood estimation problem. We present experimental results from two different real-world pattern recognition tasks, namely image object recognition and automatic speech recognition. On the US Postal Service handwritten digit recognition task, learning of variability achieves results well comparable to those obtained using specific domain knowledge. On the SieTill corpus of continuously spoken, telephone-line-recorded German digit strings, the method shows a significant improvement over a common mixture density approach using a comparable number of parameters. The probabilistic model is well suited for use in statistical pattern recognition and can be extended to other domains such as cluster analysis.
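When the variability is learned from data rather than derived from prior knowledge, the estimated directions end up being principal directions of the class scatter. A plain PCA-style sketch of that idea, under the assumption that the tangent vectors are taken as the leading eigenvectors of the scatter matrix around the class mean; this is an illustration, not the paper's exact ML derivation:

```python
import numpy as np

def learn_tangent_vectors(X, mu, n_tangents):
    """Estimate class-specific variability directions from samples X
    (rows are observations): form the scatter matrix around mu and
    return its top-n eigenvectors as columns."""
    S = (X - mu).T @ (X - mu) / len(X)
    w, V = np.linalg.eigh(S)           # eigh returns ascending eigenvalues
    return V[:, ::-1][:, :n_tangents]  # keep the leading directions
```

For data that varies only along one axis, the single learned tangent vector aligns with that axis (up to sign).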


International Conference on Acoustics, Speech, and Signal Processing | 2002

Towards automatic corpus preparation for a German broadcast news transcription system

Wolfgang Macherey; Hermann Ney

When setting up a speech recognition system for a new domain, a lot of manual effort is spent on corpus preparation, i.e. data acquisition, cutting and segmentation of the audio material, generation of pronunciation lexica, and the definition of suitable training and test sets. In this paper, we describe several methods that help to automate, and thus speed up, this procedure. For this purpose, we assume that only a preliminary, partially incorrect textual transcription is available. The effectiveness of the proposed methods is demonstrated through the development of a transcription system for the recognition of German broadcast news.
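One way to exploit a partially incorrect transcription is to recognize each segment and keep only segments where recognizer output and preliminary transcript agree well enough, flagging the rest for manual checking. A minimal sketch of such a filter using a stdlib similarity measure; the function, threshold, and data layout are assumptions for illustration, not the paper's procedure:

```python
from difflib import SequenceMatcher

def filter_segments(segments, threshold=0.8):
    """Split (recognized_text, preliminary_transcript) pairs into segments
    to keep for training and segments to flag for manual checking, based
    on word-level similarity between the two texts."""
    keep, check = [], []
    for rec, pre in segments:
        score = SequenceMatcher(None, rec.split(), pre.split()).ratio()
        (keep if score >= threshold else check).append((rec, pre))
    return keep, check
```

Raising the threshold trades training-data quantity for transcript quality, which is the central tuning knob in such semi-automatic pipelines.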


EURASIP Journal on Advances in Signal Processing | 2003

Probabilistic Aspects in Spoken Document Retrieval

Wolfgang Macherey; Hans Jörg Viechtbauer; Hermann Ney

Accessing information in multimedia databases encompasses a wide range of applications in which spoken document retrieval (SDR) plays an important role. In SDR, a set of automatically transcribed speech documents constitutes the collection for retrieval, to which a user may address a request in natural language. This paper deals with two probabilistic aspects of SDR. The first part investigates the effect of recognition errors on retrieval performance and asks why recognition errors have only a small effect on retrieval performance. In the second part, we present a new probabilistic approach to SDR that is based on interpolations between document representations. Experiments performed on the TREC-7 and TREC-8 SDR tasks show comparable or even better results for the newly proposed method than for other advanced heuristic and probabilistic retrieval metrics.
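A standard probabilistic retrieval setup that involves interpolation is query-likelihood scoring with Jelinek-Mercer smoothing, where each document model is interpolated with a collection model. The sketch below illustrates that general idea; the function, the unigram setting, and the smoothing weight are assumptions, not the paper's exact formulation:

```python
import math
from collections import Counter

def lm_retrieval_score(query, doc, collection, lam=0.5):
    """Query-likelihood retrieval score: each query term is scored under a
    linear interpolation of the document unigram model and the collection
    model, with mixing weight lam; returns the total log-likelihood."""
    d, c = Counter(doc.split()), Counter(collection.split())
    dn, cn = sum(d.values()), sum(c.values())
    score = 0.0
    for t in query.split():
        p = lam * d[t] / dn + (1 - lam) * c[t] / cn
        score += math.log(p) if p > 0 else float("-inf")
    return score
```

The collection model keeps unseen query terms from zeroing out a document's score, which also helps absorb transcription errors in SDR collections.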


Conference of the International Speech Communication Association | 2005

Investigations on Error Minimizing Training Criteria for Discriminative Training in Automatic Speech Recognition

Wolfgang Macherey; Lars Haferkamp; Ralf Schlüter; Hermann Ney


Conference of the International Speech Communication Association | 2003

A comparative study on maximum entropy and discriminative training for acoustic modeling in automatic speech recognition

Wolfgang Macherey; Hermann Ney

Collaboration


Dive into Wolfgang Macherey's collaborations.

Top Co-Authors


Hermann Ney

RWTH Aachen University
