Martin Sundermeyer
RWTH Aachen University
Publications
Featured research published by Martin Sundermeyer.
IEEE Transactions on Audio, Speech, and Language Processing | 2015
Martin Sundermeyer; Hermann Ney; Ralf Schlüter
Language models have traditionally been estimated based on relative frequencies, using count statistics that can be extracted from huge amounts of text data. More recently, it has been found that neural networks are particularly powerful at estimating probability distributions over word sequences, giving substantial improvements over state-of-the-art count models. However, the performance of neural network language models strongly depends on their architectural structure. This paper compares count models to feedforward, recurrent, and long short-term memory (LSTM) neural network variants on two large-vocabulary speech recognition tasks. We evaluate the models in terms of perplexity and word error rate, experimentally validating the strong correlation of the two quantities, which we find to hold regardless of the underlying type of the language model. Furthermore, neural networks incur an increased computational complexity compared to count models, and they model context dependencies differently, often exceeding the number of words that are taken into account by count-based approaches. These differences require efficient search methods for neural networks, and we analyze the potential improvements that can be obtained when applying advanced algorithms to the rescoring of word lattices on large-scale setups.
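As a sketch of the perplexity measure evaluated in this abstract, the following minimal Python example computes perplexity from per-word log-probabilities; the function name and the toy probabilities are illustrative, not taken from the paper:

```python
import math

def perplexity(log_probs):
    """Perplexity of a word sequence from its per-word natural-log
    conditional probabilities: PPL = exp(-(1/N) * sum_i log p(w_i | h_i)).
    Lower perplexity generally correlates with lower word error rate."""
    return math.exp(-sum(log_probs) / len(log_probs))

# Toy example with four assumed conditional probabilities.
probs = [0.2, 0.1, 0.25, 0.05]
ppl = perplexity([math.log(p) for p in probs])
```

Because the probabilities multiply to 1/4000, the resulting perplexity is the geometric-mean inverse probability, 4000^(1/4) ≈ 7.95.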
empirical methods in natural language processing | 2014
Martin Sundermeyer; Tamer Alkhouli; Joern Wuebker; Hermann Ney
This work presents two different translation models using recurrent neural networks. The first one is a word-based approach using word alignments. Second, we present phrase-based translation models that are more consistent with phrase-based decoding. Moreover, we introduce bidirectional recurrent neural models to the problem of machine translation, allowing us to use the full source sentence in our models, which is also of theoretical interest. We demonstrate that our translation models are capable of improving strong baselines already including recurrent neural language models on three tasks: IWSLT 2013 German→English, BOLT Arabic→English and Chinese→English. We obtain gains up to 1.6% BLEU and 1.7% TER by rescoring 1000-best lists.
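The n-best rescoring mentioned in this abstract can be sketched as a generic log-linear reranking scheme; the function names and the interpolation weight below are assumptions for illustration, not the paper's implementation:

```python
def rescore_nbest(nbest, nn_score, lam=0.5):
    """Rerank an n-best list of (hypothesis, baseline_score) pairs by
    adding a weighted neural-model score to each hypothesis' baseline
    decoder score, returning the highest-scoring hypothesis."""
    best_hyp, _ = max(
        ((hyp, base + lam * nn_score(hyp)) for hyp, base in nbest),
        key=lambda pair: pair[1],
    )
    return best_hyp

# Toy 2-best list: the baseline prefers "a c"; the neural score flips it.
nbest = [("a b", -2.0), ("a c", -1.5)]
nn = {"a b": -1.0, "a c": -3.0}.get
best = rescore_nbest(nbest, nn)
```

In practice the interpolation weight would be tuned on held-out data, e.g. to maximize BLEU or minimize TER.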
international conference on acoustics, speech, and signal processing | 2013
Martin Sundermeyer; Ilya Oparin; Jean-Luc Gauvain; B. Freiberg; Ralf Schlüter; Hermann Ney
Research on language modeling for speech recognition has increasingly focused on the application of neural networks. Two competing concepts have been developed: on the one hand, feedforward neural networks that represent an n-gram approach; on the other, recurrent neural networks that may learn context dependencies spanning more than a fixed number of predecessor words. To the best of our knowledge, no comparison has been carried out between feedforward and state-of-the-art recurrent networks when applied to speech recognition. This paper analyzes this aspect in detail on a well-tuned French speech recognition task. In addition, we propose a simple and efficient method to normalize language model probabilities across different vocabularies, and we show how to speed up training of recurrent neural networks by parallelization.
international conference on acoustics, speech, and signal processing | 2011
Martin Sundermeyer; Markus Nussbaum-Thom; Simon Wiesler; Christian Plahl; A. El-Desoky Mousa; Stefan Hahn; David Nolden; Ralf Schlüter; Hermann Ney
Recognizing Broadcast Conversational (BC) speech data is a difficult task, which can be regarded as one of the major challenges beyond the recognition of Broadcast News (BN).
international conference on acoustics, speech, and signal processing | 2012
Ilya Oparin; Martin Sundermeyer; Hermann Ney; Jean-Luc Gauvain
Neural network language models (NNLMs) have recently become an important complement to conventional n-gram language models (LMs) in speech-to-text systems. However, little is known about the behavior of NNLMs. The analysis presented in this paper aims to understand which types of events are better modeled by NNLMs as compared to n-gram LMs, in what cases improvements are most substantial, and why this is the case. Such an analysis is important for deriving further benefit from NNLMs used in combination with conventional n-gram models. The analysis is carried out for different types of neural network (feed-forward and recurrent) LMs. The results, showing for which types of events NNLMs provide better probability estimates, are validated on two setups that differ in their size and degree of data homogeneity.
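The combination of an NNLM with a conventional n-gram LM that this abstract refers to is typically realized by linear interpolation of the two probability estimates; a minimal sketch, where the weight is an assumed hyperparameter tuned on held-out data:

```python
def interpolate(p_ngram, p_nn, lam=0.5):
    """Linear interpolation of an n-gram LM probability and a neural
    LM probability for the same word and history:
        p(w | h) = lam * p_nn(w | h) + (1 - lam) * p_ngram(w | h).
    lam is an illustrative value, not one from the paper."""
    return lam * p_nn + (1 - lam) * p_ngram

p = interpolate(p_ngram=0.01, p_nn=0.03, lam=0.5)
```

Since both inputs are valid probability distributions and the weights sum to one, the interpolated model remains properly normalized.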
adaptive multimedia retrieval | 2009
Marc Wichterich; Christian Beecks; Martin Sundermeyer; Thomas Seidl
Expanding on our preliminary work [1], we present a novel method to heuristically adapt the Earth Mover's Distance to relevance feedback. Moreover, we detail an optimization-based method that takes feedback from the current and past relevance feedback iterations into account in order to improve the degree to which the Earth Mover's Distance reflects the preference information given by the user. As shown by our experiments, the adaptation of the Earth Mover's Distance results in a larger number of relevant objects in fewer feedback iterations compared to existing query movement techniques for the Earth Mover's Distance.
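For intuition about the underlying metric: in one dimension with unit ground distance between adjacent bins, the Earth Mover's Distance between two normalized histograms reduces to the summed absolute difference of their cumulative distributions. A minimal sketch of this special case (not the adaptive, feedback-driven variant the paper proposes):

```python
def emd_1d(h1, h2):
    """Earth Mover's Distance between two normalized 1-D histograms
    over the same ordered bins, with unit distance between neighboring
    bins: EMD = sum over bins of |CDF1 - CDF2|."""
    cum_diff, total = 0.0, 0.0
    for a, b in zip(h1, h2):
        cum_diff += a - b
        total += abs(cum_diff)
    return total

# All mass moves two bins to the right: cost 1.0 * 2 = 2.0.
d = emd_1d([1.0, 0.0, 0.0], [0.0, 0.0, 1.0])
```

The general EMD over arbitrary ground-distance matrices requires solving a transportation problem (a linear program), which is what makes feedback-driven adaptation of the ground distances interesting.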
international conference on acoustics, speech, and signal processing | 2012
Jesús Andrés-Ferrer; Martin Sundermeyer; Hermann Ney
The smoothing of n-gram models is a core technique in language modelling (LM). Modified Kneser-Ney (mKN) ranks among the best smoothing techniques. This technique discounts a fixed quantity from the observed counts in order to approximate the Turing-Good (TG) counts. Although the TG counts optimise the leaving-one-out (L1O) criterion, the discounting parameters introduced in mKN do not. Moreover, the approximation to the TG counts for large counts is heavily simplified. In this work, both ideas are addressed: the estimation of the discounting parameters by L1O, and better functional forms to approximate larger TG counts. The L1O performance is compared with cross-validation (CV) and the mKN baseline on two large vocabulary tasks.
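The fixed mKN discounts this abstract refers to are conventionally set in closed form from counts-of-counts (the standard Chen & Goodman estimates), as an approximation to Turing-Good discounting; a sketch with illustrative counts, not values from the paper's corpora:

```python
def mkn_discounts(n1, n2, n3, n4):
    """Closed-form modified Kneser-Ney discounts from counts-of-counts
    n_r (the number of distinct n-grams seen exactly r times):
        Y   = n1 / (n1 + 2*n2)
        D1  = 1 - 2*Y*n2/n1
        D2  = 2 - 3*Y*n3/n2
        D3+ = 3 - 4*Y*n4/n3
    These are fixed per count class, which is exactly the restriction
    the paper revisits via leaving-one-out estimation."""
    y = n1 / (n1 + 2 * n2)
    d1 = 1 - 2 * y * n2 / n1
    d2 = 2 - 3 * y * n3 / n2
    d3p = 3 - 4 * y * n4 / n3
    return d1, d2, d3p

# Illustrative counts-of-counts.
d1, d2, d3p = mkn_discounts(n1=1000, n2=400, n3=200, n4=100)
```

For typical corpora, where n_r decreases with r, the discounts come out positive and increase with the count class.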
conference of the international speech communication association | 2012
Martin Sundermeyer; Ralf Schlüter; Hermann Ney
conference of the international speech communication association | 2012
Zoltán Tüske; Ralf Schlüter; Hermann Ney; Martin Sundermeyer
IWSLT | 2011
Lori Lamel; Sandrine Courcinous; Julien Despres; Jean-Luc Gauvain; Yvan Josse; Kevin Kilgour; Florian Kraft; Viet Bac Le; Hermann Ney; Markus Nußbaum-Thom; Ilya Oparin; Tim Schlippe; Ralf Schlüter; Tanja Schultz; Thiago Fraga-Silva; Sebastian Stüker; Martin Sundermeyer; Bianca Vieru; Ngoc Thang Vu; Alex Waibel