Patrick Lehnen
RWTH Aachen University
Publications
Featured research published by Patrick Lehnen.
IEEE Transactions on Audio, Speech, and Language Processing | 2011
Stefan Hahn; Marco Dinarelli; Christian Raymond; Fabrice Lefèvre; Patrick Lehnen; R. De Mori; Alessandro Moschitti; Hermann Ney; Giuseppe Riccardi
One of the first steps in building a spoken language understanding (SLU) module for dialogue systems is the extraction of flat concepts out of a given word sequence, usually provided by an automatic speech recognition (ASR) system. In this paper, six different modeling approaches are investigated to tackle the task of concept tagging. These methods include classical, well-known generative and discriminative methods like Finite State Transducers (FSTs), Statistical Machine Translation (SMT), Maximum Entropy Markov Models (MEMMs), or Support Vector Machines (SVMs) as well as techniques recently applied to natural language processing such as Conditional Random Fields (CRFs) or Dynamic Bayesian Networks (DBNs). Following a detailed description of the models, experimental and comparative results are presented on three corpora in different languages and of differing complexity. The French MEDIA corpus has already been exploited during an evaluation campaign, so a direct comparison with existing benchmarks is possible. Recently collected Italian and Polish corpora are used to test the robustness and portability of the modeling approaches. For all tasks, manual transcriptions as well as ASR inputs are considered. In addition to single systems, methods for system combination are investigated. The best-performing model on all tasks is based on conditional random fields. On the MEDIA evaluation corpus, a concept error rate of 12.6% could be achieved. Here, in addition to attribute names, attribute values have been extracted using a combination of a rule-based and a statistical approach. Applying system combination using weighted ROVER with all six systems, the concept error rate (CER) drops to 12.0%.
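To make the concept-tagging setup concrete, here is a minimal sketch of decoding a flat concept sequence with a linear-chain model. The tag set, scores, and utterance length are invented stand-ins for the paper's trained CRF features and weights, not its actual system.

```python
# Minimal sketch of concept tagging as linear-chain sequence decoding.
# LABELS, emission, and transition scores are hypothetical placeholders.
import numpy as np

LABELS = ["null", "command", "city"]          # invented concept tags

def viterbi(emission, transition):
    """emission: (T, L) per-position label scores; transition: (L, L)."""
    T, L = emission.shape
    delta = np.zeros((T, L))
    backptr = np.zeros((T, L), dtype=int)
    delta[0] = emission[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + transition + emission[t][None, :]
        backptr[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0)
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):             # follow back-pointers
        path.append(int(backptr[t][path[-1]]))
    return [LABELS[i] for i in reversed(path)]

rng = np.random.default_rng(0)
emission = rng.normal(size=(5, len(LABELS)))  # scores for a 5-word utterance
transition = rng.normal(size=(len(LABELS), len(LABELS)))
print(viterbi(emission, transition))
```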
IEEE Transactions on Audio, Speech, and Language Processing | 2011
Georg Heigold; Hermann Ney; Patrick Lehnen; Tobias Gass; Ralf Schlüter
Conventional speech recognition systems are based on hidden Markov models (HMMs) with Gaussian mixture emission models, commonly called Gaussian HMMs (GHMMs). Discriminative log-linear models are an alternative modeling approach and have been investigated recently in speech recognition. GHMMs are directed models with constraints, e.g., positivity of variances and normalization of conditional probabilities, while log-linear models do not use such constraints. This paper compares the posterior form of typical generative models related to speech recognition with their log-linear model counterparts. The key result is the derivation of the equivalence of these two different approaches under weak assumptions. In particular, we study Gaussian mixture models, part-of-speech bigram tagging models, and finally the GHMMs. This result unifies two important but fundamentally different modeling paradigms in speech recognition on the functional level. Furthermore, this paper presents comparative experimental results for various speech tasks of different complexity, including digit string and large-vocabulary continuous speech recognition tasks.
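The generative/log-linear correspondence can be checked numerically in its simplest special case: for class-conditional Gaussians with a shared covariance, the class posterior is exactly a softmax over linear features. This sketch only verifies that special case (the paper's result covers mixtures and HMMs); the means, priors, and covariance are arbitrary illustrative values.

```python
# Numeric check: Gaussian class posteriors (shared covariance) equal a
# softmax of a log-linear model with w_c = Sigma^{-1} mu_c and
# b_c = -0.5 mu_c^T Sigma^{-1} mu_c + log p(c).
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
D, C = 3, 4
means = rng.normal(size=(C, D))
cov = np.eye(D) * 0.5
priors = np.full(C, 1.0 / C)
x = rng.normal(size=D)

# Generative posterior p(c | x) via Bayes' rule.
lik = np.array([multivariate_normal(means[c], cov).pdf(x) for c in range(C)])
posterior = priors * lik / (priors * lik).sum()

# Equivalent log-linear posterior.
P = np.linalg.inv(cov)
w = means @ P
b = -0.5 * np.einsum("cd,de,ce->c", means, P, means) + np.log(priors)
logits = w @ x + b
softmax = np.exp(logits - logits.max())
softmax /= softmax.sum()

assert np.allclose(posterior, softmax)
print(posterior)
```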
international conference on acoustics, speech, and signal processing | 2010
Georg Heigold; Simon Wiesler; Markus Nussbaum-Thom; Patrick Lehnen; R. Schluter; Hermann Ney
Recently, there have been many papers studying discriminative acoustic modeling techniques like conditional random fields or discriminative training of conventional Gaussian HMMs. This paper gives an overview of this recent work and progress. We strictly distinguish between the type of acoustic model on the one hand and the training criterion on the other. We address two issues in more detail: the relation between conventional Gaussian HMMs and conditional random fields, and the advantages of formulating the training criterion as a convex optimization problem. Experimental results for various speech tasks are presented to carefully evaluate the different concepts and approaches, including both digit string and large-vocabulary continuous speech recognition tasks.
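The convexity advantage mentioned above can be illustrated numerically for a frame-wise log-linear model: its negative log-likelihood is convex in the weights, so the midpoint of any two weight settings never scores worse than their average. Data and dimensions below are random placeholders.

```python
# Numeric check of midpoint convexity for a log-linear training criterion.
import numpy as np

rng = np.random.default_rng(2)
N, D, C = 200, 5, 3
X = rng.normal(size=(N, D))
y = rng.integers(0, C, size=N)

def nll(W):
    """Average negative log-likelihood of a softmax model with weights W."""
    logits = X @ W                          # (N, C)
    logZ = np.log(np.exp(logits).sum(axis=1))
    return (logZ - logits[np.arange(N), y]).mean()

for _ in range(5):
    A, B = rng.normal(size=(2, D, C))
    assert nll(0.5 * (A + B)) <= 0.5 * (nll(A) + nll(B)) + 1e-12
print("midpoint convexity holds on all sampled weight pairs")
```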
international conference on acoustics, speech, and signal processing | 2011
Stefan Hahn; Patrick Lehnen; Hermann Ney
Conditional Random Fields (CRFs) have proven to perform well on natural language processing tasks like name transliteration, concept tagging, or grapheme-to-phoneme (g2p) conversion. The aim of this paper is to propose some extensions to state-of-the-art CRF systems for these tasks. Since the number of features can grow rapidly, a method for feature selection is very helpful to boost performance. A combination of L1 and L2 regularization (elastic net) has been adopted and implemented within the Rprop optimization algorithm. Usually, dependencies on the target side are limited to bigram dependencies, since the computational complexity grows exponentially with the history length. We present a modified CRF decoding where a conventional language model on the target side is integrated into the CRF search process. Thus, larger contexts can be taken into account. Besides these two main parts, the already published margin extension to the CRF training criterion has been adopted.
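A sketch of the first extension, folding an elastic-net penalty into a sign-based Rprop update, is shown below. This is a simplified Rprop variant without weight backtracking; the step-size constants and the quadratic toy objective are illustrative, not the paper's settings.

```python
# Rprop update with an elastic-net (L1 + L2) subgradient folded in.
import numpy as np

def rprop_elastic_net(grad_fn, w, l1=0.01, l2=0.001, iters=100,
                      eta_plus=1.2, eta_minus=0.5, step0=0.1):
    step = np.full_like(w, step0)
    prev_sign = np.zeros_like(w)
    for _ in range(iters):
        g = grad_fn(w) + l2 * w + l1 * np.sign(w)  # elastic-net subgradient
        sign = np.sign(g)
        same = sign * prev_sign                    # +1 same dir, -1 flipped
        step = np.where(same > 0, step * eta_plus,
               np.where(same < 0, step * eta_minus, step))
        w = w - sign * step                        # sign-based Rprop step
        prev_sign = sign
    return w

# Toy objective: 0.5 * ||w - target||^2 with gradient w - target.
target = np.array([2.0, -1.0, 0.0, 3.0])
w = rprop_elastic_net(lambda w: w - target, np.zeros_like(target))
print(w)   # pulled toward target, small weights shrunk toward 0 by L1
```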
international conference on acoustics, speech, and signal processing | 2011
Georg Heigold; Stefan Hahn; Patrick Lehnen; Hermann Ney
We have recently proposed an EM-style algorithm to optimize log-linear models with hidden variables. In this paper, we use this algorithm to optimize a hidden conditional random field, i.e., a conditional random field with hidden variables. Similar to hidden Markov models, the alignments are the hidden variables in the examples considered. Here, EM-style algorithms are iterative optimization algorithms which are guaranteed to improve the training criterion in each iteration - without the need for tuning step sizes, sophisticated update schemes or numerical line optimization (with hardly predictable complexity). This is a rather strong property which conventional gradient-based optimization algorithms do not have. We present experimental results for a grapheme-to-phoneme conversion task and compare the convergence behavior of the EM-style algorithm with L-BFGS and Rprop.
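The guaranteed-improvement property claimed above can be illustrated on a toy hidden-variable model. The sketch below runs EM on a one-dimensional two-component Gaussian mixture with fixed unit variances (not the paper's hidden CRF or its GIS-like updates) and asserts that the log-likelihood never decreases, with no step sizes to tune.

```python
# Monotone improvement of an EM iteration on a toy mixture model.
import numpy as np

rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 300)])
mu = np.array([-1.0, 1.0])
pi = np.array([0.5, 0.5])                     # unit variances kept fixed

def loglik(mu, pi):
    comp = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2) / np.sqrt(2 * np.pi)
    return np.log(comp.sum(axis=1)).sum()

prev = -np.inf
for _ in range(20):
    comp = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2)       # E-step scores
    gamma = comp / comp.sum(axis=1, keepdims=True)          # posteriors
    mu = (gamma * x[:, None]).sum(axis=0) / gamma.sum(axis=0)  # M-step
    pi = gamma.mean(axis=0)
    cur = loglik(mu, pi)
    assert cur >= prev - 1e-9   # improvement guaranteed in every iteration
    prev = cur
print(mu, pi)
```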
international conference on acoustics, speech, and signal processing | 2011
Patrick Lehnen; Stefan Hahn; Andreas Guta; Hermann Ney
Conditional Random Fields (CRFs) are a state-of-the-art approach to natural language processing tasks like grapheme-to-phoneme (g2p) conversion, which is used to produce pronunciations or pronunciation variants for almost all ASR pronunciation lexica. One drawback of CRFs is that training requires an alignment between graphemes and phonemes, usually even a 1-to-1 alignment. The quality of the g2p result heavily depends on this alignment. Since these alignments are usually not annotated within the corpora, external models have to be used to produce such an alignment in a preprocessing step. In this work, we propose two approaches to integrate the alignment generation directly and efficiently into the CRF training process. Whereas the first approach relies on a linear segmentation as starting point, the second approach considers all possible alignments given certain constraints. Both methods have been evaluated on two English g2p tasks, namely NETtalk and Celex, on which state-of-the-art results have been reported in the literature. The proposed approaches lead to results comparable to the state of the art.
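The idea behind the second approach, summing over all constrained alignments, can be sketched as a dynamic program. Below, a forward pass marginalizes (in log space) over monotone alignments in which each grapheme emits zero, one, or two phonemes; the per-pair scores are random stand-ins for real CRF potentials.

```python
# Log-space DP over monotone grapheme-to-phoneme alignments.
import numpy as np
from scipy.special import logsumexp

def log_marginal(m, n, emit_score, max_emit=2):
    """log sum over monotone alignments of m graphemes to n phonemes."""
    alpha = np.full((m + 1, n + 1), -np.inf)
    alpha[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(n + 1):
            # grapheme i-1 emits k phonemes ending at position j
            terms = [alpha[i - 1, j - k] + emit_score(i - 1, j - k, k)
                     for k in range(min(max_emit, j) + 1)]
            alpha[i, j] = logsumexp(terms)
    return alpha[m, n]

rng = np.random.default_rng(4)
table = rng.normal(size=(8, 8, 3))   # toy scores for (grapheme, pos, span)
print(log_marginal(6, 7, lambda i, j, k: table[i, j, k]))
```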
north american chapter of the association for computational linguistics | 2015
Joern Wuebker; Sebastian Muehr; Patrick Lehnen; Stephan Peitz; Hermann Ney
This work presents a flexible and efficient discriminative training approach for statistical machine translation. We propose to use the RPROP algorithm for optimizing a maximum expected BLEU objective and experimentally compare it to several other updating schemes. It proves to be more efficient and effective than the previously proposed growth transformation technique and also yields better results than stochastic gradient descent and AdaGrad. We also report strong empirical results on two large scale tasks, namely BOLT Chinese→English and WMT German→English, where our final systems outperform results reported by Setiawan and Zhou (2013) and on matrix.statmt.org. On the WMT task, discriminative training is performed on the full training data of 4M sentence pairs, which is unsurpassed in the literature.
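To make the objective concrete: under a log-linear model p(e|f) ∝ exp(w·h(e,f)) over an n-best list, the gradient of expected BLEU is the covariance between the feature vectors and the sentence-level BLEU scores. The sketch below uses random placeholder features and BLEU values for one source sentence, and plain gradient ascent for illustration where the paper uses RPROP.

```python
# Maximum expected BLEU over an n-best list and its covariance gradient.
import numpy as np

rng = np.random.default_rng(5)
N, D = 50, 10                        # n-best size, feature dimension
H = rng.normal(size=(N, D))          # feature vectors h(e_i, f)
bleu = rng.uniform(size=N)           # sentence-BLEU of each hypothesis

def expected_bleu_and_grad(w):
    logits = H @ w
    p = np.exp(logits - logits.max())
    p /= p.sum()                                     # p(e_i | f)
    xb = p @ bleu                                    # E[BLEU]
    grad = H.T @ (p * bleu) - (p @ H) * xb           # Cov(h, BLEU)
    return xb, grad

w = np.zeros(D)
for _ in range(200):
    xb, g = expected_bleu_and_grad(w)
    w += 0.5 * g                     # gradient ascent stand-in for RPROP
print(f"expected BLEU rose to {xb:.3f} (best in list: {bleu.max():.3f})")
```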
Archive | 2017
Patrick Lehnen; Hermann Ney; François Yvon
Maximum entropy approaches for sequence tagging, and conditional random fields in particular, have shown high potential in a variety of tasks. The effectiveness of these approaches is verified within this thesis using semantic tagging within natural language understanding as an example. Within this task, decent feature engineering and a tuning of the regularization parameter are sufficient to make conditional random fields superior to a broad set of competing approaches, including support vector machines, phrase-based translation, maximum entropy Markov models, dynamic Bayesian networks, and generatively trained probabilistic finite state transducers. Applying conditional random fields to other tasks in many cases calls for extensions to the original formulation. For multi-level semantic tagging in natural language understanding, constrained search is needed; for grapheme-to-phoneme conversion, support for a hidden segmentation and huge feature sets is required; and for statistical machine translation, solutions for the large input and output vocabularies, even larger feature sets, and the hidden alignments have to be found. This thesis presents solutions to all these constraints. The conditional random fields are modeled with finite state transducers to support constraints on the search space. They are extended with hidden segmentation, elastic-net regularization, sparse forward-backward, pruning in training, and intermediate classes in the output layer. Finally, we combine all extensions to support statistical machine translation with conditional random fields. The best implementation for statistical machine translation is then based on a refined maximum expected BLEU objective using a similar feature notation and the same RPROP parameter estimation. It differs in a more efficient use of the phrase-based or hierarchical baseline with the help of n-best lists.
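One of the listed extensions, pruning in training, can be sketched as a beam-pruned forward pass for a linear-chain model: states falling more than a threshold below the per-position best are dropped from the summation. The scores are random stand-ins; the thesis applies this idea inside CRF training.

```python
# Beam-pruned forward pass approximating the log partition function.
import numpy as np
from scipy.special import logsumexp

def pruned_forward(emission, transition, beam=5.0):
    T, L = emission.shape
    alpha = emission[0].copy()
    for t in range(1, T):
        alpha[alpha < alpha.max() - beam] = -np.inf      # beam pruning
        alpha = logsumexp(alpha[:, None] + transition, axis=0) + emission[t]
    return logsumexp(alpha)

rng = np.random.default_rng(6)
emission = rng.normal(size=(20, 30))
transition = rng.normal(size=(30, 30))
exact = pruned_forward(emission, transition, beam=np.inf)   # no pruning
print(exact, pruned_forward(emission, transition, beam=3.0))
```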
language resources and evaluation | 2008
Stefan Hahn; Patrick Lehnen; Christian Raymond; Hermann Ney
conference of the international speech communication association | 2008
Georg Heigold; Patrick Lehnen; Ralf Schlüter; Hermann Ney