
Publication


Featured research published by Christian Plahl.


international conference on acoustics, speech, and signal processing | 2011

The RWTH 2010 Quaero ASR evaluation system for English, French, and German

Martin Sundermeyer; Markus Nussbaum-Thom; Simon Wiesler; Christian Plahl; A. El-Desoky Mousa; Stefan Hahn; David Nolden; Ralf Schlüter; Hermann Ney

Recognizing Broadcast Conversational (BC) speech data is a difficult task, which can be regarded as one of the major challenges beyond the recognition of Broadcast News (BN).


international conference on image processing | 2011

Hierarchical hybrid MLP/HMM or rather MLP features for a discriminatively trained Gaussian HMM: A comparison for offline handwriting recognition

Philippe Dreuw; Patrick Doetsch; Christian Plahl; Hermann Ney

We use neural network based features extracted by a hierarchical multilayer-perceptron (MLP) network either in a hybrid MLP/HMM approach or to discriminatively retrain a Gaussian hidden Markov model (GHMM) system in a tandem approach. MLP networks have been used successfully to model long-term and non-linear feature dependencies in automatic speech and optical character recognition. In offline handwriting recognition, MLPs have mostly been used for isolated character and word recognition in hybrid approaches. Here we analyze MLPs within an LVCSR framework for continuous handwriting recognition using discriminative MMI/MPE training. In particular, hybrid MLP/HMM and discriminatively retrained MLP-GHMM tandem approaches are evaluated. Significant improvements and competitive results are reported for a closed-vocabulary task on the IfN/ENIT Arabic handwriting database and for a large-vocabulary task on the IAM English handwriting database.
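The tandem idea in this abstract can be sketched in a few lines: an MLP produces class posteriors, and their decorrelated log values, rather than the raw features, become the observations of the Gaussian HMM. All dimensions, weights, and data below are hypothetical placeholders, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_posteriors(frames, w1, w2):
    """One-hidden-layer MLP: feature frames -> class posteriors (softmax)."""
    h = np.tanh(frames @ w1)
    logits = h @ w2
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical dimensions: 100 frames of 20-dim features,
# 50 hidden units, 30 character classes.
frames = rng.normal(size=(100, 20))
w1 = rng.normal(size=(20, 50)) * 0.1
w2 = rng.normal(size=(50, 30)) * 0.1

post = mlp_posteriors(frames, w1, w2)

# Tandem step: log-posteriors, decorrelated and reduced by PCA,
# serve as the observation vectors of the GHMM system.
logp = np.log(post + 1e-10)
logp -= logp.mean(axis=0)
cov = np.cov(logp, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
proj = eigvecs[:, ::-1][:, :16]          # keep 16 leading components
tandem_features = logp @ proj            # (100, 16) GHMM inputs
```

In the hybrid MLP/HMM variant, by contrast, the posteriors themselves (scaled by class priors) would replace the Gaussian emission likelihoods.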


ieee automatic speech recognition and understanding workshop | 2011

Cross-lingual portability of Chinese and English neural network features for French and German LVCSR

Christian Plahl; Ralf Schlüter; Hermann Ney

This paper investigates neural network (NN) based cross-lingual probabilistic features. Earlier work reports that intra-lingual features consistently outperform the corresponding cross-lingual features. We show that this may not generalize. Depending on the complexity of the NN features, cross-lingual features reduce the resources used for training (the NN has to be trained on one language only) without any loss in performance w.r.t. word error rate (WER). To further investigate this inconsistency between intra- and cross-lingual neural network features, we analyze their performance w.r.t. the degree of kinship between training and testing language and the amount of training data used. When the same amount of data is used for NN training, a close relationship between training and testing language is required to achieve similar results. This dependence weakens as the amount of training data grows, and also when the NN topology is changed to a bottleneck structure. Moreover, cross-lingual features trained on English or Chinese improve the best intra-lingual system for German by up to 2% relative in WER and by up to 3% relative for French, matching the improvement obtained from discriminative training. Combining intra- and cross-lingual systems yields a further gain of up to 8% relative in WER.
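The cross-lingual bottleneck scheme mentioned above can be illustrated as follows: a NN is trained on source-language targets, its output layer is discarded, and the bottleneck activations become features for a different target language. All shapes and the language labels here are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical bottleneck MLP trained on the *source* language
# (say, English phoneme targets): input -> hidden -> bottleneck -> output.
w_in = rng.normal(size=(39, 64)) * 0.1    # 39-dim spectral input
w_bn = rng.normal(size=(64, 16)) * 0.1    # 16-dim bottleneck layer
w_out = rng.normal(size=(16, 40)) * 0.1   # source-language classes (unused below)

def bottleneck_features(x):
    """Cross-lingual use: drop the output layer and keep the
    bottleneck activations as features for the target language."""
    h = np.tanh(x @ w_in)
    return np.tanh(h @ w_bn)

# Target-language frames (say, German) never seen during NN training.
german_frames = rng.normal(size=(200, 39))
feats = bottleneck_features(german_frames)    # (200, 16)

# The NN features are typically appended to the target system's
# spectral features before acoustic model training.
combined = np.hstack([german_frames, feats])  # (200, 55)
```

The point of the bottleneck is that its low-dimensional code is less tied to the source-language class inventory than the output posteriors, which is one reading of why the topology change reduces the dependence on language kinship.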


computer vision and pattern recognition | 2012

Enhanced continuous sign language recognition using PCA and neural network features

Yannick L. Gweth; Christian Plahl; Hermann Ney

In this work, a Gaussian hidden Markov model (GHMM) based automatic sign language recognition system is built on the SIGNUM database. The system is trained on appearance-based features as well as on features derived from a multilayer perceptron (MLP), whose posterior estimates serve as the NN features. Appearance-based features are extracted directly from the original images without any colored gloves or sensors. Whereas MLP based features are well-known in speech and optical character recognition, this is the first time that these features are used in a sign language system. The MLP based features improve the word error rate (WER) of the system from 16% to 13% compared to the appearance-based features. In order to benefit from the different feature types, we investigate a combination technique in which the models trained on each feature set are combined during the recognition step. By means of this combination, we improve the word error rate of our best system by more than 8% relative and outperform the best published results on this database by about 6% relative.
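A minimal sketch of the appearance-based features named in the title: raw image pixels projected onto the leading principal components, with no gloves or sensors involved. Frame count, image size, and the number of retained components are hypothetical, not the SIGNUM setup.

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in for video frames: 300 frames of 32x32 grayscale images,
# flattened to 1024-dim pixel vectors (random data for illustration).
frames = rng.normal(size=(300, 32 * 32))

def pca_appearance_features(x, k):
    """Appearance-based features: center the pixel vectors and
    project them onto the k leading principal components."""
    xc = x - x.mean(axis=0)
    cov = xc.T @ xc / (len(x) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)     # ascending eigenvalues
    return xc @ eigvecs[:, ::-1][:, :k]        # keep the top k

app_feats = pca_appearance_features(frames, 200)  # (300, 200) GHMM input
```

The MLP features of the paper would then be posterior estimates computed from such frames, and the two GHMM systems trained on each stream are combined at recognition time.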


international conference on acoustics, speech, and signal processing | 2012

Improved pre-training of Deep Belief Networks using Sparse Encoding Symmetric Machines

Christian Plahl; Tara N. Sainath; Bhuvana Ramabhadran; David Nahamoo

Restricted Boltzmann Machines (RBMs) continue to be a popular methodology for pre-training the weights of Deep Belief Networks (DBNs). However, the RBM objective function cannot be maximized directly, so it is not clear which quantity to monitor when deciding to stop training, making the computational cost hard to manage. The Sparse Encoding Symmetric Machine (SESM) has been suggested as an alternative pre-training method. By placing a sparseness term on the NN output codebook, SESM allows the objective function to be optimized directly and monitored reliably as a stopping indicator. In this paper, we explore SESM to pre-train DBNs and apply it to speech recognition for the first time. First, we provide a detailed analysis comparing the behavior of SESM and RBM. Second, we compare the performance of SESM pre-trained and RBM pre-trained DBNs on TIMIT and a 50-hour English Broadcast News task. Results indicate that DBNs pre-trained with SESM and RBMs achieve comparable performance and outperform randomly initialized DBNs, with SESM providing a much easier stopping criterion than RBM.
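The key contrast in this abstract, a directly computable objective, can be shown with a simplified SESM-style loss: a symmetric encoder/decoder pair sharing one weight matrix, plus an L1 sparseness term on the code. This is a simplified reading of the method, with made-up dimensions, not the authors' exact formulation.

```python
import numpy as np

rng = np.random.default_rng(3)

def sesm_loss(x, w, b_enc, b_dec, alpha=0.1):
    """Simplified SESM-style objective. Encoder and decoder share the
    weight matrix w (the 'symmetric' part); an L1 sparseness term is
    placed on the code. Unlike the RBM likelihood, this loss can be
    evaluated directly and monitored as a stopping criterion."""
    code = 1.0 / (1.0 + np.exp(-(x @ w + b_enc)))   # logistic encoder
    recon = code @ w.T + b_dec                       # symmetric decoder
    rec_err = np.mean((recon - x) ** 2)
    sparsity = alpha * np.mean(np.abs(code))
    return rec_err + sparsity

x = rng.normal(size=(64, 20))           # a mini-batch of feature frames
w = rng.normal(size=(20, 50)) * 0.1     # 20 visible, 50 code units
loss = sesm_loss(x, w, np.zeros(50), np.zeros(20))
# loss is a plain scalar, so a drop below a threshold (or a plateau)
# gives a natural early-stopping rule during layer-wise pre-training
```

With an RBM, one would instead have to monitor a proxy such as reconstruction error, since the log-likelihood involves an intractable partition function.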


ieee automatic speech recognition and understanding workshop | 2007

Development of the 2007 RWTH Mandarin LVCSR system

Björn Hoffmeister; Christian Plahl; Peter Fritz; Georg Heigold; Jonas Lööf; Ralf Schlüter; Hermann Ney

This paper describes the development of the RWTH Mandarin LVCSR system. Different acoustic front-ends together with multiple-system cross-adaptation are used in a two-stage decoding framework. We describe the system in detail and present systematic recognition results. In particular, we compare a variety of approaches for cross-adapting to multiple systems. During development we carried out a comparative study on different methods for integrating tone and phoneme posterior features. Furthermore, we apply lattice-based consensus decoding and system combination methods, comparing the effect of minimizing character instead of word errors. The final system obtains a character error rate of 17.7% on the GALE 2006 evaluation data.


international conference on acoustics, speech, and signal processing | 2013

Feature combination and stacking of recurrent and non-recurrent neural networks for LVCSR

Christian Plahl; Michael Kozielski; Ralf Schlüter; Hermann Ney

This paper investigates the combination of different short-term features and the combination of recurrent and non-recurrent neural networks (NNs) on a Spanish speech recognition task. Several methods exist to combine different feature sets, such as concatenation or linear discriminant analysis (LDA). Even though all these techniques achieve reasonable improvements, feature combination by multi-layer perceptrons (MLPs) outperforms all known approaches. We develop the concept of MLP based feature combination further using recurrent neural networks (RNNs). The phoneme posterior estimates derived from an RNN lead to a significant improvement over the result of the MLPs and achieve a 5% relative improvement in word error rate (WER) with far fewer parameters. Moreover, we improve the system performance further by combining an MLP and an RNN in a hierarchical framework, where the MLP benefits from the preprocessing of the RNN. All NNs are trained on phonemes; nevertheless, the same concepts could be applied using context-dependent states. In addition to the improvements in recognition performance w.r.t. WER, NN based feature combination methods reduce both training and testing complexity. Overall, the systems are based on a single set of acoustic models, together with the training of different NNs.
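The NN-based feature combination described above can be sketched as: several short-term feature streams are concatenated frame-by-frame and fed to one network, whose phoneme posterior estimates become the combined feature. Stream names, dimensions, and class count below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

# Two hypothetical short-term feature streams for the same 500 frames,
# e.g. 13-dim MFCC and 13-dim PLP.
mfcc = rng.normal(size=(500, 13))
plp = rng.normal(size=(500, 13))

# Baseline combination is plain concatenation (or LDA on it).
# NN-based combination instead feeds the concatenated streams through
# an MLP (or RNN) and uses its posterior estimates as the feature.
x = np.hstack([mfcc, plp])                       # (500, 26)

w1 = rng.normal(size=(26, 80)) * 0.1             # hidden layer
w2 = rng.normal(size=(80, 45)) * 0.1             # 45 phoneme classes

h = np.tanh(x @ w1)
logits = h @ w2
post = np.exp(logits - logits.max(axis=1, keepdims=True))
post /= post.sum(axis=1, keepdims=True)          # combined feature stream
```

In the hierarchical variant of the paper, an RNN would first process the streams and the MLP would take the RNN's output as an additional input; swapping the combining network is what moves between the MLP and RNN results.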


IEEE Transactions on Audio, Speech, and Language Processing | 2011

Transcribing Mandarin Broadcast Speech Using Multi-Layer Perceptron Acoustic Features

Fabio Valente; Mathew Magimai.-Doss; Christian Plahl; Suman V. Ravuri; Wen Wang

Recently, several multi-layer perceptron (MLP) based front-ends have been developed and used for Mandarin speech recognition, often showing significant complementary properties to conventional spectral features. Although widely used in multiple Mandarin systems, no systematic comparison of the different approaches or of their scalability has been published. The novelty of this correspondence is mainly experimental. All the MLP front-ends recently developed at multiple sites are described and compared in a systematic manner on a 100-hour setup. The study covers the two main directions along which MLP features have evolved: the use of different input representations to the MLP and the use of more complex MLP architectures beyond the three-layer perceptron. The results are analyzed in terms of confusion matrices, and the paper discusses a number of novel findings that the comparison reveals. Furthermore, the two best front-ends used in the GALE 2008 evaluation, referred to as MLP1 and MLP2, are studied in a more complex LVCSR system in order to investigate their scalability in terms of the amount of training data (from 100 hours to 1600 hours) and the parametric system complexity (maximum likelihood versus discriminative training, speaker adaptive training, lattice-level combination). Results on 5 hours of evaluation data from the GALE project reveal that the MLP features consistently produce improvements in the range of 15%-23% relative at the different steps of a multipass system when compared to mel-frequency cepstral coefficient (MFCC) and PLP features, suggesting that the improvements scale with the amount of data and with the complexity of the system. The integration of these features into the GALE 2008 evaluation system provides very competitive performance compared to other Mandarin systems.


ieee automatic speech recognition and understanding workshop | 2009

Generalized likelihood ratio discriminant analysis

Muhammad Ali Tahir; Georg Heigold; Christian Plahl; Ralf Schlüter; Hermann Ney

In the past several decades, classifier-independent front-end feature extraction, where the derivation of acoustic features is only loosely coupled with back-end model training or classification, has been used prominently in various pattern recognition tasks, including automatic speech recognition (ASR). In this paper, we present a novel discriminative feature transformation, named generalized likelihood ratio discriminant analysis (GLRDA), based on the likelihood ratio test (LRT). It seeks a lower-dimensional feature subspace by making the most confusing situation, described by the null hypothesis, as unlikely as possible, without the homoscedastic assumption on class distributions. We also show that classical linear discriminant analysis (LDA) and its well-known extension, heteroscedastic linear discriminant analysis (HLDA), can be regarded as two special cases of the proposed method. The empirical class confusion information can be further incorporated into GLRDA for better recognition performance. Experimental results demonstrate that GLRDA and its variant yield moderate performance improvements over HLDA and LDA on a large vocabulary continuous speech recognition (LVCSR) task.
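Since the abstract positions classical LDA as a special case of GLRDA, a sketch of that base case helps fix ideas: find directions maximizing between-class scatter relative to within-class scatter via the generalized eigenproblem Sb v = lambda Sw v. The data, class count, and dimensions here are synthetic placeholders.

```python
import numpy as np

rng = np.random.default_rng(5)

def lda_projection(x, y, k):
    """Classical LDA (a special case of GLRDA under the homoscedastic
    assumption): solve Sb v = lambda Sw v and keep the k leading
    generalized eigenvectors as the projection."""
    mean = x.mean(axis=0)
    d = x.shape[1]
    sw = np.zeros((d, d))                      # within-class scatter
    sb = np.zeros((d, d))                      # between-class scatter
    for c in np.unique(y):
        xc = x[y == c]
        mc = xc.mean(axis=0)
        sw += (xc - mc).T @ (xc - mc)
        diff = (mc - mean)[:, None]
        sb += len(xc) * diff @ diff.T
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(sw) @ sb)
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs[:, order[:k]].real

# Three synthetic acoustic classes in 10 dimensions, separated means.
x = rng.normal(size=(300, 10)) + np.repeat(np.eye(10)[:3] * 3, 100, axis=0)
y = np.repeat([0, 1, 2], 100)
proj = lda_projection(x, y, 2)                 # 10-dim -> 2-dim features
z = x @ proj
```

GLRDA replaces this homoscedastic criterion with a likelihood-ratio-based one that allows per-class covariances, recovering LDA and HLDA as limiting cases.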


conference of the international speech communication association | 2007

The RWTH 2007 TC-STAR evaluation system for European English and Spanish

Jonas Lööf; Christian Gollan; Stefan Hahn; Georg Heigold; Björn Hoffmeister; Christian Plahl; David Rybach; Ralf Schlüter; Hermann Ney

Collaboration


Dive into Christian Plahl's collaboration.

Top Co-Authors

Hermann Ney

RWTH Aachen University

Fabio Valente

Idiap Research Institute

Stefan Hahn

RWTH Aachen University
