Network


Latest external collaborations at the country level.

Hotspot


Research topics in which Simon Wiesler is active.

Publications


Featured research published by Simon Wiesler.


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2011

The RWTH 2010 Quaero ASR evaluation system for English, French, and German

Martin Sundermeyer; Markus Nussbaum-Thom; Simon Wiesler; Christian Plahl; A. El-Desoky Mousa; Stefan Hahn; David Nolden; Ralf Schlüter; Hermann Ney

Recognizing Broadcast Conversational (BC) speech data is a difficult task that can be regarded as one of the major challenges beyond the recognition of Broadcast News (BN).


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2014

Mean-normalized stochastic gradient for large-scale deep learning

Simon Wiesler; Alexander Richard; Ralf Schlüter; Hermann Ney

Deep neural networks are typically optimized with stochastic gradient descent (SGD). In this work, we propose a novel second-order stochastic optimization algorithm. The algorithm is based on analytic results showing that a non-zero mean of features is harmful to the optimization. We prove convergence of our algorithm in a convex setting. Our experiments show that the proposed algorithm converges faster than SGD. Further, in contrast to earlier work, our algorithm allows for training models with a factorized structure from scratch. We found this structure to be very useful, not only because it accelerates training and decoding, but also because it is a very effective means against overfitting. Combining our proposed optimization algorithm with this model structure, the model size can be reduced by a factor of eight while still improving the recognition error rate. Additional gains are obtained by improving the Newbob learning rate strategy.
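To make the core idea concrete, here is a minimal sketch of a mean-normalized linear layer in Python. It only illustrates the motivation stated in the abstract (keeping the features seen by the weights approximately zero-mean via a running estimate); the momentum constant and the plain SGD step are assumptions for illustration, not the paper's exact algorithm.

```python
import numpy as np

class MeanNormalizedLinear:
    """Linear layer y = W (x - mu) + b, a minimal sketch of the
    mean-normalization idea (details simplified). mu is a running
    estimate of the mean of the layer input, so the effective features
    seen by W are approximately zero-mean."""

    def __init__(self, d_in, d_out, momentum=0.99, lr=0.1):
        rng = np.random.default_rng(0)
        self.W = rng.normal(scale=1.0 / np.sqrt(d_in), size=(d_out, d_in))
        self.b = np.zeros(d_out)
        self.mu = np.zeros(d_in)          # running mean of layer inputs
        self.momentum, self.lr = momentum, lr

    def forward(self, x):
        # Update the running mean on each minibatch (x: [batch, d_in]).
        self.mu = self.momentum * self.mu + (1 - self.momentum) * x.mean(axis=0)
        self.x_centered = x - self.mu     # cached for the update step
        return self.x_centered @ self.W.T + self.b

    def sgd_step(self, grad_out):
        # Plain SGD on the reparameterized weights; grad_out: [batch, d_out].
        grad_W = grad_out.T @ self.x_centered / len(grad_out)
        grad_b = grad_out.mean(axis=0)
        self.W -= self.lr * grad_W
        self.b -= self.lr * grad_b
```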


IEEE Signal Processing Magazine | 2012

Discriminative Training for Automatic Speech Recognition: Modeling, Criteria, Optimization, Implementation, and Performance

Georg Heigold; Hermann Ney; R. Schlüter; Simon Wiesler

Discriminative training techniques have been shown to consistently outperform the maximum likelihood (ML) paradigm for acoustic model training in automatic speech recognition (ASR). Consequently, today's discriminative training methods are fundamental components of state-of-the-art systems and a major line of research in speech recognition. This article gives a comprehensive overview of discriminative training methods for acoustic model training in the context of ASR. It covers all related aspects of discriminative training for speech recognition, i.e., specific training criteria and their relations, statistical modeling, different parameter optimization approaches, efficient implementation of discriminative training, and a performance overview.
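For reference, the maximum mutual information (MMI) criterion, one of the standard discriminative criteria such an overview covers, can be written as follows; the notation, including the language model scale kappa, is chosen here for illustration:

```latex
F_{\mathrm{MMI}}(\theta)
  = \sum_{r=1}^{R} \log
    \frac{p_\theta(X_r \mid W_r)\, p(W_r)^{\kappa}}
         {\sum_{W} p_\theta(X_r \mid W)\, p(W)^{\kappa}}
```

Here $(X_r, W_r)$ are the training utterances with their reference transcriptions, and the denominator sums over competing word sequences $W$.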


IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) | 2009

Investigations on features for log-linear acoustic models in continuous speech recognition

Simon Wiesler; Markus Nussbaum-Thom; Georg Heigold; R. Schlüter; Hermann Ney

Hidden Markov Models with Gaussian Mixture Models as emission probabilities (GHMMs) are the underlying structure of all state-of-the-art speech recognition systems. Using Gaussian mixture distributions follows the generative approach, where the class-conditional probability is modeled, although only the posterior probability is needed for classification. Although direct modeling of posterior probabilities with log-linear models has been very successful in related fields such as Natural Language Processing (NLP), it has rarely been used in speech recognition and has not been applied successfully to continuous speech recognition. In this paper, we report competitive results for a speech recognizer with a log-linear acoustic model on the Wall Street Journal corpus, a Large Vocabulary Continuous Speech Recognition (LVCSR) task. We trained this model from scratch, i.e., without relying on an existing GHMM system. The use of data-dependent sparse features for log-linear models has been proposed previously. We compare them with polynomial features and show that the combination of polynomial and data-dependent sparse features leads to better results.
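A minimal sketch in Python of such a log-linear posterior model with polynomial features. The second-order polynomial feature map matches one of the feature types compared in the paper; the concrete function names and the max-subtraction stabilization are illustrative choices:

```python
import numpy as np

def polynomial_features(x):
    """Second-order polynomial features: [1, x, pairwise products].
    One illustrative choice; the paper also studies data-dependent
    sparse features."""
    cross = np.outer(x, x)[np.triu_indices(len(x))]
    return np.concatenate(([1.0], x, cross))

def log_linear_posterior(x, Lambda):
    """Direct model of the class posterior:
    p(c | x) = exp(lambda_c . f(x)) / sum_c' exp(lambda_c' . f(x)).
    Lambda: [num_classes, num_features] parameter matrix."""
    scores = Lambda @ polynomial_features(x)
    scores -= scores.max()                 # numerical stability
    p = np.exp(scores)
    return p / p.sum()
```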


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2014

RASR/NN: The RWTH neural network toolkit for speech recognition

Simon Wiesler; Alexander Richard; Pavel Golik; Ralf Schlüter; Hermann Ney

This paper describes the new release of RASR, the open-source version of the well-proven speech recognition toolkit developed and used at RWTH Aachen University. The focus is on the implementation of the NN module for training neural network acoustic models. We describe the code design, configuration, and features of the NN module. The key feature is high flexibility regarding the network topology, choice of activation functions, training criteria, and optimization algorithm, as well as built-in support for efficient GPU computing. The run-time performance and recognition accuracy are evaluated exemplarily with a deep neural network as the acoustic model in a hybrid NN/HMM system. The results show that RASR achieves state-of-the-art performance on a real-world large vocabulary task, while offering a complete pipeline for building and applying large-scale speech recognition systems.
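As background on the hybrid NN/HMM setup used in the evaluation, here is a generic sketch of converting network posteriors into HMM emission scores. This is the standard hybrid recipe in general form, not RASR code or configuration:

```python
import numpy as np

def scaled_log_likelihoods(log_posteriors, log_state_priors):
    """Hybrid NN/HMM emission scores: the network's state posteriors
    p(s | x) are converted to scaled likelihoods p(x | s) proportional
    to p(s | x) / p(s), which the HMM decoder then consumes.
    log_posteriors: [frames, states], log_state_priors: [states]."""
    return log_posteriors - log_state_priors
```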


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2015

Sequence-discriminative training of recurrent neural networks

Paul Voigtlaender; Patrick Doetsch; Simon Wiesler; Ralf Schlüter; Hermann Ney

We investigate sequence-discriminative training of long short-term memory recurrent neural networks using the maximum mutual information criterion. We show that although recurrent neural networks already make use of the whole observation sequence and are able to incorporate more contextual information than feed-forward networks, their performance can be improved with sequence-discriminative training. Experiments are performed on two publicly available handwriting recognition tasks containing English and French handwriting. On the English corpus, we obtain a relative improvement in WER of over 11% with maximum mutual information (MMI) training compared to cross-entropy training. On the French corpus, we observe that it is necessary to interpolate the MMI objective function with cross-entropy.
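The interpolation mentioned for the French corpus can be written, in one common form, as a weighted combination of the sequence-level MMI objective and the frame-wise cross-entropy (CE) objective; the interpolation weight lambda is a tuning constant assumed here for illustration:

```latex
F(\theta) = F_{\mathrm{MMI}}(\theta) + \lambda\, F_{\mathrm{CE}}(\theta)
```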


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2010

Discriminative HMMs, log-linear models, and CRFs: What is the difference?

Georg Heigold; Simon Wiesler; Markus Nussbaum-Thom; Patrick Lehnen; R. Schlüter; Hermann Ney

Recently, many papers have studied discriminative acoustic modeling techniques such as conditional random fields and discriminative training of conventional Gaussian HMMs. This paper gives an overview of this recent work and progress. We strictly distinguish between the type of acoustic model on the one hand and the training criterion on the other. We address two issues in more detail: the relation between conventional Gaussian HMMs and conditional random fields, and the advantages of formulating the training criterion as a convex optimization problem. Experimental results for various speech tasks are presented to carefully evaluate the different concepts and approaches, including both digit string and large vocabulary continuous speech recognition tasks.
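The relation between Gaussian HMMs and conditional random fields rests on the observation that a Gaussian log-density is quadratic in the observation, so the state posterior of a Gaussian model can always be rewritten in log-linear (CRF-style) form. Sketching this in the notation assumed above: with the feature map $f(x) = (1,\, x,\, \mathrm{vec}(x x^{\top}))$ there exist parameters $\lambda_s$ such that

```latex
p(s \mid x)
  = \frac{p(x \mid s)\, p(s)}{\sum_{s'} p(x \mid s')\, p(s')}
  = \frac{\exp\!\big(\lambda_s^{\top} f(x)\big)}
         {\sum_{s'} \exp\!\big(\lambda_{s'}^{\top} f(x)\big)}
```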


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2015

Investigations on sequence training of neural networks

Simon Wiesler; Pavel Golik; Ralf Schlüter; Hermann Ney

In this paper, we present an investigation of sequence-discriminative training of deep neural networks for automatic speech recognition. We evaluate different sequence-discriminative training criteria (MMI and MPE) and optimization algorithms (including SGD and Rprop) using the RASR toolkit. Further, we compare training the whole network with training only the output layer. Technical details necessary for robust training are studied, since there is no consensus yet on the ultimate training recipe. The investigation extends our previous work on training linear bottleneck networks from scratch, showing the consistently positive effect of sequence training.
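For readers unfamiliar with Rprop, a minimal sketch of the classic sign-based update in Python. The constants and the variant shown (suppressing the update after a sign flip) are common defaults assumed for illustration, not the paper's exact recipe:

```python
import numpy as np

def rprop_step(theta, grad, prev_grad, step,
               eta_plus=1.2, eta_minus=0.5,
               step_min=1e-6, step_max=1.0):
    """One Rprop update with per-parameter step sizes; only the sign
    of the gradient is used, not its magnitude. All arrays share
    theta's shape."""
    sign_change = grad * prev_grad
    # Grow steps where the gradient sign persists, shrink where it flips.
    step = np.where(sign_change > 0, np.minimum(step * eta_plus, step_max), step)
    step = np.where(sign_change < 0, np.maximum(step * eta_minus, step_min), step)
    # After a sign flip, skip the update for this parameter (Rprop- variant).
    effective_grad = np.where(sign_change < 0, 0.0, grad)
    theta = theta - np.sign(effective_grad) * step
    return theta, effective_grad, step
```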


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2011

Feature selection for log-linear acoustic models

Simon Wiesler; Alexander Richard; Yotaro Kubo; Ralf Schlüter; Hermann Ney

Log-linear acoustic models have been shown to be competitive with Gaussian mixture models in speech recognition. Their high training time can be reduced by feature selection. We compare a simple univariate feature selection algorithm with ReliefF, an efficient multivariate algorithm. An alternative to feature selection is ℓ1-regularized training, which leads to sparse models. We observe that this gives no speedup when sparse features are used, hence feature selection methods are preferable. For dense features, ℓ1-regularization can reduce training and recognition time. We generalize the well-known Rprop algorithm for the optimization of ℓ1-regularized functions. Experiments on the Wall Street Journal corpus show that a large number of sparse features can be discarded without loss of performance. Strong regularization leads to slight performance degradation but can be useful on large tasks where training the full model is not tractable.
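A minimal sketch of univariate feature selection in Python. The per-feature score used here (a between-class versus within-class variance ratio) is an assumption for illustration; the point is that each feature is scored in isolation, in contrast to the multivariate ReliefF algorithm:

```python
import numpy as np

def univariate_select(X, y, num_keep):
    """Rank features independently and keep the top scorers.
    X: [samples, features], y: integer class labels."""
    classes = np.unique(y)
    grand_mean = X.mean(axis=0)
    # Between-class and within-class variation, computed per feature.
    between = sum(np.sum(y == c) * (X[y == c].mean(axis=0) - grand_mean) ** 2
                  for c in classes)
    within = sum(((X[y == c] - X[y == c].mean(axis=0)) ** 2).sum(axis=0)
                 for c in classes)
    scores = between / (within + 1e-12)
    return np.argsort(scores)[::-1][:num_keep]   # indices of kept features
```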


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2011

Subspace pursuit method for kernel-log-linear models

Yotaro Kubo; Simon Wiesler; Ralf Schlüter; Hermann Ney; Shinji Watanabe; Atsushi Nakamura; Tetsunori Kobayashi

This paper presents a novel method for reducing the dimensionality of kernel spaces. Recently, to maintain the convexity of training, log-linear models without mixtures have been used as emission probability density functions in hidden Markov models for automatic speech recognition. In that framework, nonlinearly transformed high-dimensional features are used to achieve a nonlinear classification of the original observation vectors without using mixtures. In this paper, with the goal of using high-dimensional features in kernel spaces, the cutting-plane subspace pursuit method proposed for support vector machines is generalized and applied to log-linear models. The experimental results show that the proposed method achieves an efficient approximation of the feature space using a limited number of basis vectors.
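A Nystroem-style sketch in Python of the general idea: approximating a kernel feature space with a small set of basis vectors. The actual cutting-plane subspace pursuit selects the basis differently, so this is a stand-in for the concept, not the paper's method:

```python
import numpy as np

def nystrom_features(X, basis, kernel):
    """Map inputs into a low-dimensional space spanned by a small set
    of basis vectors, so that inner products of the resulting features
    approximate the full kernel.
    X: [n, d], basis: [m, d], kernel: function of two sample matrices."""
    K_mm = kernel(basis, basis)                   # [m, m]
    K_nm = kernel(X, basis)                       # [n, m]
    # Whitening with the (pseudo-)inverse square root of K_mm.
    U, s, _ = np.linalg.svd(K_mm)
    whiten = U / np.sqrt(np.maximum(s, 1e-12))
    return K_nm @ whiten                          # [n, m]

def rbf(A, B, gamma=0.5):
    """Gaussian (RBF) kernel between two sample matrices."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)
```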

Collaboration


Dive into Simon Wiesler's collaborations.

Top Co-Authors

Hermann Ney (RWTH Aachen University)

R. Schlüter (RWTH Aachen University)

Stefan Hahn (RWTH Aachen University)

Atsushi Nakamura (Nippon Telegraph and Telephone)