Network


Latest external collaborations at the country level.

Hotspot


Research topics in which David Rybach is active.

Publication


Featured research published by David Rybach.


International Conference on Automatic Face and Gesture Recognition | 2006

Tracking using dynamic programming for appearance-based sign language recognition

Philippe Dreuw; Thomas Deselaers; David Rybach; Daniel Keysers; Hermann Ney

We present a novel tracking algorithm that uses dynamic programming to determine the path of target objects and that is able to track an arbitrary number of different objects. The traceback method used to track the targets avoids taking possibly wrong local decisions and thus reconstructs the best tracking paths using the whole observation sequence. The tracking method can be compared to the nonlinear time alignment in automatic speech recognition (ASR) and can analogously be integrated into a hidden Markov model based recognition process. We show how the method can be applied to the tracking of hands and the face for automatic sign language recognition.
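
The dynamic-programming search sketched in the abstract can be written as a recursion over frames followed by a traceback; the notation below (score Q, local appearance score q, jump penalty T) is an illustrative reconstruction and not taken from the paper.

```latex
% Illustrative DP recursion for one target: x denotes a hypothesized
% object position at frame t, q(t, x) a local appearance score, and
% T(x', x) a penalty for the jump from position x' to x.
Q(t, x) = q(t, x) + \max_{x'} \bigl[ Q(t-1, x') - T(x', x) \bigr]
B(t, x) = \operatorname*{argmax}_{x'} \bigl[ Q(t-1, x') - T(x', x) \bigr]
% After the last frame, the end point is argmax_x Q(T_max, x), and the
% full path is recovered by the traceback x_{t-1} = B(t, x_t), so every
% local decision is deferred until the whole observation sequence is seen.
```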


International Conference on Document Analysis and Recognition | 2009

Writer Adaptive Training and Writing Variant Model Refinement for Offline Arabic Handwriting Recognition

Philippe Dreuw; David Rybach; Christian Gollan; Hermann Ney

We present a writer adaptive training and writer clustering approach for an HMM-based Arabic handwriting recognition system to handle different handwriting styles and their variations. Additionally, a writing variant model refinement for specific writing variants is proposed. Current approaches try to compensate for the impact of different writing styles during preprocessing and normalization steps. Writer adaptive training with a CMLLR-based feature adaptation is used to train writer-dependent models. An unsupervised writer clustering with a Bayesian information criterion based stopping condition for CMLLR-based feature adaptation during a two-pass decoding process is used to cluster the different handwriting styles of unknown test writers. The proposed methods are evaluated on the IFN/ENIT Arabic handwriting database.
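
For readers unfamiliar with the two ingredients named above, the following is a rough sketch in standard textbook form, not the paper's exact formulation: CMLLR estimates one affine feature transform per writer or writer cluster, and a BIC-style score decides when clustering should stop.

```latex
% CMLLR: affine transform of the feature vectors x_t, estimated by
% maximum likelihood for each writer (or writer cluster) r under the
% shared HMM, used both in training and in the second decoding pass:
\hat{x}_t = A_r x_t + b_r
% BIC-style model selection score for a candidate clustering with
% log-likelihood \log L, k free parameters, N observations and tuning
% factor \lambda; clustering stops once the score no longer improves:
\mathrm{BIC} = \log L - \tfrac{\lambda}{2}\, k \log N
```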


International Conference on Acoustics, Speech, and Signal Processing | 2016

Personalized speech recognition on mobile devices

Ian McGraw; Rohit Prabhavalkar; Raziel Alvarez; Montse Gonzalez Arenas; Kanishka Rao; David Rybach; Ouais Alsharif; Hasim Sak; Alexander H. Gruenstein; Francoise Beaufays; Carolina Parada

We describe a large vocabulary speech recognition system that is accurate, has low latency, and yet has a small enough memory and computational footprint to run faster than real-time on a Nexus 5 Android smartphone. We employ a quantized Long Short-Term Memory (LSTM) acoustic model trained with connectionist temporal classification (CTC) to directly predict phoneme targets, and further reduce its memory footprint using an SVD-based compression scheme. Additionally, we minimize our memory footprint by using a single language model for both dictation and voice command domains, constructed using Bayesian interpolation. Finally, in order to properly handle device-specific information, such as proper names and other context-dependent information, we inject vocabulary items into the decoder graph and bias the language model on-the-fly. Our system achieves 13.5% word error rate on an open-ended dictation task, running with a median speed that is seven times faster than real-time.
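
The SVD-based compression mentioned above boils down to replacing a large weight matrix by the product of two low-rank factors. A minimal NumPy sketch follows; the layer dimensions and rank are illustrative assumptions, not the configuration reported in the paper.

```python
import numpy as np

def svd_compress(W, rank):
    """Approximate W (m x n) by A (m x rank) times B (rank x n),
    cutting the parameter count from m*n to rank*(m + n)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # absorb the singular values into A
    B = Vt[:rank, :]
    return A, B

# Hypothetical 640 x 2048 recurrent projection matrix compressed to rank 128.
W = np.random.randn(640, 2048).astype(np.float32)
A, B = svd_compress(W, rank=128)
print(W.size, A.size + B.size)        # parameters before / after
print(np.linalg.norm(W - A @ B))      # low-rank approximation error
```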


International Conference on Acoustics, Speech, and Signal Processing | 2009

Audio segmentation for speech recognition using segment features

David Rybach; Christian Gollan; Ralf Schlüter; Hermann Ney

Audio segmentation is an essential preprocessing step in several audio processing applications, with a significant impact on, e.g., speech recognition performance. We introduce a novel framework which combines the advantages of different well-known segmentation methods. An automatically estimated log-linear segment model is used to determine the segmentation of an audio stream in a holistic way by a maximum a posteriori decoding strategy, instead of classifying change points locally. A comparison to other segmentation techniques in terms of speech recognition performance is presented, showing a promising segmentation quality of our approach.
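
One way to make the "holistic" decoding precise is to view it as a single maximization over whole segmentations instead of a sequence of local change-point decisions. The notation below is a hedged reconstruction from the abstract, not the paper's exact model.

```latex
% Choose segment boundaries t_0 = 0 < t_1 < ... < t_N = T and segment
% classes c_1..c_N (e.g. speech, music, silence) jointly by MAP decoding:
(\hat{N}, \hat{t}_1^N, \hat{c}_1^N) =
  \operatorname*{argmax}_{N,\, t_1^N,\, c_1^N}
  \prod_{n=1}^{N} p\bigl(c_n, t_n \mid t_{n-1}, x_1^T\bigr)
% with each segment posterior modeled log-linearly in segment
% features f_i and trained weights \lambda_i:
p\bigl(c, t \mid t', x_1^T\bigr) \propto
  \exp\Bigl(\textstyle\sum_i \lambda_i\, f_i(c, t', t, x_1^T)\Bigr)
```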


IEEE Automatic Speech Recognition and Understanding Workshop | 2007

Advances in Arabic broadcast news transcription at RWTH

David Rybach; Stefan Hahn; Christian Gollan; Ralf Schlüter; Hermann Ney

This paper describes the RWTH speech recognition system for Arabic. Several design aspects of the system, including cross-adaptation, multiple system design and combination, are analyzed. We summarize the semi-automatic lexicon generation for Arabic using a statistical approach to grapheme-to-phoneme conversion and pronunciation statistics. Furthermore, a novel ASR-based audio segmentation algorithm is presented. Finally, we discuss practical approaches for parallelized acoustic training and memory efficient lattice rescoring. Systematic results are reported on recent GALE evaluation corpora.


International Conference on Acoustics, Speech, and Signal Processing | 2013

Open vocabulary handwriting recognition using combined word-level and character-level language models

Michal Kozielski; David Rybach; Stefan Hahn; Ralf Schlüter; Hermann Ney

In this paper, we present a unified search strategy for open vocabulary handwriting recognition using weighted finite state transducers. In addition to a standard word-level language model, we introduce a separate n-gram character-level language model for out-of-vocabulary word detection and recognition. The probabilities assigned by those two models are combined into one Bayes decision rule. We evaluate the proposed method on the IAM database of English handwriting. An improvement from 22.2% word error rate to 17.3% is achieved compared to the closed-vocabulary scenario and the best published result.
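
The combination of the two language models can be read as one decision rule in which in-vocabulary words are scored by the word-level n-gram and out-of-vocabulary words by the character-level n-gram over their spelling. The formulation below is a hedged sketch of that idea, not the paper's exact equations.

```latex
% Unified Bayes decision rule over word sequences w_1^N, where an OOV
% word is represented by its character sequence c_1^M:
\hat{w}_1^N = \operatorname*{argmax}_{w_1^N}
  \Bigl\{ p\bigl(x_1^T \mid w_1^N\bigr) \prod_{n=1}^{N} q(w_n \mid h_n) \Bigr\},
\qquad
q(w \mid h) =
\begin{cases}
  p_{\mathrm{word}}(w \mid h), & w \in \text{vocabulary},\\
  p_{\mathrm{word}}(\mathrm{unk} \mid h)\; p_{\mathrm{char}}(c_1^M), & \text{otherwise}.
\end{cases}
```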


Archive | 2012

RWTH OCR: A Large Vocabulary Optical Character Recognition System for Arabic Scripts

Philippe Dreuw; David Rybach; Georg Heigold; Hermann Ney

We present a novel large vocabulary OCR system, which implements a confidence- and margin-based discriminative training approach for model adaptation of an HMM-based recognition system to handle multiple fonts, different handwriting styles, and their variations. Most current HMM approaches are HTK-based systems which are maximum likelihood (ML) trained and which try to adapt their models to different writing styles using writer adaptive training, unsupervised clustering, or additional writer-specific data. Here, discriminative training based on the maximum mutual information (MMI) and minimum phone error (MPE) criteria is used instead. For model adaptation during decoding, an unsupervised confidence-based discriminative training within a two-pass decoding process is proposed. Additionally, we use neural network-based features extracted by a hierarchical multi-layer perceptron (MLP) network either in a hybrid MLP/HMM approach or to discriminatively retrain a Gaussian HMM system in a tandem approach. The proposed framework and methods are evaluated for closed-vocabulary isolated handwritten word recognition on the IFN/ENIT Arabic handwriting database, where the word error rate is decreased by more than 50% relative to an ML trained baseline system. Preliminary results for large vocabulary Arabic machine-printed text recognition tasks are presented on a novel publicly available newspaper database.
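
For reference, the MMI criterion mentioned above maximizes the posterior of the correct transcription against all competing hypotheses. The confidence- and margin-based variant used in the system adds further terms that are not reproduced here, so the following is only the standard textbook form.

```latex
% Standard MMI training criterion over R training samples with
% observations X_r, reference transcriptions W_r, model parameters
% \theta and acoustic scaling factor \kappa:
F_{\mathrm{MMI}}(\theta) = \sum_{r=1}^{R} \log
  \frac{p_\theta(X_r \mid W_r)^{\kappa}\, p(W_r)}
       {\sum_{W} p_\theta(X_r \mid W)^{\kappa}\, p(W)}
```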


International Conference on Acoustics, Speech, and Signal Processing | 2014

Context dependent state tying for speech recognition using deep neural network acoustic models

Michiel Bacchiani; David Rybach

This paper proposes an algorithm to design a tied-state inventory for a context dependent, neural network-based acoustic model for speech recognition. Rather than relying on a GMM/HMM system that operates on a different feature space and is of a different model family, the proposed algorithm optimizes state tying on the activation vectors of the neural network directly. Experiments show the viability of the proposed algorithm, reducing the WER from 36.3% for a context independent system to 16.0% for a 15,000 tied-state system.
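
As a purely illustrative sketch of what state tying on activation vectors can look like (a simplified bottom-up clustering, not the procedure used in the paper): collect a mean activation vector per context-dependent state and greedily merge the closest pair until the target inventory size is reached.

```python
import numpy as np

def tie_states(activations, labels, target_size):
    """Greedy bottom-up state tying on network activations.

    activations: (frames, dim) float array of activation vectors
    labels:      (frames,) int array of context-dependent state ids
    Returns a dict mapping each original state id to a tied-state id.
    """
    states = sorted(set(int(s) for s in labels))
    means = {s: activations[labels == s].mean(axis=0) for s in states}
    counts = {s: int((labels == s).sum()) for s in states}
    clusters = {s: [s] for s in states}

    while len(clusters) > target_size:
        # Merge the two clusters whose mean activation vectors are closest.
        ids = list(clusters)
        a, b = min(
            ((p, q) for i, p in enumerate(ids) for q in ids[i + 1:]),
            key=lambda pq: np.linalg.norm(means[pq[0]] - means[pq[1]]),
        )
        total = counts[a] + counts[b]
        means[a] = (counts[a] * means[a] + counts[b] * means[b]) / total
        counts[a] = total
        clusters[a].extend(clusters.pop(b))
        del means[b], counts[b]

    return {s: tied for tied, members in enumerate(clusters.values())
            for s in members}
```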


International Conference on Acoustics, Speech, and Signal Processing | 2011

A comparative analysis of dynamic network decoding

David Rybach; Ralf Schlüter; Hermann Ney

The use of statically compiled search networks for ASR systems using huge vocabularies and complex language models often becomes challenging in terms of memory requirements. Dynamic network decoders introduce additional computations in favor of significantly lower memory consumption. In this paper we investigate the properties of two well-known search strategies for dynamic network decoding, namely history conditioned tree search and WFST-based search using dynamic transducer composition. We analyze the impact of the differences in search graph representation, search space structure, and language model look-ahead techniques. Experiments on an LVCSR task illustrate the influence of the compared properties.


IEEE Transactions on Audio, Speech, and Language Processing | 2012

WFST Enabled Solutions to ASR Problems: Beyond HMM Decoding

Björn Hoffmeister; Georg Heigold; David Rybach; Ralf Schlüter; Hermann Ney

During the last decade, weighted finite-state transducers (WFSTs) have become popular in speech recognition. While their main field of application remains hidden Markov model (HMM) decoding, the WFST framework is now also seen as a building block in solutions to many other central problems in automatic speech recognition (ASR). These solutions are less well known, and this work aims at giving an overview of the applications of WFSTs in large-vocabulary continuous speech recognition (LVCSR) besides HMM decoding: discriminative acoustic model training, Bayes risk decoding, and system combination. The application of the WFST framework has a big practical impact: we show how the framework helps to structure problems, to develop generic solutions, and to delegate complex computations to WFST toolkits. In this paper, we review the literature, discuss existing approaches, and provide new insights into WFST enabled solutions. We also present a novel, purely WFST-based algorithm for computing the exact Bayes risk hypothesis from a lattice with the Levenshtein distance as loss function. We present the problems and their solutions in a unified framework and discuss the advantages and limits of using WFSTs. We do not provide new experimental results, but refer to the existing literature. Our work helps to identify where and how the transducer framework can contribute to a compact and generic solution to LVCSR problems.
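
The "exact Bayes risk hypothesis" referred to above is the hypothesis that minimizes the expected Levenshtein distance under the lattice posterior. The decision rule, stated here without the WFST construction that the paper is actually about, is:

```latex
% Minimum Bayes risk decoding with the Levenshtein (edit distance)
% loss; the sum runs over all word sequences W' in the lattice with
% posterior probability p(W' | X) given the acoustic observation X:
\hat{W} = \operatorname*{argmin}_{W} \sum_{W'} p(W' \mid X)\, \mathrm{Lev}(W, W')
```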

Collaboration


An overview of David Rybach's collaborations.

Top Co-Authors

Hermann Ney (RWTH Aachen University)

Stefan Hahn (RWTH Aachen University)