

Publication


Featured research published by Björn Hoffmeister.


International Conference on Acoustics, Speech, and Signal Processing | 2007

Cross-Site and Intra-Site ASR System Combination: Comparisons on Lattice and 1-Best Methods

Björn Hoffmeister; Dustin Hillard; Stefan Hahn; Ralf Schlüter; Mari Ostendorf; Hermann Ney

We evaluate system combination techniques for automatic speech recognition using systems from multiple sites that participated in the TC-STAR 2006 evaluation. Both lattice and 1-best combination techniques are tested for cross-site and intra-site tasks. For pairwise combinations, the lattice-based approaches can outperform 1-best ROVER with confidence scores, but 1-best ROVER results are equal (or even better) when combining three or four systems.
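As a rough illustration of the 1-best ROVER step discussed above: once the hypotheses have been aligned into a word transition network (the alignment itself is omitted here), combination reduces to a confidence-weighted vote per slot. The function name and the `@` null-word marker below are illustrative, not from the paper:

```python
from collections import Counter

def rover_vote(aligned_hyps, confidences=None):
    """Pick, per aligned position, the word with the highest summed score.

    aligned_hyps: equal-length word lists (one per system), already
    aligned; '@' marks a null (insertion/deletion) slot.
    confidences: optional per-system weights; defaults to equal votes.
    """
    if confidences is None:
        confidences = [1.0] * len(aligned_hyps)
    result = []
    for position in zip(*aligned_hyps):
        scores = Counter()
        for word, conf in zip(position, confidences):
            scores[word] += conf
        best = scores.most_common(1)[0][0]
        if best != "@":  # drop slots where the null word wins
            result.append(best)
    return result
```

With equal weights this is majority voting; passing per-system confidence scores reproduces the weighted variant the abstract compares against.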


IEEE Automatic Speech Recognition and Understanding Workshop | 2007

Development of the 2007 RWTH Mandarin LVCSR system

Björn Hoffmeister; Christian Plahl; Peter Fritz; Georg Heigold; Jonas Lööf; Ralf Schlüter; Hermann Ney

This paper describes the development of the RWTH Mandarin LVCSR system. Different acoustic front-ends together with multiple system cross-adaptation are used in a two-stage decoding framework. We describe the system in detail and present systematic recognition results. In particular, we compare a variety of approaches for cross-adapting to multiple systems. During development we conducted a comparative study of different methods for integrating tone and phoneme posterior features. Furthermore, we apply lattice-based consensus decoding and system combination methods, comparing the effect of minimizing character instead of word errors. The final system obtains a character error rate of 17.7% on the GALE 2006 evaluation data.


IEEE Transactions on Audio, Speech, and Language Processing | 2012

WFST Enabled Solutions to ASR Problems: Beyond HMM Decoding

Björn Hoffmeister; Georg Heigold; David Rybach; Ralf Schlüter; Hermann Ney

During the last decade, weighted finite-state transducers (WFSTs) have become popular in speech recognition. While their main field of application remains hidden Markov model (HMM) decoding, the WFST framework is now also seen as a brick in solutions to many other central problems in automatic speech recognition (ASR). These solutions are less known, and this work aims at giving an overview of the applications of WFSTs in large-vocabulary continuous speech recognition (LVCSR) besides HMM decoding: discriminative acoustic model training, Bayes risk decoding, and system combination. The application of the WFST framework has a big practical impact: we show how the framework helps to structure problems, to develop generic solutions, and to delegate complex computations to WFST toolkits. In this paper, we review the literature, discuss existing approaches, and provide new insights into WFST enabled solutions. We also present a novel, purely WFST-based algorithm for computing the exact Bayes risk hypothesis from a lattice with the Levenshtein distance as loss function. We present the problems and their solutions in a unified framework and discuss the advantages and limits of using WFSTs. We do not provide new experimental results, but refer to the existing literature. Our work helps to identify where and how the transducer framework can contribute to a compact and generic solution to LVCSR problems.
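The exact lattice-based Bayes risk computation in the paper relies on WFST operations; as a non-authoritative sketch of only the underlying decision rule, the following applies minimum-expected-Levenshtein-distance decoding to an n-best list with posteriors (an n-best simplification, not the paper's exact lattice algorithm; all names are hypothetical):

```python
def bayes_risk_decode(hyps):
    """Return the hypothesis minimizing expected Levenshtein distance
    under normalized posteriors.

    hyps: list of (word_list, posterior) pairs.
    """
    def lev(a, b):
        # standard dynamic-programming edit distance over words
        prev = list(range(len(b) + 1))
        for i, wa in enumerate(a, 1):
            cur = [i]
            for j, wb in enumerate(b, 1):
                cur.append(min(prev[j] + 1, cur[-1] + 1,
                               prev[j - 1] + (wa != wb)))
            prev = cur
        return prev[-1]

    z = sum(p for _, p in hyps)
    return min(hyps,
               key=lambda hp: sum(p / z * lev(hp[0], h)
                                  for h, p in hyps))[0]
```

Replacing the n-best sum with a summation over lattice paths, carried out with transducer operations, is what the paper's exact algorithm contributes.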


Conference of the International Speech Communication Association | 2016

Multi-Task Learning and Weighted Cross-Entropy for DNN-Based Keyword Spotting.

Sankaran Panchapagesan; Ming Sun; Aparna Khare; Spyros Matsoukas; Arindam Mandal; Björn Hoffmeister; Shiv Vitaladevuni

We propose improved Deep Neural Network (DNN) training loss functions for more accurate single keyword spotting on resource-constrained embedded devices. The loss function modifications consist of a combination of multi-task training and weighted cross-entropy. In the multi-task architecture, the keyword DNN acoustic model is trained with two tasks in parallel: the main task of predicting the keyword-specific phone states, and an auxiliary task of predicting LVCSR senones. We show that multi-task learning leads to comparable accuracy over a previously proposed transfer learning approach where the keyword DNN training is initialized by an LVCSR DNN of the same input and hidden layer sizes. The combination of LVCSR-initialization and multi-task training gives improved keyword detection accuracy compared to either technique alone. We also propose modifying the loss function to give higher weight to input frames corresponding to keyword phone targets, with the motivation of balancing the keyword and background training data. We show that weighted cross-entropy results in additional accuracy improvements. Finally, we show that the combination of three techniques (LVCSR-initialization, multi-task training, and weighted cross-entropy) gives the best results, with a significantly lower False Alarm Rate than the LVCSR-initialization technique alone, across a wide range of Miss Rates.
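The weighted cross-entropy modification can be summarized, under assumptions, as scaling each frame's loss term by a larger constant when its target is a keyword phone state. The sketch below is a plain-Python illustration; the function name, the weight value, and the state sets are hypothetical, and the multi-task branch is omitted:

```python
import math

def weighted_cross_entropy(posteriors, targets, keyword_weight=2.0,
                           keyword_states=frozenset()):
    """Weight-averaged cross-entropy where frames whose target is a
    keyword phone state receive a larger weight.

    posteriors: per-frame probability distributions over states.
    targets: target state index per frame.
    """
    total, weight_sum = 0.0, 0.0
    for dist, t in zip(posteriors, targets):
        w = keyword_weight if t in keyword_states else 1.0
        total += -w * math.log(dist[t])
        weight_sum += w
    return total / weight_sum
```

With an empty `keyword_states` set this reduces to the ordinary frame-averaged cross-entropy, which makes the effect of the weighting easy to isolate in experiments.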


Conference of the International Speech Communication Association | 2016

LatticeRnn: Recurrent Neural Networks Over Lattices.

Faisal Ladhak; Ankur Gandhe; Markus Dreyer; Lambert Mathias; Ariya Rastrow; Björn Hoffmeister

We present a new model called LATTICERNN, which generalizes recurrent neural networks (RNNs) to process weighted lattices as input, instead of sequences. A LATTICERNN can encode the complete structure of a lattice into a dense representation, which makes it suitable for a variety of problems, including rescoring, classifying, parsing, or translating lattices using deep neural networks (DNNs). In this paper, we use LATTICERNNs for a classification task: each lattice represents the output from an automatic speech recognition (ASR) component of a spoken language understanding (SLU) system, and we classify the intent of the spoken utterance based on the lattice embedding computed by a LATTICERNN. We show that making decisions based on the full ASR output lattice, as opposed to 1-best or n-best hypotheses, makes SLU systems more robust to ASR errors. Our experiments yield improvements of 13% over a baseline RNN system trained on transcriptions and 10% over an n-best list rescoring system for intent classification.
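A minimal sketch of the idea, assuming a topologically sorted lattice and a fixed (untrained) recurrence: each node pools transformed states over its incoming arcs, so the final node's state is a dense embedding of the whole lattice. The embedding function and pooling rule here are hypothetical stand-ins for the paper's learned components:

```python
import math

def lattice_encode(num_nodes, arcs, embed, hidden_dim=4):
    """Encode a DAG lattice (node 0 = start, node num_nodes-1 = end).

    arcs: (src, dst, word, weight) tuples with src < dst (topological).
    embed: word -> list of hidden_dim floats (hypothetical embedding).
    Node state = arc-weight-averaged tanh(source state + embedding).
    """
    states = [[0.0] * hidden_dim for _ in range(num_nodes)]
    incoming = [[] for _ in range(num_nodes)]
    for src, dst, word, weight in arcs:
        incoming[dst].append((src, word, weight))
    for node in range(1, num_nodes):
        total = sum(w for _, _, w in incoming[node]) or 1.0
        acc = [0.0] * hidden_dim
        for src, word, w in incoming[node]:
            e = embed(word)
            for i in range(hidden_dim):
                acc[i] += w * math.tanh(states[src][i] + e[i])
        states[node] = [a / total for a in acc]
    return states[-1]
```

Feeding the final-node vector to a classifier then gives an intent decision conditioned on every path in the lattice rather than on a single hypothesis.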


Conference of the International Speech Communication Association | 2016

Anchored Speech Detection.

Roland Maas; Sree Hari Krishnan Parthasarathi; Brian King; Ruitong Huang; Björn Hoffmeister

We propose two new methods of speech detection in the context of voice-controlled far-field appliances. While conventional detection methods are designed to differentiate between speech and nonspeech, we aim at distinguishing desired speech, which we define as speech originating from the person interacting with the device, from background noise and interfering talkers. Our two proposed methods use the first word spoken by the desired talker, the “anchor” word, as a reference to learn characteristics about that speaker. In the first method, we estimate the mean of the anchor word segment and subtract it from the subsequent feature vectors. In the second, we use an encoder-decoder network with features that are normalized by applying conventional log amplitude causal mean subtraction. The experimental results reveal that both techniques achieve around 10% relative reduction in frame classification error rate over a baseline feedforward network with conventionally normalized features.
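The first method reduces, in essence, to mean normalization anchored on the wake-word segment; below is a minimal sketch under that reading (feature extraction and the downstream classifier are out of scope, and all names are illustrative):

```python
def anchor_mean_subtract(anchor_frames, frames):
    """Subtract the mean feature vector of the anchor-word segment
    from every subsequent frame.

    anchor_frames, frames: lists of equal-length feature vectors.
    """
    dim = len(anchor_frames[0])
    mean = [sum(f[i] for f in anchor_frames) / len(anchor_frames)
            for i in range(dim)]
    return [[x - m for x, m in zip(f, mean)] for f in frames]
```

Because the mean is estimated only from the anchor word, the normalization captures the desired talker's characteristics rather than global channel statistics.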


Conference of the International Speech Communication Association | 2009

The RWTH Aachen University open source speech recognition system.

David Rybach; Christian Gollan; Georg Heigold; Björn Hoffmeister; Jonas Lööf; Ralf Schlüter; Hermann Ney


Conference of the International Speech Communication Association | 2007

The RWTH 2007 TC-STAR evaluation system for European English and Spanish.

Jonas Lööf; Christian Gollan; Stefan Hahn; Georg Heigold; Björn Hoffmeister; Christian Plahl; David Rybach; Ralf Schlüter; Hermann Ney


Conference of the International Speech Communication Association | 2006

Frame Based System Combination and a Comparison with Weighted ROVER and CNC

Björn Hoffmeister; Tobias Klein; Ralf Schlüter; Hermann Ney


Conference of the International Speech Communication Association | 2006

The 2006 RWTH Parliamentary Speeches Transcription System

Jonas Lööf; Maximilian Bisani; Christian Gollan; Georg Heigold; Björn Hoffmeister; Christian Plahl; Ralf Schlüter; Hermann Ney

Collaboration


Dive into Björn Hoffmeister's collaborations.

Top Co-Authors

- Hermann Ney (RWTH Aachen University)
- Roland Maas (University of Erlangen-Nuremberg)
- Ariya Rastrow (Johns Hopkins University)
- Stefan Hahn (RWTH Aachen University)