Kazuki Irie
RWTH Aachen University
Publications
Featured research published by Kazuki Irie.
Conference of the International Speech Communication Association | 2016
Kazuki Irie; Zoltán Tüske; Tamer Alkhouli; Ralf Schlüter; Hermann Ney
Popularized by the long short-term memory (LSTM), multiplicative gates have become a standard means to design artificial neural networks with intentionally organized information flow. Notable examples of such architectures include gated recurrent units (GRU) and highway networks. In this work, we first focus on the evaluation of each of the classical gated architectures for language modeling for large vocabulary speech recognition. Namely, we evaluate the highway network, lateral network, LSTM and GRU. Furthermore, the motivation underlying the highway network also applies to LSTM and GRU. An extension specific to the LSTM has been recently proposed with an additional highway connection between the memory cells of adjacent LSTM layers. In contrast, we investigate an approach which can be used with both LSTM and GRU: a highway network in which the LSTM or GRU is used as the transformation function. We found that the highway connections enable both standalone feedforward and recurrent neural language models to benefit more from the deep structure and provide a slight improvement of recognition accuracy after interpolation with count models. To complete the overview, we include our initial investigations on the use of the attention mechanism for learning word triggers.
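The highway combination the abstract refers to computes y = T(x) ⊙ H(x) + (1 − T(x)) ⊙ x, where H is the transformation (here an LSTM or GRU layer) and T is a sigmoid transform gate. The following is a minimal PyTorch sketch of that idea, not the authors' implementation; the class name and layer sizes are illustrative assumptions.

```python
# Minimal sketch of a highway layer whose transformation function H(x) is an
# LSTM, as described in the abstract above. Not the authors' code; sizes and
# names are illustrative.
import torch
import torch.nn as nn

class HighwayLSTMLayer(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        # H(x): the transformation, here a single LSTM layer
        self.lstm = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        # T(x): the transform gate, computed from the layer input
        self.gate = nn.Linear(hidden_size, hidden_size)

    def forward(self, x):
        # x: (batch, time, hidden_size)
        h, _ = self.lstm(x)
        t = torch.sigmoid(self.gate(x))
        # Highway combination: gated mix of the transformation and the identity path
        return t * h + (1.0 - t) * x
```

Stacking several such layers between the word embedding and the softmax output would give a deep recurrent language model of the kind compared in the paper; the same wrapper applies with a GRU in place of the LSTM.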
International Conference on Acoustics, Speech, and Signal Processing | 2016
Zoltán Tüske; Kazuki Irie; Ralf Schlüter; Hermann Ney
Inspired by the success of multi-task training in acoustic modeling, this paper investigates a new architecture for a multi-domain neural network based language model (NNLM). The proposed model has several shared hidden layers and domain-specific output layers. As will be shown, the log-linear interpolation of the multi-domain outputs and the optimization of the interpolation weights fit naturally in the framework of the NNLM. The resulting model can be expressed as a single NNLM. As an initial study of such an architecture, this paper focuses on deep feed-forward neural networks (DNNs). We also re-investigate the potential of long contexts of up to 30-grams and depths of up to 5 hidden layers in the DNN-LM. Our final feed-forward multi-domain NNLM is trained on 3.1B running words across 11 domains for an English broadcast news and conversations large vocabulary continuous speech recognition task. After log-linear interpolation and fine-tuning, we measured improvements in terms of perplexity and word error rate over models trained on 50M running words of in-domain news resources. The final multi-domain feed-forward LM outperformed our previous best LSTM-RNN LM trained on the 50M in-domain corpus, even after linear interpolation with large count models.
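A minimal sketch of the multi-domain architecture described above, assuming a feed-forward LM with shared hidden layers, one output layer per domain, and learnable log-linear interpolation weights over the domain outputs; all sizes, names, and the two-layer shared stack are illustrative assumptions, not the published configuration.

```python
# Hedged sketch of a multi-domain feed-forward NNLM: shared hidden layers,
# domain-specific output layers, and log-linear interpolation of the domain
# distributions with trainable weights. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiDomainFFLM(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, context_len, num_domains):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Shared hidden layers over the concatenated context embeddings
        self.shared = nn.Sequential(
            nn.Linear(context_len * embed_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        # One output (softmax) layer per domain
        self.domain_outputs = nn.ModuleList(
            nn.Linear(hidden_dim, vocab_size) for _ in range(num_domains)
        )
        # Interpolation weights, optimized like any other model parameter
        self.interp_logits = nn.Parameter(torch.zeros(num_domains))

    def forward(self, context):
        # context: (batch, context_len) word indices of the n-gram history
        h = self.shared(self.embed(context).flatten(1))
        lam = torch.softmax(self.interp_logits, dim=0)
        # Log-linear interpolation: weighted sum of the domain log-probabilities,
        # renormalized so the result is again a distribution over the vocabulary
        mix = sum(w * F.log_softmax(out(h), dim=-1)
                  for w, out in zip(lam, self.domain_outputs))
        return F.log_softmax(mix, dim=-1)
```

Because the interpolation is just another differentiable layer, the weights can be fine-tuned on in-domain data, which is one way to read the abstract's remark that the resulting model can be expressed as a single NNLM.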
International Conference on Acoustics, Speech, and Signal Processing | 2014
Simon Wiesler; Kazuki Irie; Zoltán Tüske; Ralf Schlüter; Hermann Ney
In this paper, we describe the RWTH speech recognition system for English lectures developed within the transLectures project. A difficulty in the development of an English lecture recognition system is the high proportion of non-native speakers. We address this problem by using very effective deep bottleneck features trained on multilingual data. The acoustic model is trained on large amounts of data from different domains and with different dialects. Large improvements are obtained from unsupervised acoustic adaptation. Another challenge is the frequent use of technical terms and the wide range of topics. In our recognition system, slides, which are attached to most lectures, are used for improving lexical coverage and language model adaptation.
International Conference on Acoustics, Speech, and Signal Processing | 2017
Kazuki Irie; Pavel Golik; Ralf Schlüter; Hermann Ney
In this paper, we present an investigation into the technical details of the byte-level convolutional layer which replaces the conventional linear word projection layer in the neural language model. In particular, we discuss and compare effective filter configurations, pooling types, and the use of bytes instead of characters. We carry out experiments on language packs released by the IARPA Babel project and measure the performance in terms of perplexity and word error rate. Introducing a convolutional layer consistently improves the results on all languages. Also, there is no degradation from using raw bytes instead of proper Unicode characters, even on syllabic alphabets like Amharic. In addition, we report improvements in word error rate from rescoring lattices and evaluate keyword search performance on several languages.
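The word encoder the abstract refers to can be illustrated as follows: each word is represented by its padded byte sequence, 1-D convolutions of several filter widths are applied over the byte embeddings, and max-over-time pooling yields a fixed-size representation that replaces the word projection layer. The sketch below is a hedged PyTorch illustration with assumed filter widths and sizes, not the configuration evaluated in the paper.

```python
# Hedged sketch of a byte-level convolutional word encoder: byte embeddings,
# 1-D convolutions of several widths, max-over-time pooling. Illustrative only.
import torch
import torch.nn as nn

class ByteCNNWordEncoder(nn.Module):
    def __init__(self, byte_embed_dim=16, filter_widths=(2, 3, 4), filters_per_width=64):
        super().__init__()
        self.byte_embed = nn.Embedding(256, byte_embed_dim)   # 256 possible byte values
        self.convs = nn.ModuleList(
            nn.Conv1d(byte_embed_dim, filters_per_width, kernel_size=w)
            for w in filter_widths
        )
        self.output_dim = filters_per_width * len(filter_widths)

    def forward(self, byte_ids):
        # byte_ids: (num_words, max_word_len) padded byte values of each word
        x = self.byte_embed(byte_ids).transpose(1, 2)          # (words, channels, length)
        pooled = [conv(x).relu().max(dim=-1).values for conv in self.convs]
        # The concatenated max-pooled features serve as the word representation
        # fed to the language model in place of a word embedding lookup.
        return torch.cat(pooled, dim=-1)                       # (words, output_dim)
```

One practical consequence, noted in the abstract, is that the encoder's input vocabulary is fixed at 256 symbols regardless of the writing system, which is why raw bytes can stand in for Unicode characters.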
International Conference on Speech and Computer | 2017
Pavel Golik; Zoltán Tüske; Kazuki Irie; Eugen Beck; Ralf Schlüter; Hermann Ney
In this paper we describe the RWTH Aachen keyword search (KWS) system developed in the course of the IARPA Babel program. We focus on acoustic modeling with neural networks and evaluate the full pipeline with respect to KWS performance. At the core of this study lie multilingual bottleneck features extracted from a deep neural network trained on all 28 languages available to the project participants. We show that in a low-resource scenario, the multilingual features are crucial for achieving state-of-the-art performance.
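A minimal sketch, under loose assumptions, of how multilingual bottleneck features of this kind are commonly extracted: a DNN with shared hidden layers ending in a narrow bottleneck is trained against language-specific output layers, and after training the bottleneck activations are used as acoustic features. The sizes, activations, and class name below are illustrative, not the Babel system's configuration.

```python
# Hedged sketch of a multilingual bottleneck feature extractor: shared layers
# with a narrow bottleneck, one classification head per training language.
import torch
import torch.nn as nn

class MultilingualBottleneckDNN(nn.Module):
    def __init__(self, input_dim, hidden_dim, bottleneck_dim, targets_per_language):
        super().__init__()
        # Shared stack ending in a narrow bottleneck layer
        self.shared = nn.Sequential(
            nn.Linear(input_dim, hidden_dim), nn.Sigmoid(),
            nn.Linear(hidden_dim, hidden_dim), nn.Sigmoid(),
            nn.Linear(hidden_dim, bottleneck_dim),
        )
        # One output layer per training language (e.g. tied-state targets)
        self.language_heads = nn.ModuleList(
            nn.Linear(bottleneck_dim, n) for n in targets_per_language
        )

    def forward(self, features, language_id):
        # Training: classify frames of the given language through its own head
        return self.language_heads[language_id](self.shared(features))

    def extract_features(self, features):
        # Recognition time: the bottleneck activations are the feature vector
        return self.shared(features)
```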
International Conference on Speech and Computer | 2016
Ralf Schlüter; Patrick Doetsch; Pavel Golik; Markus Kitza; Tobias Menne; Kazuki Irie; Zoltán Tüske; Albert Zeyer
In automatic speech recognition, as in many areas of machine learning, stochastic modeling relies more and more on neural networks. Both in acoustic and language modeling, neural networks today mark the state of the art for large vocabulary continuous speech recognition, providing huge improvements over former approaches that were solely based on Gaussian mixture hidden Markov models and count-based language models. We give an overview of current activities in neural network based modeling for automatic speech recognition. This includes discussions of network topologies and cell types, training and optimization, choice of input features, adaptation and normalization, multitask training, as well as neural network based language modeling. Despite the clear progress obtained with neural network modeling in speech recognition, much remains to be done to obtain a consistent and self-contained neural network based modeling approach that ties in with the former state of the art. We conclude with a discussion of open problems as well as potential future directions with respect to neural network integration into automatic speech recognition systems.
Conference of the International Speech Communication Association | 2015
Rami Botros; Kazuki Irie; Martin Sundermeyer; Hermann Ney
Conference of the International Speech Communication Association | 2018
Albert Zeyer; Kazuki Irie; Ralf Schlüter; Hermann Ney
International Conference on Acoustics, Speech, and Signal Processing | 2018
Kazuki Irie; Zhihong Lei; Ralf Schlüter; Hermann Ney
International Conference on Acoustics, Speech, and Signal Processing | 2018
Kazuki Irie; Shankar Kumar; Michael Nirschl; Hank Liao