Publication


Featured research published by Yuuki Tachioka.


IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) | 2015

Discriminative method for recurrent neural network language models

Yuuki Tachioka; Shinji Watanabe

A recurrent neural network language model (RNN-LM) can use a longer word context than an n-gram language model, and its effectiveness has recently been shown on automatic speech recognition (ASR) tasks. However, the training criterion of an RNN-LM is the cross entropy (CE) between predicted and reference words. Unlike the discriminative training of acoustic models and discriminative language models (DLM), this criterion does not explicitly consider discriminative measures calculated from ASR hypotheses and references. This paper proposes a discriminative training method for RNN-LM that adds a discriminative criterion to CE. We use the log-likelihood ratio of the ASR hypotheses and references as the discriminative criterion. The proposed training criterion emphasizes the effect of misrecognized words relative to that of correctly recognized words, which are discounted in training. Experiments on a large-vocabulary continuous speech recognition task show that our proposed method improves on the RNN-LM baseline. In addition, combining the proposed discriminative RNN-LM with a DLM further demonstrates its effectiveness.
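The combined criterion described above could look like the following minimal sketch; the function name, the `alpha` interpolation weight, and the per-word averaging are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def discriminative_lm_loss(log_p_ref, log_p_hyp, alpha=0.5):
    """Cross entropy on reference words plus a log-likelihood-ratio term
    that penalizes probability mass on competing ASR hypothesis words.
    `alpha` is a hypothetical interpolation weight."""
    ce = -np.mean(log_p_ref)                        # standard CE criterion
    llr = np.mean(log_p_hyp) - np.mean(log_p_ref)   # hypothesis-vs-reference ratio
    return ce + alpha * llr
```

Minimizing this pushes the LM to assign lower probability to hypothesis words that differ from the reference while still fitting the reference text.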


Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) | 2015

Feature-space structural MAPLR with regression tree-based multiple transformation matrices for DNN

Hiroki Kanagawa; Yuuki Tachioka; Shinji Watanabe; Jun Ishii

Feature-space maximum-likelihood linear regression (fMLLR) transforms acoustic features into adapted ones by multiplication with a single transformation matrix. This property enables efficient adaptation performed as a pre-processing step that is independent of the decoding process, so this type of adaptation can be applied to deep neural networks (DNNs). On the other hand, constrained MLLR (CMLLR) uses multiple transformation matrices based on a regression tree, which provides further improvement over fMLLR. However, there are two problems with such model-space adaptations: first, they cannot be applied to DNNs because adaptation and decoding must share the same generative model, i.e., a Gaussian mixture model (GMM). Second, the transformation matrices tend to overfit when the amount of adaptation data is small. This paper proposes using multiple transformation matrices within a feature-space adaptation framework. The proposed method first estimates multiple transformation matrices in the GMM framework from the first-pass decoding results and alignments, and then takes a weighted sum of these matrices to obtain a single feature transformation matrix frame by frame. In addition, to address the second problem, we propose feature-space structural maximum a posteriori linear regression (fSMAPLR), which introduces hierarchical prior distributions to regularize the MAP estimation. Experimental results show that the proposed fSMAPLR outperforms fMLLR.
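The frame-wise combination step can be sketched as follows; the array shapes and the use of generic per-frame weights are assumptions for illustration, not the paper's exact estimation procedure:

```python
import numpy as np

def combine_transforms(transforms, weights):
    # transforms: K affine fMLLR matrices of shape (d, d+1), one per tree leaf
    # weights: (T, K) per-frame weights (e.g. GMM posteriors), rows summing to 1
    W = np.stack(transforms)                         # (K, d, d+1)
    return np.einsum('tk,kij->tij', weights, W)      # one matrix per frame

def apply_transforms(features, frame_transforms):
    # features: (T, d); append a bias 1 and apply each frame's own matrix
    T = features.shape[0]
    ext = np.hstack([features, np.ones((T, 1))])     # (T, d+1)
    return np.einsum('tij,tj->ti', frame_transforms, ext)
```

Because the result is still a single affine transform per frame, the adapted features can be fed to a DNN exactly as in standard fMLLR.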


New Era for Robust Speech Recognition: Exploiting Deep Learning | 2017

Advanced ASR Technologies for Mitsubishi Electric Speech Applications

Yuuki Tachioka; Toshiyuki Hanazawa; Tomohiro Narita; Jun Ishii

Mitsubishi Electric Corporation has been developing speech applications for 20 years. Our main targets are car navigation systems, elevator-control systems, and other industrial devices. This chapter deals with the automatic speech recognition technologies developed for these applications. To realize real-time processing with small resources, a syllable N-gram-based text search is proposed. To deal with reverberant environments in elevators, spectral-subtraction-based dereverberation techniques with reverberation time estimation are used. In addition, discriminative methods for acoustic and language models are developed.
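A minimal sketch of spectral-subtraction-based late-reverberation suppression, assuming an exponential decay model governed by an estimated reverberation time; the function name, the fixed frame delay, and the flooring constant are illustrative assumptions, not the chapter's exact method:

```python
import numpy as np

def dereverb_spectral_subtraction(power, rt60, frame_rate, delay=4, floor=0.05):
    # power: (frames, bins) STFT power spectrogram
    # Model late reverberation as an attenuated copy of the spectrum `delay`
    # frames earlier; attenuation follows a -60 dB per rt60 seconds decay.
    t = delay / frame_rate                          # delay in seconds
    decay = 10.0 ** (-6.0 * t / rt60)               # power attenuation over t
    late = np.zeros_like(power)
    late[delay:] = decay * power[:-delay]           # estimated late reverb
    return np.maximum(power - late, floor * power)  # floor to avoid negatives
```

The flooring step is the usual spectral-subtraction safeguard against negative power estimates.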


Journal of Information Processing | 2017

Prior-based Binary Masking and Discriminative Methods for Reverberant and Noisy Speech Recognition Using Distant Stereo Microphones

Yuuki Tachioka; Shinji Watanabe; Jonathan Le Roux; John R. Hershey

Reverberant and noisy automatic speech recognition (ASR) using distant stereo microphones is a very challenging but desirable scenario for home-environment speech applications. This scenario can often provide prior knowledge in advance, such as physical information about the sound sources and the environment, which may then be used to reduce the influence of interference. We propose a method to enhance the binary masking algorithm by using prior distributions of the time difference of arrival. This paper also validates state-of-the-art ASR techniques, including various discriminative training and feature transformation methods. Furthermore, we develop an efficient method to combine discriminative language modeling and minimum Bayes risk decoding in the ASR post-processing stage. We also investigate the effectiveness of this method for reverberant and noisy ASR with deep neural networks (DNNs), as well as in systems that combine multiple DNNs using different features. Experiments on the medium-vocabulary sub-task of the second CHiME challenge show that the system submitted to the challenge achieved a 26.86% word error rate (WER); moreover, the DNN system with discriminative training, speaker adaptation, and system combination achieves a 20.40% WER.
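The core masking idea can be sketched as follows, with a hard tolerance threshold standing in for the paper's prior distributions; the function name, the `tol` parameter, and the hard thresholding are illustrative assumptions:

```python
import numpy as np

def tdoa_prior_mask(X_l, X_r, tdoa, fs, n_fft, tol=0.5):
    # X_l, X_r: complex STFTs (bins, frames) from the left/right microphones.
    # Keep time-frequency cells whose inter-channel phase difference is close
    # to the phase predicted by the prior time difference of arrival (seconds).
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)                   # bin frequencies (Hz)
    expected = 2.0 * np.pi * freqs * tdoa                        # predicted phase diff
    observed = np.angle(X_l * np.conj(X_r))
    err = np.angle(np.exp(1j * (observed - expected[:, None])))  # wrap to [-pi, pi]
    return (np.abs(err) < tol).astype(float)
```

Replacing the hard threshold with a probability under a prior distribution over TDOA would give a soft variant closer to the paper's formulation.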


Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) | 2016

Optimal automatic speech recognition system selection for noisy environments

Yuuki Tachioka; Tomohiro Narita

To improve the performance of noisy automatic speech recognition (ASR), it is effective to prepare multiple ASR systems that can address the large variety of noise conditions. However, the optimal ASR system differs for each environment, and mismatches between training and testing degrade ASR performance. In this situation, combining the outputs of multiple systems is effective; however, the computational cost increases in proportion to the number of systems. This paper proposes a method to select the optimal single system from multiple systems. The selection is based on the estimated word error rate of each system, computed using i-vector similarities between training and test data. Experiments on the third CHiME challenge show that our proposed method can efficiently select a single system from multiple systems with different speech enhancement and feature transformation methods, improving overall performance without increasing computational cost.
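The selection step might be sketched like this, using cosine similarity of i-vectors as a stand-in for the paper's WER estimation; the function names and the argmax-over-similarity rule are illustrative assumptions:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_system(test_ivec, train_ivecs):
    # Choose the ASR system whose training-condition i-vector is most
    # similar to the test data's i-vector (a proxy for lowest expected WER).
    sims = [cosine(test_ivec, iv) for iv in train_ivecs]
    return int(np.argmax(sims))
```

Only the selected system is decoded, so the runtime cost stays that of a single system regardless of how many are prepared.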


Journal of the Acoustical Society of America | 2016

Multi-channel non-negative matrix factorization with binary mask initialization for automatic speech recognition

Iori Miura; Yuuki Tachioka; Tomohiro Narita; Jun Ishii; Fuminori Yoshiyama; Shingo Uenohara; Ken'ichi Furuya

Non-negative matrix factorization (NMF) factorizes a non-negative matrix into two non-negative matrices. In the field of acoustics, a multichannel extension has been proposed that uses spatial information for sound source separation. Conventional multichannel NMF suffers from an initial-value dependency of the separation performance due to local minima. This paper proposes an initialization that uses binary-masking-based sound source separation, whose masks in the time-frequency domain are calculated from the time difference of arrival of each source. The proposed method computes initial spatial correlation matrices from the sources separated by binary masking. Music separation experiments confirmed that the separation performance of the proposed method was better than that of the conventional method. In addition, we evaluated the binary-masking-based initialization on automatic speech recognition (ASR) tasks in noisy environments. The ASR experiments confirmed that appropriate initi...
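The initialization of the spatial correlation matrices could be sketched as follows; the array layout and the per-source frame averaging are assumptions for illustration, not the paper's exact estimator:

```python
import numpy as np

def init_spatial_correlations(X, masks, eps=1e-8):
    # X: (channels M, bins F, frames T) multichannel STFT
    # masks: (sources S, bins F, frames T) binary masks from TDOA-based separation
    # Returns (S, F, M, M) spatial correlation matrices, one per source and bin.
    M = X.shape[0]
    S, F, T = masks.shape
    R = np.zeros((S, F, M, M), dtype=complex)
    for s in range(S):
        for f in range(F):
            w = masks[s, f]                  # frames assigned to source s
            Xw = X[:, f, :] * w              # masked observations (M, T)
            R[s, f] = Xw @ Xw.conj().T / (w.sum() + eps)
    return R
```

These matrices would then seed the spatial model of multichannel NMF, steering the optimization away from poor local minima.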


IEEE Global Conference on Signal and Information Processing (GlobalSIP) | 2014

Sequence discriminative training for low-rank deep neural networks

Yuuki Tachioka; Shinji Watanabe; Jonathan Le Roux; John R. Hershey

Deep neural networks (DNNs) have proven very successful for automatic speech recognition, but their number of parameters tends to be large, leading to high computational cost. To reduce the size of a DNN model, low-rank approximations of weight matrices, computed using singular value decomposition (SVD), have previously been applied. Previous studies focused only on clean speech, whereas the additional variability in noisy speech could make model reduction difficult. We therefore investigate the effectiveness of this SVD method on noisy, reverberated speech. Furthermore, we combine the low-rank approximation with sequence discriminative training, which further improves the performance of the DNN, even though the original DNN was constructed using a discriminative criterion. We also investigate the effect of the order in which the low-rank approximation and sequence discriminative training are applied. Our experiments show that the low-rank approximation is effective for noisy speech, and the most effective combination of discriminative training with model reduction is to apply the low-rank approximation to the base model first and then perform discriminative training on the low-rank model. This low-rank discriminatively trained model outperformed the full discriminatively trained model.
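The SVD-based reduction itself is standard and can be sketched in a few lines; the function name and the random test matrix are illustrative, not taken from the paper:

```python
import numpy as np

def low_rank_factorize(W, rank):
    # Replace a weight matrix W (m x n) with A @ B, A: (m, rank), B: (rank, n),
    # via truncated SVD -- the best rank-`rank` approximation in Frobenius norm.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]      # absorb singular values into the left factor
    B = Vt[:rank, :]
    return A, B
```

Splitting one layer into two smaller ones this way cuts the parameter count from m·n to rank·(m+n), after which the factored network can be discriminatively retrained as the abstract describes.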


Archive | 2013

Discriminative Methods for Noise Robust Speech Recognition: A CHiME Challenge Benchmark

Yuuki Tachioka; Shinji Watanabe; Jonathan Le Roux; John R. Hershey


Acoustical Science and Technology | 2012

Direction of arrival estimation by cross-power spectrum phase analysis using prior distributions and voice activity detection information

Yuuki Tachioka; Tomohiro Narita; Tomohiro Iwasaki


Unknown Journal | 2015

Uncertainty training and decoding methods of deep neural networks based on stochastic representation of enhanced features

Yuuki Tachioka; Shinji Watanabe

Collaboration


Dive into Yuuki Tachioka's collaborations.

Top Co-Authors

Shinji Watanabe

Mitsubishi Electric Research Laboratories


Jonathan Le Roux

Mitsubishi Electric Research Laboratories


John R. Hershey

Mitsubishi Electric Research Laboratories
