Publications


Featured research published by Gakuto Kurata.


International Conference on Acoustics, Speech, and Signal Processing | 2009

Acoustically discriminative training for language models

Gakuto Kurata; Nobuyasu Itoh; Masafumi Nishimura

This paper introduces a discriminative training method for language models (LMs) that leverages phoneme similarities estimated from an acoustic model. To train an LM discriminatively, we need the correct word sequences and the recognized results that Automatic Speech Recognition (ASR) produces when processing the utterances of those correct word sequences. However, sufficient utterances are not always available. We propose generating the probable N-best lists that the ASR may produce directly from the correct word sequences by leveraging the phoneme similarities. We call this process the “Pseudo-ASR”. We train the LM discriminatively by comparing the correct word sequences with the corresponding N-best lists from the Pseudo-ASR. Experiments with real-life data from a Japanese call center showed that the LM trained with the proposed method improved the accuracy of the ASR.
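
A minimal sketch of the Pseudo-ASR idea, assuming a toy pronunciation lexicon, a hand-made phoneme-similarity table, and simple word-level substitution sampling; the paper estimates phoneme confusability from the acoustic model, so everything named below (LEXICON, PHONEME_SIM, the substitution rate) is an illustrative placeholder rather than the authors' implementation.

```python
import random

# Toy pronunciation lexicon: word -> phoneme sequence (hypothetical entries).
LEXICON = {
    "write": ["r", "ai", "t"],
    "right": ["r", "ai", "t"],
    "light": ["l", "ai", "t"],
    "night": ["n", "ai", "t"],
}

# Hand-made phoneme similarity in [0, 1]; the paper estimates this from the acoustic model.
PHONEME_SIM = {("r", "l"): 0.6, ("l", "r"): 0.6, ("n", "l"): 0.3, ("l", "n"): 0.3}

def phoneme_sim(a, b):
    return 1.0 if a == b else PHONEME_SIM.get((a, b), 0.0)

def word_confusability(w1, w2):
    """Average phoneme similarity of two equal-length pronunciations."""
    p1, p2 = LEXICON[w1], LEXICON[w2]
    if len(p1) != len(p2):
        return 0.0  # a real system would use a similarity-weighted edit distance
    return sum(phoneme_sim(a, b) for a, b in zip(p1, p2)) / len(p1)

def sample_hypothesis(words, rng, sub_rate=0.3):
    """Replace each word, with probability sub_rate, by an acoustically confusable word."""
    hyp = []
    for w in words:
        cands = [(v, word_confusability(w, v)) for v in LEXICON if v != w]
        cands = [(v, s) for v, s in cands if s > 0.0]
        if cands and rng.random() < sub_rate:
            total = sum(s for _, s in cands)
            r, acc = rng.random() * total, 0.0
            for v, s in cands:
                acc += s
                if r <= acc:
                    hyp.append(v)
                    break
        else:
            hyp.append(w)
    return hyp

def pseudo_asr_nbest(words, n=5, seed=0):
    """Generate n probable hypotheses directly from the correct word sequence."""
    rng = random.Random(seed)
    return [sample_hypothesis(words, rng) for _ in range(n)]

print(pseudo_asr_nbest(["write", "right"], n=3))
```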


International Conference on Acoustics, Speech, and Signal Processing | 2007

Unsupervised Lexicon Acquisition from Speech and Text

Gakuto Kurata; Shinsuke Mori; Nobuyasu Itoh; Masafumi Nishimura

When introducing a large vocabulary continuous speech recognition (LVCSR) system into a specific domain, it is preferable to selectively add the necessary domain-specific words and their correct pronunciations to the lexicon, especially in areas where the LVCSR system must be updated frequently with new words. In this paper, we propose an unsupervised method of word acquisition for Japanese, in which no spaces exist between words. In our method, by taking advantage of speech from the target domain, we select the domain-specific words from among an enormous number of word candidates extracted from raw corpora. The experiments showed that the acquired lexicon was of good quality and that it contributed to the performance of the LVCSR system for the target domain.


North American Chapter of the Association for Computational Linguistics | 2016

Improved Neural Network-based Multi-label Classification with Better Initialization Leveraging Label Co-occurrence

Gakuto Kurata; Bing Xiang; Bowen Zhou

In a multi-label text classification task, in which multiple labels can be assigned to one text, label co-occurrence itself is informative. We propose a novel neural network initialization method to treat some of the neurons in the final hidden layer as dedicated neurons for each pattern of label co-occurrence. These dedicated neurons are initialized to connect to the corresponding co-occurring labels with stronger weights than to others. In experiments with a natural language query classification task, which requires multi-label classification, our initialization method improved classification accuracy without any computational overhead in training and evaluation.
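
A minimal sketch of the initialization described above, assuming a uniform random base initialization, one dedicated hidden neuron per observed co-occurrence pattern, and a fixed "strong" weight value; these specifics are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def init_output_weights(num_hidden, num_labels, cooccurrence_patterns,
                        strong=1.0, scale=0.01, seed=0):
    """Return a (num_hidden, num_labels) hidden-to-output weight matrix in which
    the first len(cooccurrence_patterns) hidden neurons are dedicated: neuron i
    is wired with a stronger weight to every label in cooccurrence_patterns[i]."""
    assert len(cooccurrence_patterns) <= num_hidden
    rng = np.random.default_rng(seed)
    W = rng.uniform(-scale, scale, size=(num_hidden, num_labels))
    for i, labels in enumerate(cooccurrence_patterns):
        for label in labels:
            W[i, label] = strong  # stronger than the surrounding random weights
    return W

# Example: 8 hidden units, 4 labels, and two label co-occurrence patterns
# ({0, 2} and {1, 3}) observed in the training data.
W_out = init_output_weights(num_hidden=8, num_labels=4,
                            cooccurrence_patterns=[[0, 2], [1, 3]])
print(np.round(W_out, 3))
```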


Speech Communication | 2012

Acoustically discriminative language model training with pseudo-hypothesis

Gakuto Kurata; Abhinav Sethy; Bhuvana Ramabhadran; Ariya Rastrow; Nobuyasu Itoh; Masafumi Nishimura

Recently proposed methods for discriminative language modeling require alternate hypotheses in the form of lattices or N-best lists. These are usually generated by an Automatic Speech Recognition (ASR) system on the same speech data used to train the system. This requirement restricts the scope of these methods to corpora where both the acoustic material and the corresponding true transcripts are available. Typically, the text data available for language model (LM) training is an order of magnitude larger than manually transcribed speech. This paper provides a general framework to take advantage of this volume of textual data in the discriminative training of language models. We propose to generate probable N-best lists directly from the text material; these resemble the N-best lists produced by an ASR system because they incorporate phonetic confusability estimated from the acoustic model of the ASR system. We present experiments with Japanese spontaneous lecture speech data, which demonstrate that discriminative LM training with the proposed framework is effective and provides modest gains in ASR accuracy.
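
The abstract does not spell out the training objective, so the sketch below uses a common structured-perceptron formulation over bigram count features as a stand-in for discriminative LM training against pseudo N-best lists; the feature set, learning rate, and toy data are assumptions, not the paper's setup.

```python
from collections import Counter

def bigram_features(words):
    """Bigram counts with sentence-boundary markers."""
    return Counter(zip(["<s>"] + words, words + ["</s>"]))

def score(weights, words):
    return sum(weights.get(f, 0.0) * c for f, c in bigram_features(words).items())

def perceptron_update(weights, reference, nbest, lr=1.0):
    """Move weights toward the reference and away from the top-scoring pseudo hypothesis."""
    best_hyp = max(nbest, key=lambda hyp: score(weights, hyp))
    if best_hyp == reference:
        return
    for f, c in bigram_features(reference).items():
        weights[f] = weights.get(f, 0.0) + lr * c
    for f, c in bigram_features(best_hyp).items():
        weights[f] = weights.get(f, 0.0) - lr * c

# Toy example: a reference sentence plus pseudo-ASR hypotheses generated from it.
weights = {}
reference = ["recognize", "speech"]
nbest = [["wreck", "a", "nice", "beach"], ["recognize", "speech"]]
for _ in range(3):
    perceptron_update(weights, reference, nbest)
print(sorted(weights.items(), key=lambda kv: -kv[1])[:3])
```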


International Conference on Acoustics, Speech, and Signal Processing | 2011

Training of error-corrective model for ASR without using audio data

Gakuto Kurata; Nobuyasu Itoh; Masafumi Nishimura

This paper introduces a method to train an error-corrective model for Automatic Speech Recognition (ASR) without using audio data. Existing techniques assume that sufficient audio data from the target application is available and that negative samples can be prepared by having the ASR system recognize this audio data. However, this assumption is not always true. We propose generating probable N-best lists, which the ASR may produce, directly from the text data of the target application by taking phoneme similarity into consideration. We call this process “Pseudo-ASR”. We conduct discriminative reranking with the error-corrective model by regarding the text data as positive samples and the N-best lists from the Pseudo-ASR as negative samples. Experiments with Japanese call center data showed that discriminative reranking based on the Pseudo-ASR improved the accuracy of the ASR.
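
A minimal sketch of the reranking step, assuming the error-corrective model is a dictionary of bigram feature weights (as it might be after discriminative training on text versus Pseudo-ASR hypotheses) and that it is interpolated with the ASR score by a weight alpha; both assumptions are illustrative, not the paper's formulation.

```python
from collections import Counter

def corrective_score(weights, words):
    """Score a hypothesis with the error-corrective model (here: bigram feature weights)."""
    feats = Counter(zip(["<s>"] + words, words + ["</s>"]))
    return sum(weights.get(f, 0.0) * c for f, c in feats.items())

def rerank(nbest, weights, alpha=0.5):
    """Pick the hypothesis maximizing ASR score + alpha * corrective-model score."""
    return max(nbest, key=lambda item: item[1] + alpha * corrective_score(weights, item[0]))

# Toy corrective model that, after training, prefers "recognize speech".
weights = {("<s>", "recognize"): 1.0, ("recognize", "speech"): 1.0}
nbest = [(["wreck", "a", "nice", "beach"], -9.5),
         (["recognize", "speech"], -10.0)]
best_words, best_asr_score = rerank(nbest, weights)
print(best_words)  # ['recognize', 'speech'] despite its lower ASR score
```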


Meeting of the Association for Computational Linguistics | 2006

Phoneme-to-Text Transcription System with an Infinite Vocabulary

Shinsuke Mori; Daisuke Takuma; Gakuto Kurata

The noisy channel model approach has been successfully applied to various natural language processing tasks. Currently, the main research focus of this approach is adaptation methods: how to capture the characteristics of words and expressions in a target domain given example sentences from that domain. As a solution, we describe a method that enlarges the vocabulary of a language model to an almost infinite size and captures the context information of these words. The new method is especially suitable for languages in which words are not delimited by whitespace. We applied our method to a phoneme-to-text transcription task in Japanese and reduced the errors of an existing method by about 10%.
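
A minimal sketch of noisy-channel phoneme-to-text decoding, assuming a tiny closed lexicon with unigram probabilities and a deterministic pronunciation channel; the paper's contribution of an effectively infinite vocabulary via character-level modeling is not reproduced here, and all lexicon entries below are placeholders.

```python
import math

# Toy closed lexicon: word -> (pronunciation as a phoneme string, unigram log-probability).
LEXICON = {
    "to":    ("tu", math.log(0.3)),
    "two":   ("tu", math.log(0.1)),
    "too":   ("tu", math.log(0.05)),
    "day":   ("dei", math.log(0.2)),
    "today": ("tudei", math.log(0.35)),
}

def decode(phonemes):
    """Viterbi over segmentations: argmax over word sequences of P(words),
    with a deterministic channel P(phonemes | words) in {0, 1}."""
    n = len(phonemes)
    best = [(-math.inf, None, None)] * (n + 1)  # (log-prob, previous index, word)
    best[0] = (0.0, None, None)
    for i in range(n):
        if best[i][0] == -math.inf:
            continue
        for word, (pron, logp) in LEXICON.items():
            j = i + len(pron)
            if phonemes[i:j] == pron and best[i][0] + logp > best[j][0]:
                best[j] = (best[i][0] + logp, i, word)
    # Backtrace from the end of the phoneme string.
    words, i = [], n
    while i > 0:
        _, prev, word = best[i]
        if word is None:
            return None  # no segmentation covers the whole phoneme string
        words.append(word)
        i = prev
    return list(reversed(words))

print(decode("tudei"))  # ['today'] beats 'to' + 'day' (0.35 > 0.3 * 0.2)
```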


Speech Communication | 2012

Leveraging word confusion networks for named entity modeling and detection from conversational telephone speech

Gakuto Kurata; Nobuyasu Itoh; Masafumi Nishimura; Abhinav Sethy; Bhuvana Ramabhadran

Named Entity (NE) detection from Conversational Telephone Speech (CTS) is important from a business perspective. However, the results of Automatic Speech Recognition (ASR) inevitably contain errors, which makes NE detection from CTS more difficult than from written text. One of the options for detecting NEs is to use a statistical NE model. In order to capture the nature of ASR errors, the NE model is usually trained with the ASR one-best results instead of manually transcribed text, and is then applied to the ASR one-best results of speech that contains NEs. To make NE detection more robust to ASR errors, we propose using Word Confusion Networks (WCNs), sequences of bundled words, for both NE modeling and detection, regarding the word bundles as units instead of independent words. We realize this by clustering similar word bundles that may originate from the same word. We trained NE models that predict the NE tag sequences from the sequences of word bundles using the maximum entropy principle. Note that the clustering of word bundles is conducted before NE modeling, so our proposed method can be combined with any NE modeling method. We conducted experiments using real-life call-center data. The experimental results showed that using the WCNs improved the accuracy of NE detection regardless of the NE modeling method.
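
A minimal sketch of treating WCN word bundles as units, assuming bundles are lists of (word, posterior) pairs and that "similar" bundles are grouped greedily by Jaccard overlap of their word sets; the paper's actual clustering criterion and the maximum-entropy NE model are not reproduced here.

```python
def jaccard(a, b):
    """Word-set overlap between two bundles."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_bundles(bundles, threshold=0.5):
    """Greedily assign each bundle to the first cluster whose representative word
    set overlaps it enough; otherwise start a new cluster. Returns cluster ids."""
    reps, cluster_ids = [], []
    for bundle in bundles:
        words = [w for w, _ in bundle]
        for cid, rep in enumerate(reps):
            if jaccard(words, rep) >= threshold:
                cluster_ids.append(cid)
                break
        else:
            reps.append(words)
            cluster_ids.append(len(reps) - 1)
    return cluster_ids

# Bundles of (word, posterior) from two WCNs whose first bundles likely come from the same word.
wcn_1 = [[("smith", 0.6), ("smyth", 0.3)], [("called", 0.9)]]
wcn_2 = [[("smyth", 0.5), ("smith", 0.4), ("smit", 0.1)], [("calls", 0.7)]]
ids = cluster_bundles(wcn_1 + wcn_2)
print(ids)  # [0, 1, 0, 2]: bundles 0 and 2 share a unit, usable as an NE-model feature
```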


International Conference on Acoustics, Speech, and Signal Processing | 2011

Named entity recognition from Conversational Telephone Speech leveraging Word Confusion Networks for training and recognition

Gakuto Kurata; Nobuyasu Itoh; Masafumi Nishimura; Abhinav Sethy; Bhuvana Ramabhadran

Named Entity (NE) recognition from the results of Automatic Speech Recognition (ASR) is challenging because of ASR errors. One of the options for detecting NEs is to use a statistical NE model that is usually trained with ASR one-best results. In order to make NE recognition more robust to ASR errors, we propose using Word Confusion Networks (WCNs), sequences of bundled words, for both NE modeling and recognition, regarding the word bundles as units instead of independent words. This is done by clustering similar word bundles that may originate from the same word. We trained the NE models with the maximum entropy principle and evaluated the performance using real-life call-center data. The results showed that using the WCNs reduced the NE recognition error by up to 33.0% relative.


International Conference on Acoustics, Speech, and Signal Processing | 2006

Unsupervised Adaptation of a Stochastic Language Model Using a Japanese Raw Corpus

Gakuto Kurata; Shinsuke Mori; Masafumi Nishimura

The range of target uses for large vocabulary continuous speech recognition (LVCSR) systems is expanding. Building a good LVCSR system specialized for a target domain takes a long time because experts need to manually segment the corpus of the target domain, which is a labor-intensive task. In this paper, we propose a new method to adapt an LVCSR system to a new domain. In our method, we stochastically segment a raw Japanese corpus of the target domain. A domain-specific language model (LM) is then built from this corpus, and all of the domain-specific words can be added to the lexicon for LVCSR. Most importantly, the proposed method is fully automatic, so the time needed to introduce an LVCSR system can be reduced drastically. In addition, the proposed method yielded performance comparable or even superior to the use of expensive manual segmentation.
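
A minimal sketch of stochastic segmentation of unsegmented text into expected word counts, assuming a toy unigram word model with an out-of-vocabulary fallback and brute-force enumeration of the segmentations of a short string; the paper applies a proper stochastic segmenter to a large raw corpus, so the probabilities and vocabulary below are placeholders.

```python
from collections import defaultdict

# Toy unigram word model and out-of-vocabulary fallback (placeholder values).
WORD_PROB = {"東京": 0.05, "都": 0.02, "東": 0.01, "京都": 0.04}
OOV_PROB = 1e-4
MAX_WORD_LEN = 2

def seg_prob(words):
    p = 1.0
    for w in words:
        p *= WORD_PROB.get(w, OOV_PROB)
    return p

def segmentations(text):
    """Yield every segmentation of text into substrings of length <= MAX_WORD_LEN."""
    if not text:
        yield []
        return
    for i in range(1, min(MAX_WORD_LEN, len(text)) + 1):
        for rest in segmentations(text[i:]):
            yield [text[:i]] + rest

def expected_word_counts(text):
    """Fractional word counts under the distribution over all segmentations."""
    segs = list(segmentations(text))
    probs = [seg_prob(s) for s in segs]
    z = sum(probs)
    counts = defaultdict(float)
    for seg, p in zip(segs, probs):
        for w in seg:
            counts[w] += p / z
    return dict(counts)

# "東京都" can be read as 東京|都 or 東|京都 (among others); the expected counts,
# which would feed the domain-specific lexicon and LM, weight both readings.
print(expected_word_counts("東京都"))
```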


Empirical Methods in Natural Language Processing | 2016

Leveraging Sentence-level Information with Encoder LSTM for Semantic Slot Filling

Gakuto Kurata; Bing Xiang; Bowen Zhou; Mo Yu

Recurrent Neural Networks (RNNs), and one of their specific architectures, Long Short-Term Memory (LSTM), have been widely used for sequence labeling. In this paper, we first enhance LSTM-based sequence labeling to explicitly model label dependencies. We then propose another enhancement to incorporate the global information spanning the whole input sequence. The latter method, the encoder-labeler LSTM, first encodes the whole input sequence into a fixed-length vector with the encoder LSTM and then uses this encoded vector as the initial state of another LSTM for sequence labeling. Combining these methods, we can predict the label sequence while considering label dependencies and information from the whole input sequence. In experiments on a slot filling task, an essential component of natural language understanding, using the standard ATIS corpus, we achieved a state-of-the-art F1-score of 95.66%.
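
A minimal sketch of the encoder-labeler LSTM, assuming single-layer unidirectional LSTMs in PyTorch and reusing the encoder's final (h, c) state as the initial state of the labeler; the dimensions, the absence of explicit label-dependency modeling, and other details are illustrative simplifications rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class EncoderLabelerLSTM(nn.Module):
    def __init__(self, vocab_size, num_labels, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.labeler = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, num_labels)

    def forward(self, tokens):
        emb = self.embed(tokens)                # (batch, seq, emb_dim)
        # Encode the whole sentence; keep only the final (h, c) state.
        _, (h, c) = self.encoder(emb)
        # Label the sequence, starting from the sentence-level encoder state.
        outputs, _ = self.labeler(emb, (h, c))  # (batch, seq, hidden_dim)
        return self.out(outputs)                # (batch, seq, num_labels)

# Toy usage: a batch of 2 sentences of length 5, a 1000-word vocabulary, 10 slot labels.
model = EncoderLabelerLSTM(vocab_size=1000, num_labels=10)
tokens = torch.randint(0, 1000, (2, 5))
print(model(tokens).shape)  # torch.Size([2, 5, 10])
```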
