
Publication


Featured research published by Mitsuru Endo.


IWSDS | 2017

Convolutional Neural Networks for Multi-topic Dialog State Tracking

Hongjie Shi; Takashi Ushio; Mitsuru Endo; Katsuyoshi Yamagami; Noriaki Horii

The main task of the fourth Dialog State Tracking Challenge (DSTC4) is to track the dialog state by filling in various slots, each of which represents a major subject discussed in the dialog. In this article we focus on the ‘INFO’ slot that tracks the general information provided in a sub-dialog segment, and propose an approach for this slot-filling using convolutional neural networks (CNNs). Our CNN model is adapted to multi-topic dialog by including a convolutional layer with general and topic-specific filters. The evaluation on DSTC4 common test data shows that our approach outperforms all other submitted entries in terms of overall accuracy of the ‘INFO’ slot.
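The general-plus-topic-specific filter idea described above can be sketched in a few lines of numpy. This is not the authors' implementation; the embedding size, window width, filter counts, topic names, and random weights are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

EMB, WIN = 8, 3                 # assumed embedding size and filter window
N_GENERAL, N_TOPIC = 4, 2       # assumed filter counts
TOPICS = ["FOOD", "TRANSPORT"]  # hypothetical topic set

# One shared (general) filter bank plus one bank per dialog topic
general_filters = rng.standard_normal((N_GENERAL, WIN, EMB))
topic_filters = {t: rng.standard_normal((N_TOPIC, WIN, EMB)) for t in TOPICS}

def conv1d_max(x, filters):
    """Valid 1-D convolution over word positions, then max-pooling over time."""
    positions = x.shape[0] - WIN + 1
    feats = np.array([[np.sum(x[i:i + WIN] * f) for i in range(positions)]
                      for f in filters])
    return feats.max(axis=1)    # one pooled feature per filter

def encode(utterance, topic):
    # Concatenate general and topic-specific feature maps
    return np.concatenate([conv1d_max(utterance, general_filters),
                           conv1d_max(utterance, topic_filters[topic])])

utterance = rng.standard_normal((10, EMB))  # 10 word embeddings
vec = encode(utterance, "FOOD")
print(vec.shape)                            # (6,): 4 general + 2 topic features
```

The general filters are applied to every sub-dialog regardless of topic, while the topic-specific bank is selected by the current topic, which is the mechanism the abstract describes for adapting one CNN to multi-topic dialog.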


Spoken Language Technology Workshop | 2016

A multichannel convolutional neural network for cross-language dialog state tracking

Hongjie Shi; Takashi Ushio; Mitsuru Endo; Katsuyoshi Yamagami; Noriaki Horii

The fifth Dialog State Tracking Challenge (DSTC5) introduces a new cross-language dialog state tracking scenario, where participants are asked to build their trackers on the English training corpus while evaluating them on an unlabeled Chinese corpus. Although computer-generated translations of both the English and Chinese corpora are provided in the dataset, these translations contain errors, and careless use of them can easily hurt the performance of the built trackers. To address this problem, we propose a multichannel convolutional neural network (CNN) architecture, in which we treat English and Chinese as different input channels of one single CNN model. In the DSTC5 evaluation, we found that such a multichannel architecture can effectively improve robustness against translation errors. Additionally, our method for DSTC5 is purely machine-learning based and requires no prior knowledge of the target language. We consider this a desirable property for building a tracker in the cross-language context, as not every developer will be familiar with both languages.
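The language-as-channel idea can be sketched in numpy as follows. This is not the published model; aligning the two language versions to a common length T is an illustrative simplification, and all dimensions and weights are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
EMB, WIN, N_FILT, T = 8, 3, 5, 10   # assumed dimensions

# Hypothetical embeddings of the same utterance in both languages
# (e.g. the original text plus its machine translation), padded or
# truncated to a common length T for simplicity.
en = rng.standard_normal((T, EMB))
zh = rng.standard_normal((T, EMB))
x = np.stack([en, zh], axis=-1)     # (T, EMB, 2): two input channels

# Each filter spans the window, the embedding, and BOTH language channels,
# so noise from a translation error in one channel can be offset by the other.
filters = rng.standard_normal((N_FILT, WIN, EMB, 2))

def conv_max(x, filters):
    positions = x.shape[0] - WIN + 1
    feats = np.array([[np.sum(x[i:i + WIN] * f) for i in range(positions)]
                      for f in filters])
    return feats.max(axis=1)        # max-pool over positions

features = conv_max(x, filters)
print(features.shape)               # (5,): one pooled feature per filter
```

Treating the two languages as channels of one model, rather than training two separate trackers, is what lets a single set of filters learn to weigh the more reliable channel.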


Spoken Language Technology Workshop | 2016

Recurrent convolutional neural networks for structured speech act tagging

Takashi Ushio; Hongjie Shi; Mitsuru Endo; Katsuyoshi Yamagami; Noriaki Horii

Spoken language understanding (SLU) is one of the important problems in natural language processing, especially in dialog systems. The fifth Dialog State Tracking Challenge (DSTC5) introduced an SLU task: automatically tagging speech utterances from two speaker roles with speech act tags and semantic slot tags. In this paper, we focus on speech act tagging. We propose a local coactivate multi-task learning model for capturing structured speech acts, based on sentence features extracted by recurrent convolutional neural networks. Experimental results show that our model outperformed all other submitted entries and was able to capture coactivated local features of category and attribute, which are parts of a speech act.
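One ingredient of the multi-task setup, two label heads (category and attribute) sharing a single sentence representation, can be sketched as follows. The recurrent convolutional encoder and the local coactivation mechanism are omitted; hidden size, label-set sizes, and weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
HID, N_CAT, N_ATTR = 16, 4, 6   # assumed hidden size and label-set sizes

# Two task-specific output heads over one shared sentence vector
W_cat = rng.standard_normal((N_CAT, HID))
W_attr = rng.standard_normal((N_ATTR, HID))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def tag(sentence_vec):
    """Predict speech-act category and attribute from one shared encoding."""
    return softmax(W_cat @ sentence_vec), softmax(W_attr @ sentence_vec)

sentence_vec = rng.standard_normal(HID)  # stands in for the encoder output
p_cat, p_attr = tag(sentence_vec)
print(p_cat.shape, p_attr.shape)         # (4,) (6,)
```

Because both heads are trained against the same shared features, errors in one task can inform the other, which is the usual motivation for multi-task learning over a joint label structure like category plus attribute.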


Consumer Communications and Networking Conference | 2004

Evaluation of a speech translation system for travel conversation installed in PDA

Kenji Mizutani; Tomohiro Konuma; Mitsuru Endo; Taro Nambu; Yumi Wakita

For mobile use, we are developing a multilingual example-sentence-driven speech translation system with a multi-modal input interface for retrieving sentences. In addition to the basic speech input mode, we provide an associative keyword mode with a software keyboard and an associative domain selection mode. The paper discusses the characteristics of each input mode, the synergistic effect obtained by combining the modes, and the results of evaluations that show the difference between system performance in the laboratory and in the real world. As evaluation criteria, we adopted the retrieval time and the retrieval precision of the sentence. When all of the modes were available, the precision within 30 seconds was 86.8% for a closed test set and 76.8% for an open test set. When the retrieval was completed with only one operation, the average time was 10.3 seconds for a closed set. The precision was 12.0% higher than the maximum precision obtained when only one of the modes was available. The results show that a synergistic effect of the combined modes certainly exists and that all the modes are necessary to improve the system's usability.


Archive | 2003

Method and apparatus for retrieving a video and audio scene using an index generated by speech recognition

Hiroshi Furuyama; Hitoshi Yashio; Ikuo Inoue; Mitsuru Endo; Masakatsu Hoshimi


Archive | 2011

Speech processing device and speech processing method

Maki Yamada; Mitsuru Endo


Archive | 2003

Language model generation and accumulation device, speech recognition device, language model creation method, and speech recognition method

Yoshiyuki Okimoto; Mitsuru Endo; Makoto Nishizaki


Archive | 2010

Hearing aid system

Mitsuru Endo; Koichiro Mizushima; Takeo Kanamori


Archive | 2001

Method and apparatus for text input utilizing speech recognition

Mitsuru Endo; Makoto Nishizaki; Natsuki Saito


Archive | 1999

Method and apparatus for retrieving desired video and audio scene using voice recognition

Hiroshi Furuyama; Hitoshi Yashio; Ikuo Inoue; Mitsuru Endo; Masakatsu Hoshimi
