
Publication


Featured research published by Hideki Kashioka.


IEICE Transactions on Information and Systems | 2006

Non-Audible Murmur (NAM) Recognition

Yoshitaka Nakajima; Hideki Kashioka; Nick Campbell; Kiyohiro Shikano

We propose a new practical input interface for the recognition of Non-Audible Murmur (NAM), defined as articulated respiratory sound, produced without vocal-fold vibration, that is transmitted through the soft tissues of the head. Applying the principle of a medical stethoscope, we developed a microphone attachment that adheres to the skin, found the ideal position for sampling flesh-conducted NAM vibration, and retrained an acoustic model with NAM samples. Then, using the Julius Japanese Dictation Toolkit, we tested the feasibility of using this method in place of an external microphone for analyzing air-conducted voice sound.


International Conference on Acoustics, Speech, and Signal Processing | 2003

Non-audible murmur recognition input interface using stethoscopic microphone attached to the skin

Yoshitaka Nakajima; Hideki Kashioka; Kiyohiro Shikano; Nick Campbell

We propose a new style of practical input interface for the recognition of non-audible murmur (NAM), i.e., for the recognition of inaudible speech produced without vibration of the vocal folds. Applying the principle of a medical stethoscope, we developed a microphone attachment that adheres to the skin, found the ideal position for sampling flesh-conducted NAM vibration, and retrained an acoustic model with NAM samples. Then, using the Julius Japanese Dictation Toolkit, we tested the practicality of this method in place of an external microphone for analyzing air-conducted voice sound.


International Conference on Acoustics, Speech, and Signal Processing | 2009

Statistical dialog management applied to WFST-based dialog systems

Chiori Hori; Kiyonori Ohtake; Teruhisa Misu; Hideki Kashioka; Satoshi Nakamura

We have proposed an expandable dialog scenario description and a platform for managing dialog systems with a weighted finite-state transducer (WFST), in which user concept tags and system action tags are the input and output of the transducer, respectively. In this paper, we apply this framework to statistical dialog management, in which a dialog strategy is acquired from a corpus of human-to-human hotel-reservation conversations. A scenario WFST for dialog management was automatically created from an N-gram model of the tag sequence annotated in the corpus with the Interchange Format (IF). Additionally, a word-to-concept WFST for spoken language understanding (SLU) was obtained from the same corpus. The scenario WFST and SLU WFST were composed together and then optimized. We evaluated the proposed WFST-based statistical dialog management in terms of accuracy in detecting the next system action, and confirmed that a dialog scenario automatically acquired from a corpus can manage dialog reasonably well on the WFST-based dialog management platform.
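The composition step the abstract describes can be sketched in a few lines: an SLU transducer mapping words to concept tags is composed with a scenario transducer mapping concept tags to system action tags, so a word maps directly to an action. This is a minimal illustration, not the paper's implementation; the states, tags, and weights are hypothetical, epsilon transitions are ignored, and weights combine by addition (tropical semiring).

```python
def compose(t1, t2):
    """Naive WFST composition in the tropical semiring (weights add).
    Each transducer: {state: [(in_sym, out_sym, weight, next_state), ...]}.
    Epsilon arcs are not handled in this sketch."""
    arcs = {}
    stack = [("s0", "q0")]
    seen = set(stack)
    while stack:
        s1, s2 = stack.pop()
        out = []
        for i1, o1, w1, n1 in t1.get(s1, []):
            for i2, o2, w2, n2 in t2.get(s2, []):
                if o1 == i2:  # match T1's output symbol with T2's input symbol
                    out.append((i1, o2, w1 + w2, (n1, n2)))
                    if (n1, n2) not in seen:
                        seen.add((n1, n2))
                        stack.append((n1, n2))
        arcs[(s1, s2)] = out
    return arcs

# SLU WFST: words -> concept tags (hypothetical hotel-reservation fragment).
slu = {"s0": [("single", "Request:room_type", 0.5, "s1"),
              ("twin",   "Request:room_type", 0.7, "s1")]}
# Scenario WFST: concept tags -> system action tags.
scenario = {"q0": [("Request:room_type", "Ask:dates", 0.2, "q1")]}

composed = compose(slu, scenario)
for in_sym, out_sym, w, nxt in composed[("s0", "q0")]:
    print(in_sym, "->", out_sym, round(w, 2))
```

A real system would use an FST toolkit (e.g., OpenFst) with epsilon filters, determinization, and minimization for the "optimized" step; the point here is only how composition chains word-to-concept and concept-to-action mappings.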


Advanced Robotics | 2011

Learning, Generation and Recognition of Motions by Reference-Point-Dependent Probabilistic Models

Komei Sugiura; Naoto Iwahashi; Hideki Kashioka; Satoshi Nakamura

This paper presents a novel method for learning object manipulation such as rotating an object or placing one object on another. In this method, motions are learned using reference-point-dependent probabilistic models, which can be used for the generation and recognition of motions. The method estimates (i) the reference point, (ii) the intrinsic coordinate system type, which is the type of coordinate system intrinsic to a motion, and (iii) the probabilistic model parameters of the motion that is considered in the intrinsic coordinate system. Motion trajectories are modeled by a hidden Markov model (HMM), and an HMM-based method using static and dynamic features is used for trajectory generation. The method was evaluated in physical experiments in terms of motion generation and recognition. In the experiments, users demonstrated the manipulation of puppets and toys so that the motions could be learned. A recognition accuracy of 90% was obtained for a test set of motions performed by three subjects. Furthermore, the results showed that appropriate motions were generated even if the object placement was changed.
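The reference-point idea above can be sketched minimally: once a demonstrated trajectory is re-expressed relative to an estimated reference point, the same motion can be regenerated for a new object placement. This sketch covers only a translation-type intrinsic coordinate system; the paper additionally estimates the coordinate-system type and HMM parameters. All coordinates below are illustrative.

```python
import numpy as np

def to_intrinsic(trajectory, reference_point):
    """Express a world-frame trajectory relative to a reference point:
    the same 'place-on' motion looks identical once re-centred on its target."""
    return trajectory - reference_point

def generalize(intrinsic_trajectory, new_reference_point):
    """Regenerate the motion for a new object placement by shifting the
    learned intrinsic trajectory to the new reference point."""
    return intrinsic_trajectory + new_reference_point

# Hypothetical 2-D demonstration: a hand approaches an object at (2, 3).
demo = np.array([[0.0, 0.0], [1.0, 1.5], [2.0, 3.0]])
ref = np.array([2.0, 3.0])
intrinsic = to_intrinsic(demo, ref)

# Same motion with the object moved to (5, 1): the end point follows the object.
regen = generalize(intrinsic, np.array([5.0, 1.0]))
print(regen[-1])
```

In the paper, the intrinsic trajectory itself is not replayed verbatim but modeled by an HMM with static and dynamic features, which is what allows smooth generation and recognition; the re-centring shown here is the step that makes the model placement-independent.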


ACM Transactions on Speech and Language Processing | 2011

Modeling spoken decision support dialogue and optimization of its dialogue strategy

Teruhisa Misu; Komei Sugiura; Tatsuya Kawahara; Kiyonori Ohtake; Chiori Hori; Hideki Kashioka; Hisashi Kawai; Satoshi Nakamura

This article presents a user model for user simulation and a system state representation for spoken decision support dialogue systems. When selecting from a group of alternatives, users apply different decision-making criteria with different priorities. At the beginning of a dialogue, however, users often do not have a definite goal or fixed criteria, and may learn about new features while interacting with the system and create new criteria accordingly. We present a user model and dialogue state representation that accommodate these patterns by considering the user's knowledge and preferences. To estimate the parameters of the user model, we implemented a trial sightseeing guidance system, collected dialogue data, and trained a user simulator. Since the user parameters are not directly observable by the system, the dialogue is modeled as a partially observable Markov decision process (POMDP), and a dialogue state representation was introduced based on this model. We then optimized the dialogue strategy so that users can make better choices. The strategy is evaluated using a user simulator trained on a large number of dialogues collected with the trial system.


International Conference on Acoustics, Speech, and Signal Processing | 2012

A comparison of dynamic WFST decoding approaches

Paul R. Dixon; Chiori Hori; Hideki Kashioka

In this paper we perform a comparison of lookahead composition and on-the-fly hypothesis rescoring using a common decoder. The results on a large vocabulary speech recognition task illustrate the differences in the behaviour of these algorithms in terms of error rate, real-time factor, memory usage, and internal statistics of the decoder. The evaluations were performed with the decoder operating at either the state or the arc level. The results show that the dynamic approaches also work well at the state level, even though the dynamic construction cost is greater.


Mobile Data Management | 2013

Multilingual Speech-to-Speech Translation System: VoiceTra

Shigeki Matsuda; Xinhui Hu; Yoshinori Shiga; Hideki Kashioka; Chiori Hori; Keiji Yasuda; Hideo Okuma; Masao Uchiyama; Eiichiro Sumita; Hisashi Kawai; Satoshi Nakamura

This study presents an overview of VoiceTra, developed by NICT and released as the world's first network-based multilingual speech-to-speech translation system for smartphones, and describes in detail its multilingual speech recognition, translation, and speech synthesis in the context of field experiments. We show the effects of system updates that use the data collected from field experiments to improve our acoustic and language models.


Intelligent Robots and Systems | 2010

Active learning of confidence measure function in robot language acquisition framework

Komei Sugiura; Naoto Iwahashi; Hideki Kashioka; Satoshi Nakamura

In an object manipulation dialogue, a robot may misunderstand an ambiguous command from a user, such as “Place the cup down (on the table),” potentially resulting in an accident. Although asking a confirmation question before every motion would decrease the risk of such failures, users find it more convenient if confirmation questions are not asked in trivial situations. This paper proposes a method for estimating the ambiguity of commands by introducing an active learning framework with Bayesian logistic regression into human-robot spoken dialogue. We conducted physical experiments in which a user and a manipulator-based robot communicated in spoken language to manipulate toys.
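The query rule at the heart of this approach can be sketched as follows: a logistic model scores how confident the robot is that it understood the command, and a confirmation question is asked only when that confidence falls near the decision boundary. The features, weights, and thresholds below are invented for illustration; the paper learns the model with Bayesian logistic regression rather than fixing weights by hand.

```python
import math

def confidence(features, weights, bias):
    """Logistic model of the probability that the intended action was
    correctly understood (weights here are illustrative, not learned)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def should_confirm(features, weights, bias, low=0.4, high=0.6):
    """Active-learning-style query rule: ask a confirmation question only
    when the model is uncertain (probability near the 0.5 boundary)."""
    p = confidence(features, weights, bias)
    return low <= p <= high

# Hypothetical features: [speech-recognition score, number of candidate referents]
weights, bias = [2.0, -1.5], 0.5
print(should_confirm([0.9, 1.0], weights, bias))  # clear command: act directly
print(should_confirm([0.5, 1.0], weights, bias))  # ambiguous: confirm first
```

Querying the user exactly where the model is most uncertain also yields the most informative labels for retraining, which is what makes this an active learning loop rather than a fixed confirmation policy.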


Proceedings of the 7th Workshop on Asian Language Resources | 2009

Annotating Dialogue Acts to Construct Dialogue Systems for Consulting

Kiyonori Ohtake; Teruhisa Misu; Chiori Hori; Hideki Kashioka; Satoshi Nakamura

This paper introduces a new corpus of consulting dialogues, designed for training a dialogue manager that can handle consulting dialogues through spontaneous interactions, from the tagged dialogue corpus. We have collected 130 hours of consulting dialogues in the tourist guidance domain. This paper outlines our taxonomy of dialogue act annotation, which describes two aspects of an utterance: its communicative function (speech act) and its semantic content. We provide an overview of the Kyoto tour guide dialogue corpus and a preliminary analysis using the dialogue act tags.
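The two-layer annotation scheme can be pictured as a simple record pairing a speech act with structured semantic content. The tag names and attributes below are hypothetical stand-ins, not the corpus's actual tag set.

```python
from dataclasses import dataclass

@dataclass
class DialogueAct:
    """Two-layer tag in the spirit of the taxonomy above: a communicative
    function (speech act) plus the semantic content of the utterance."""
    speech_act: str
    semantic_content: dict

# Hypothetical annotation of a tour-guide utterance (tags are illustrative).
act = DialogueAct(
    speech_act="inform",
    semantic_content={"topic": "temple", "attribute": "opening_hours"},
)
print(act.speech_act, act.semantic_content["topic"])
```

Separating the two layers lets a dialogue manager condition on the communicative function (e.g., respond to any "inform") independently of the domain-specific content it carries.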


IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU) | 2009

Weighted Finite State Transducer Based Statistical Dialog Management

Chiori Hori; Kiyonori Ohtake; Teruhisa Misu; Hideki Kashioka; Satoshi Nakamura

We proposed a dialog system using a weighted finite-state transducer (WFST) in which user concept tags and system action tags are the input and output of the transducer, respectively. The WFST-based platform for dialog management (DM) enables us to combine various statistical models for dialog management, user input understanding, and system action generation, and then search for the best system action in response to user input among multiple hypotheses. To test the potential of the WFST-based DM platform with statistical models, we constructed a dialog system from a human-to-human spoken dialog corpus for hotel reservation annotated with the Interchange Format (IF). A scenario WFST and a spoken language understanding (SLU) WFST were obtained from the corpus, composed together, and optimized. We evaluated the detection accuracy of the system's next-action tags using Mean Reciprocal Rank (MRR). Finally, we constructed a full WFST-based dialog system by composing the SLU, scenario, and sentence generation (SG) WFSTs. Human judges read the system responses in natural language and rated their quality. We confirmed that the WFST-based DM platform can handle a variety of spoken language and scenarios when the user concept and system action tags are consistent and distinguishable.
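The MRR metric used above is standard and easy to sketch: for each turn, the score is the reciprocal of the rank at which the reference action appears among the system's ranked hypotheses (zero if it is absent), averaged over all turns. The action tags in the example are hypothetical.

```python
def mean_reciprocal_rank(ranked_lists, references):
    """MRR over system-action hypotheses: for each turn, take 1/rank of the
    reference action in the ranked hypothesis list (0 if absent), then average."""
    total = 0.0
    for hyps, ref in zip(ranked_lists, references):
        total += 1.0 / (hyps.index(ref) + 1) if ref in hyps else 0.0
    return total / len(references)

# Hypothetical ranked next-action tags for three dialog turns.
hyps = [["Ask:dates", "Confirm", "Offer"],
        ["Offer", "Ask:dates", "Confirm"],
        ["Confirm", "Offer", "Ask:dates"]]
refs = ["Ask:dates", "Ask:dates", "Greet"]
print(round(mean_reciprocal_rank(hyps, refs), 3))  # (1 + 1/2 + 0) / 3
```

MRR rewards systems that rank the correct action near the top even when it is not first, which suits an N-best dialog manager better than plain top-1 accuracy.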

Collaboration


Dive into Hideki Kashioka's collaboration.

Top Co-Authors


Chiori Hori

Tokyo Institute of Technology


Satoshi Nakamura

Nara Institute of Science and Technology


Teruhisa Misu

National Institute of Information and Communications Technology


Shigeki Matsuda

National Institute of Information and Communications Technology


Kiyonori Ohtake

National Institute of Information and Communications Technology


Hisashi Kawai

National Institute of Information and Communications Technology


Etsuo Mizukami

National Institute of Information and Communications Technology


Xugang Lu

National Institute of Information and Communications Technology


Kiyohiro Shikano

Nara Institute of Science and Technology
