Publication


Featured research published by Ryosuke Isotani.


international conference on acoustics, speech, and signal processing | 1989

Speaker-independent word recognition using dynamic programming neural networks

Hiroaki Sakoe; Ryosuke Isotani; Kazunaga Yoshida; Ken-ichi Iso; Takao Watanabe

A description is given of speaker-independent word recognition based on a new neural network model called the dynamic programming neural network (DNN), which can treat time-sequence patterns. DNN is based on the integration of a multilayer neural network and dynamic-programming-based matching. Speaker-independent isolated Japanese digit recognition experiments were carried out using data uttered by 107 speakers (50 speakers for training and 57 speakers for testing). The recognition accuracy was 99.3%, suggesting that the model can be effective for speech recognition.
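The core idea, loosely, is to score each vocabulary word by accumulating frame-level neural-network outputs along the best dynamic-programming alignment path. The sketch below illustrates that combination in Python; the scorer functions, state layout, and transition rules are illustrative assumptions, not the paper's exact DNN formulation.

```python
import numpy as np

def dp_align_score(frame_scores):
    """Accumulate frame-level scores along the best monotonic alignment
    path (a simplified DTW-style recursion, illustrative only)."""
    T, R = frame_scores.shape          # input frames x reference states
    D = np.full((T, R), -np.inf)
    D[0, 0] = frame_scores[0, 0]
    for t in range(1, T):
        for r in range(R):
            # stay in the same state or advance by one or two states
            prev = max(D[t - 1, r],
                       D[t - 1, r - 1] if r >= 1 else -np.inf,
                       D[t - 1, r - 2] if r >= 2 else -np.inf)
            D[t, r] = prev + frame_scores[t, r]
    return D[-1, -1]

def recognize(features, word_scorers):
    """Pick the word whose neural scorer yields the best DP-aligned score.
    `word_scorers` maps a word label to a function returning a
    (frames x states) score matrix for that word's model."""
    best_word, best_score = None, -np.inf
    for word, scorer in word_scorers.items():
        score = dp_align_score(scorer(features))
        if score > best_score:
            best_word, best_score = word, score
    return best_word
```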


international conference on acoustics, speech, and signal processing | 2004

Speech-activated text retrieval system for multimodal cellular phones

Shinya Ishikawa; Takahiro Ikeda; Kiyokazu Miki; Fumihiro Adachi; Ryosuke Isotani; Ken-ichi Iso; Akitoshi Okumura

The paper describes an on-line manual page retrieval system activated by spoken queries for multimodal cellular phones. The system recognizes a user's naturally spoken queries with telephone LVCSR and searches an on-line manual with a retrieval module on a server. The user can view the retrieved data on the screen of the phone via Web access. The LVCSR module consists of a telephone acoustic model and an n-gram language model derived from a task query corpus. An adaptation method using the target manual is also presented. The retrieval module utilizes pairs of words with dependency relations and also distinguishes affirmative and negative expressions to improve precision. The proposed system gives 82.6% keyword recognition accuracy and a 77.5% task achievement rate. A field trial of the system is now underway.
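As a rough illustration of the retrieval side, the sketch below ranks manual pages by overlap of dependency word pairs tagged with affirmative/negative polarity. The tuple representation and the example data are assumptions for illustration, not the system's actual indexing scheme.

```python
def score_page(query_pairs, page_pairs):
    """Score a manual page by the number of dependency word pairs it shares
    with the query. Each pair is (head, dependent, polarity), where polarity
    marks affirmative vs. negative expressions."""
    return len(set(query_pairs) & set(page_pairs))

def retrieve(query_pairs, pages):
    """Rank pages by shared dependency pairs; `pages` maps a page id to its
    pre-extracted dependency pairs."""
    ranked = sorted(pages.items(),
                    key=lambda kv: score_page(query_pairs, kv[1]),
                    reverse=True)
    return [page_id for page_id, _ in ranked]

# Illustrative usage with hypothetical pre-parsed pairs.
pages = {
    "manual/ringtone": {("change", "ringtone", "affirmative")},
    "manual/silent":   {("ring", "phone", "negative")},
}
query = {("ring", "phone", "negative")}   # e.g. "the phone does not ring"
print(retrieve(query, pages))             # ['manual/silent', 'manual/ringtone']
```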


international conference on multimodal interfaces | 2002

An automatic speech translation system on PDAs for travel conversation

Ryosuke Isotani; Kiyoshi Yamabana; Shinichi Ando; Ken Hanazawa; Shinya Ishikawa; Tadashi Emori; Ken-ichi Iso; Hiroaki Hattori; Akitoshi Okumura; Takao Watanabe

We present an automatic speech-to-speech translation system for personal digital assistants (PDAs) that supports oral communication between Japanese and English speakers in various travel situations. Our original compact large-vocabulary continuous speech recognition engine, compact translation engine based on a lexicalized grammar, and compact Japanese speech synthesis engine enable a Japanese/English bi-directional speech translation system that works with limited computational resources.
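The architecture described above is essentially a three-stage pipeline per translation direction. The sketch below shows that composition; the callable interfaces are hypothetical stand-ins for the compact ASR, translation, and TTS engines, not the actual APIs.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class TranslationDirection:
    """One direction of the bi-directional system: recognize speech in the
    source language, translate the text, then synthesize speech in the
    target language. The callables are hypothetical stand-ins for the
    compact ASR, lexicalized-grammar MT, and TTS engines."""
    recognize: Callable[[bytes], str]    # source-language audio -> text
    translate: Callable[[str], str]      # source text -> target text
    synthesize: Callable[[str], bytes]   # target text -> audio

    def run(self, audio: bytes) -> bytes:
        return self.synthesize(self.translate(self.recognize(audio)))

# A bi-directional system is then just two such pipelines, e.g.
# ja_to_en = TranslationDirection(ja_asr, ja_en_mt, en_tts)
# en_to_ja = TranslationDirection(en_asr, en_ja_mt, ja_tts)
```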


international conference on acoustics, speech, and signal processing | 2006

Model-Based Wiener Filter for Noise Robust Speech Recognition

Takayuki Arakawa; Masanori Tsujikawa; Ryosuke Isotani

In this paper, we propose a new approach for noise robust speech recognition that integrates signal-processing-based spectral enhancement and statistical-model-based compensation. The proposed method, the model-based Wiener filter (MBW), takes three steps to estimate clean speech signals from noisy speech signals corrupted by various kinds of additive background noise. The first step is the well-known spectral subtraction (SS). Since SS subtracts the average noise spectrum, the estimated speech signals often include distortion. In the second step, the distortion caused by SS is reduced using minimum mean square error estimation with a Gaussian mixture model representing pre-trained knowledge of speech. In the final step, Wiener filtering is performed with the decision-directed method. Experiments were conducted on the Aurora2-J (Japanese digit string) database. The results show that the proposed method performs as well as the ETSI advanced front-end on average, and the variation in recognition accuracy across noise types is about one third, which demonstrates the robustness of the proposed method.
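The three MBW steps map naturally to a per-frame pipeline: spectral subtraction, a GMM-based MMSE estimate of the clean spectrum, then a decision-directed Wiener gain. Below is a minimal numpy sketch under simplifying assumptions (the MMSE step conditions directly on the SS output in the log-spectral domain, and the parameter shapes and flooring constants are illustrative); it is not the exact formulation in the paper.

```python
import numpy as np

def spectral_subtraction(noisy_power, noise_power, floor=0.01):
    """Step 1: subtract an estimated noise power spectrum, with flooring."""
    return np.maximum(noisy_power - noise_power, floor * noisy_power)

def gmm_mmse_log_spectrum(log_spec, means, variances, weights):
    """Step 2: MMSE estimate of the clean log-spectrum under a pre-trained
    diagonal-covariance GMM of clean speech; the component posteriors pull
    the SS output toward plausible speech spectra."""
    ll = -0.5 * np.sum((log_spec - means) ** 2 / variances
                       + np.log(2.0 * np.pi * variances), axis=1)
    post = weights * np.exp(ll - ll.max())
    post /= post.sum()
    return post @ means                      # posterior-weighted mean

def wiener_decision_directed(noisy_power, noise_power, prev_clean_power,
                             alpha=0.98):
    """Step 3: Wiener gain with a decision-directed a priori SNR estimate."""
    snr_post = np.maximum(noisy_power / noise_power - 1.0, 0.0)
    snr_prio = alpha * prev_clean_power / noise_power + (1.0 - alpha) * snr_post
    gain = snr_prio / (1.0 + snr_prio)
    return gain * noisy_power
```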


meeting of the association for computational linguistics | 2003

A Speech Translation System with Mobile Wireless Clients

Kiyoshi Yamabana; Ken Hanazawa; Ryosuke Isotani; Seiya Osada; Akitoshi Okumura; Takao Watanabe

We developed a client-server speech translation system with mobile wireless clients. The system performs English-Japanese speech translation for travel conversation and supports foreign-language communication in areas where a wireless LAN connection is available.


spoken language technology workshop | 2014

Efficient multi-lingual unsupervised acoustic model training under mismatch conditions

Masahiro Saiko; Hitoshi Yamamoto; Ryosuke Isotani; Chiori Hori

We propose a new multi-lingual unsupervised acoustic model (AM) training method for low-resourced languages under mismatch conditions. For such languages, very little or no transcribed speech is available, so unsupervised acoustic modeling using AMs of other (not low-resourced) languages has been proposed. The conventional method has been shown to be effective when the acoustic conditions, such as speaking style, are similar between the low-resourced language and the other languages. However, since matched AMs of other languages are not easy to prepare, a mismatch between each AM and the speech of the low-resourced language often occurs in practice. In this paper, we address this mismatch problem. To generate more accurate automatic transcriptions under mismatch conditions, we introduce two techniques: (1) initial AMs trained on speech of other languages mapped to the phonemes of the low-resourced language, and (2) an iterative process that switches back and forth between training new AMs and adapting the initial AMs. The proposed method, without any transcriptions, achieved a word error rate of 32.1% on the IWSLT2011 evaluation set, while the conventional method and supervised training achieved 39.3% and 22.7%, respectively.
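The iterative part of the method can be sketched as a simple loop: decode the untranscribed speech with the current models, then alternate between training a new AM on the automatic transcriptions and adapting the phoneme-mapped initial AMs with them. The helper callables (`map_phonemes`, `decode`, `train_am`, `adapt_am`) are hypothetical placeholders for a real ASR toolkit, so the code below is only a schematic reading of the method, not its actual implementation.

```python
def unsupervised_am_training(speech, cross_lingual_ams, phoneme_map,
                             map_phonemes, decode, train_am, adapt_am,
                             iterations=3):
    """Schematic loop for unsupervised AM training under mismatch.
    All helper callables are supplied by the caller (e.g. wrappers around
    an ASR toolkit); nothing here uses manual transcriptions."""
    # (1) Initial AMs: models of other languages with their phonemes
    #     mapped to the low-resourced language's phoneme set.
    initial_ams = [map_phonemes(am, phoneme_map) for am in cross_lingual_ams]
    current_ams = list(initial_ams)
    trained_am = None
    for _ in range(iterations):
        # Decode the untranscribed speech to get automatic transcriptions.
        transcripts = decode(speech, current_ams)
        # (2) Switch back and forth: train a new AM on the automatic
        #     transcriptions, and adapt the initial AMs with the same data.
        trained_am = train_am(speech, transcripts)
        adapted_ams = [adapt_am(am, speech, transcripts) for am in initial_ams]
        current_ams = [trained_am] + adapted_ams
    return trained_am
```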


NEC Research & Development | 2003

Speech-to-speech translation software on PDAs for travel conversation

Ryosuke Isotani; Kiyoshi Yamabana; Shinichi Ando; Ken Hanazawa; Shinya Ishikawa; Ken-ichi Iso


Archive | 2010

Method for processing multichannel acoustic signal, system therefor, and program

Masanori Tsujikawa; Tadashi Emori; Yoshifumi Onishi; Ryosuke Isotani


pacific asia conference on language information and computation | 2005

Speech-Activated Text Retrieval System for Cellular Phones with Web Browsing Capability

Takahiro Ikeda; Shinya Ishikawa; Kiyokazu Miki; Fumihiro Adachi; Ryosuke Isotani; Kenji Satoh; Akitoshi Okumura


Archive | 2008

Voice Recognition System, Voice Recognition Method, and Program for Voice Recognition

Fumihiro Adachi; Ryosuke Isotani; Ken Hanazawa
