Publication


Featured research published by Ryuichi Nisimura.


International Conference on Acoustics, Speech, and Signal Processing | 2008

Tandem-STRAIGHT: A temporally stable power spectral representation for periodic signals and applications to interference-free spectrum, F0, and aperiodicity estimation

Hideki Kawahara; Masanori Morise; Toru Takahashi; Ryuichi Nisimura; Toshio Irino; Hideki Banno

A simple new method for estimating temporally stable power spectra is introduced to provide a unified basis for computing an interference-free spectrum, the fundamental frequency (F0), as well as aperiodicity estimation. F0 adaptive spectral smoothing and cepstral liftering based on consistent sampling theory are employed for interference-free spectral estimation. A perturbation spectrum, calculated from temporally stable power and interference-free spectra, provides the basis for both F0 and aperiodicity estimation. The proposed approach eliminates ad-hoc parameter tuning and the heavy demand on computational power, from which STRAIGHT has suffered in the past.
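The core TANDEM idea behind this representation, averaging two power spectra computed half a pitch period apart so that the F0-rate interference between adjacent harmonics cancels, can be sketched as follows (a toy illustration on a synthetic two-harmonic signal with F0 assumed known; not the authors' implementation):

```python
import numpy as np

fs = 16000
f0 = 100.0                      # assumed known fundamental frequency (Hz)
t0 = fs / f0                    # pitch period in samples
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * f0 * t) + 0.5 * np.sin(2 * np.pi * 2 * f0 * t)

def power_spectrum(sig, center, win_len=400):
    """Hann-windowed power spectrum of one frame centered at `center`."""
    frame = sig[center - win_len // 2: center + win_len // 2]
    return np.abs(np.fft.rfft(frame * np.hanning(win_len))) ** 2

def tandem_power(sig, center):
    """Average two power spectra taken T0/2 apart (the TANDEM idea):
    the F0-rate interference between adjacent harmonics cancels."""
    off = int(t0 // 4)          # frames at center +/- T0/4 are T0/2 apart
    return 0.5 * (power_spectrum(sig, center - off) +
                  power_spectrum(sig, center + off))

# Single-frame power fluctuates as the window slides over pitch periods,
# while the TANDEM average stays nearly constant.
centers = range(7000, 7200)
singles = [power_spectrum(x, c).sum() for c in centers]
tandems = [tandem_power(x, c).sum() for c in centers]
```

The strongly reduced temporal fluctuation of the averaged spectrum is the "temporally stable" property the interference-free spectrum, F0, and aperiodicity estimators build on.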


International Conference on Acoustics, Speech, and Signal Processing | 2009

Temporally variable multi-aspect auditory morphing enabling extrapolation without objective and perceptual breakdown

Hideki Kawahara; Ryuichi Nisimura; Toshio Irino; Masanori Morise; Toru Takahashi; Hideki Banno

A generalized framework of auditory morphing based on the speech analysis, modification and resynthesis system STRAIGHT is proposed that enables each morphing rate of representational aspects to be a function of time, including the temporal axis itself. Two types of algorithms were derived: an incremental algorithm for real-time manipulation of morphing rates and a batch processing algorithm for off-line post-production applications. By defining morphing in terms of the derivative of mapping functions in the logarithmic domain, breakdown of morphing resynthesis found in the previous formulation in the case of extrapolations was eliminated. A method to alleviate perceptual defects in extrapolation is also introduced.
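For a single scalar aspect such as F0, morphing in the logarithmic domain with a time-varying rate m(t) can be illustrated like this (a minimal sketch with synthetic flat trajectories; the actual system morphs full STRAIGHT representations, including the time axis itself, via the derivative-based formulation):

```python
import numpy as np

t = np.linspace(0.0, 1.0, 100)          # time axis (s)
f0_a = 120.0 * np.ones_like(t)          # flat F0 trajectory of voice A (Hz)
f0_b = 240.0 * np.ones_like(t)          # flat F0 trajectory of voice B (Hz)

def morph_f0(f0_a, f0_b, m):
    """Mix log-F0 with a (possibly time-varying) morphing rate m:
    m=0 gives voice A, m=1 gives voice B, m outside [0,1] extrapolates."""
    return np.exp((1.0 - m) * np.log(f0_a) + m * np.log(f0_b))

m = t                                   # morphing rate ramps from 0 to 1
f0_morph = morph_f0(f0_a, f0_b, m)      # glides from 120 Hz toward 240 Hz
```

With m(t) in [0, 1] this interpolates between the two voices; values outside that range extrapolate, which is exactly the case the paper's derivative-based log-domain formulation keeps free of breakdown.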


Intelligent Robots and Systems | 2006

Humanoid with Interaction Ability Using Vision and Speech Information

Junichi Ido; Yoshio Matsumoto; Tsukasa Ogasawara; Ryuichi Nisimura

Intelligent robots give people a new way to use computers in daily life. We first implemented a humanoid robot for computerized university guidance and then added capabilities for natural interaction. This paper describes the hardware and software system of this humanoid with interaction ability. HRP-2, sitting opposite a user across a table, can detect gaze direction, head pose, and gestures using a stereo camera system attached to its head. In addition, our system can recognize the user's question utterances and non-stationary noise such as coughing and sneezing using microphones. By using such information efficiently, HRP-2 can answer questions with its synthesized voice and gestures and perform tasks such as passing objects on the table.


Intelligent Robots and Systems | 2002

ASKA: receptionist robot with speech dialogue system

Ryuichi Nisimura; Takashi Uchida; Akinobu Lee; Hiroshi Saruwatari; Kiyohiro Shikano; Yoshio Matsumoto

We implemented a humanoid robot, ASKA, at our university reception desk for computerized university guidance. ASKA can recognize a user's question utterance and answer it with its text-to-speech voice, hand gestures, and head movements. This paper describes the speech-related parts of ASKA. ASKA can handle a wide task domain with a 20k-word vocabulary using a word trigram model and an elaborated speaker-independent acoustic model. ASKA can also generate a response using keyword and key-phrase detection on the N-best speech recognition results. The word recognition rate for the reception task is 90.9%, and the rate for the out-of-domain task is 78.9%. The correct response rate for the reception task is 61.7%. Users can enjoy question-answering with ASKA.
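Keyword spotting over N-best recognition hypotheses, as used for ASKA's response selection, can be sketched as follows (the keyword table and hypotheses here are hypothetical examples, not taken from the original system):

```python
# Map keywords to canned responses (hypothetical examples).
responses = {
    "library": "The library is on the second floor.",
    "cafeteria": "The cafeteria is open until 6 pm.",
}

def select_response(nbest, responses):
    """Scan N-best hypotheses in rank order and return the response
    for the first keyword found, or None if nothing matches."""
    for hypothesis in nbest:
        for keyword, answer in responses.items():
            if keyword in hypothesis:
                return answer
    return None

nbest = [
    "where is the library please",   # top hypothesis
    "where is the lavatory please",  # lower-ranked alternative
]
print(select_response(nbest, responses))
# -> The library is on the second floor.
```

Searching beyond the single best hypothesis makes the response selection robust to recognition errors in the top candidate, which is the motivation for using the N-best list.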


IEICE Transactions on Information and Systems | 2008

Development, Long-Term Operation and Portability of a Real-Environment Speech-Oriented Guidance System

Tobias Cincarek; Hiromichi Kawanami; Ryuichi Nisimura; Akinobu Lee; Hiroshi Saruwatari; Kiyohiro Shikano

In this paper, the development, long-term operation, and portability of a practical ASR application in a real environment are investigated. The target application is a speech-oriented guidance system installed at a local community center. The system has been exposed to ordinary people since November 2002. More than 300 hours, or more than 700,000 inputs, have been collected over four years. The outcome is a rare example of a large-scale real-environment speech database. A simulation experiment is carried out with this database to investigate how the system's performance improves during the first two years of operation. The purpose is to determine empirically the amount of real-environment data that has to be prepared to build a system with reasonable speech recognition performance and response accuracy. Furthermore, the relative importance of developing the main system components, i.e., the speech recognizer and the response generation module, is assessed. Although this depends on the system's modeling capacity and domain complexity, experimental results show that overall performance stagnates after employing about 10-15k utterances for training the acoustic model, 40-50k utterances for training the language model, and 40-50k utterances for compiling the question-and-answer database. The Q&A database was most important for improving the system's response accuracy. Finally, the portability of the well-trained first system prototype to a different environment, a local subway station, is investigated. Since the collection and preparation of large amounts of real data is impractical in general, only one month of data from the new environment is employed for system adaptation. While the speech recognition component of the first prototype has a high degree of portability, the response accuracy is lower than in the first environment. The main reason is a domain difference between the two systems, since they are installed in different environments. This implies that it is imperative to take the behavior of users under real conditions into account to build a system with high user satisfaction.


Spoken Language Technology Workshop | 2008

Speech-to-text input method for web system using JavaScript

Ryuichi Nisimura; Jumpei Miyake; Hideki Kawahara; Toshio Irino

We have developed a speech-to-text input method for web systems. The system is provided as a JavaScript library comprising an Ajax-like mechanism based on a Java applet, CGI programs, and dynamic HTML documents. It allows users to access voice-enabled web pages without requiring special browsers. Web developers can embed it in their web pages by inserting only one line in the header field of an HTML document. This study also aims at observing natural spoken interactions in personal environments. We succeeded in collecting 4,003 inputs over a period of seven months via our public Japanese ASR server. To cover out-of-vocabulary words such as proper nouns, a web page to register new words into the language model was developed. As a result, we obtained an improvement of 0.8% in recognition accuracy. With regard to the acoustic conditions, an SNR of 25.3 dB was observed.


International Conference on Acoustics, Speech, and Signal Processing | 2013

Higher order waveform symmetry measure and its application to periodicity detectors for speech and singing with fine temporal resolution

Hideki Kawahara; Masanori Morise; Ryuichi Nisimura; Toshio Irino

Another simple, high-speed F0 extractor with high temporal resolution, based on our previous proposal, has been developed by adding a higher-order symmetry measure. This extension makes the proposed method significantly more robust than the previous one. The proposed method is a detector of the lowest prominent sinusoidal component. It can use several F0 refinement procedures when the signal is a sum of harmonic sinusoidal components. The refinement procedure presented here is based on a stable representation of the instantaneous frequency of periodic signals. The whole procedure, implemented in Matlab, runs faster than real time on ordinary PCs for sounds sampled at 44,100 Hz. Application of the proposed algorithm revealed that rapid temporal modulations in both the F0 trajectory and the spectral envelope typically exist in expressive voices such as those used in lively singing performances.
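For context, the simplest member of the class of periodicity detectors the paper improves on is an autocorrelation peak picker. The following generic sketch (not the proposed symmetry-based algorithm) estimates F0 from a synthetic test tone:

```python
import numpy as np

def f0_autocorr(x, fs, f0_min=60.0, f0_max=500.0):
    """Estimate F0 by locating the autocorrelation peak within the lag
    range corresponding to [f0_min, f0_max] Hz."""
    x = x - np.mean(x)
    r = np.correlate(x, x, mode="full")[len(x) - 1:]  # non-negative lags
    lag_min = int(fs / f0_max)
    lag_max = int(fs / f0_min)
    lag = lag_min + np.argmax(r[lag_min:lag_max])
    return fs / lag

fs = 44100
t = np.arange(int(0.05 * fs)) / fs
x = np.sin(2 * np.pi * 220.0 * t)       # 220 Hz test tone, 50 ms
est = f0_autocorr(x, fs)                # close to 220 Hz
```

Frame-based detectors like this trade temporal resolution for stability, which is precisely where a sample-by-sample symmetry measure, as in the paper, has the advantage.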


International Conference on Human Interface and Management of Information | 2014

Development of a Mobile Application for Crowdsourcing the Data Collection of Environmental Sounds

Minori Matsuyama; Ryuichi Nisimura; Hideki Kawahara; Junnosuke Yamada; Toshio Irino

Our study introduces a mobile navigation system with a sound input interface. To realize a high-performance environmental sound recognition system using Android devices, we organized a database of environmental sounds collected in daily life. Crowdsourcing is a useful approach for organizing such a database through the collaborative work of many people. We recruited trial users to test our system via a web-based crowdsourcing service provider in Japan. However, we found that improving the system is important for maintaining users' motivation to continue collecting sounds. We believe that the improved user interface (UI) design we introduced facilitates the annotation task. This paper gives an overview of our system, focusing on a method for applying the crowdsourcing approach with Android devices, and describes its UI design. We developed a touch-panel UI for the annotation task in which users select the appropriate class of a sound source.


Advances in Experimental Medicine and Biology | 2013

Accurate Estimation of Compression in Simultaneous Masking Enables the Simulation of Hearing Impairment for Normal-Hearing Listeners

Toshio Irino; Tomofumi Fukawatase; Makoto Sakaguchi; Ryuichi Nisimura; Hideki Kawahara; Roy D. Patterson

This chapter presents a unified gammachirp framework for estimating cochlear compression and synthesizing sounds with inverse compression that cancels the compression of a normal-hearing (NH) listener to simulate the experience of a hearing-impaired (HI) listener. The compressive gammachirp (cGC) filter was fitted to notched-noise masking data to derive level-dependent filter shapes and the cochlear compression function (e.g., Patterson et al., J Acoust Soc Am 114:1529-1542, 2003). The procedure is based on the analysis/synthesis technique of Irino and Patterson (IEEE Trans Audio Speech Lang Process 14:2222-2232, 2006) using a dynamic cGC filterbank (dcGC-FB). The level dependency of the dcGC-FB can be reversed to produce inverse compression and resynthesize sounds in a form that cancels the compression applied by the auditory system of the NH listener. The chapter shows that the estimation of compression in simultaneous masking is improved if the notched-noise procedure for the derivation of auditory filter shape includes noise bands with different levels. Since both the estimation and the resynthesis are performed within the gammachirp framework, it is possible for a specific NH listener to experience the loss of a specific HI listener.


International Conference on Human-Computer Interaction | 2011

Development of web-based voice interface to identify child users based on automatic speech recognition system

Ryuichi Nisimura; Shoko Miyamori; Lisa Kurihara; Hideki Kawahara; Toshio Irino

We propose a method to identify child speakers, which can be adopted in Web filtering systems to protect children from the dangers of the Internet. The proposed child identification method relies on an automatic speech recognition (ASR) algorithm that uses an acoustic hidden Markov model (HMM) and a support vector machine (SVM). To extend the proposed method for use in a Web application, we used our voice-enabled Web system (the w3voice system) as the front-end interface for a prototype system. In this paper, we present an overview of the prototype system to elucidate our proposal. We also evaluate the efficacy of the proposed method in identifying child speakers using voices captured from real Web users.
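The classification step can be illustrated with a toy stand-in: a nearest-centroid decision over hypothetical acoustic features (children typically have higher F0 and formant frequencies than adults). This is only an illustrative simplification with synthetic data, not the paper's HMM/SVM pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic (F0, formant) features in Hz; values are hypothetical but
# reflect that children's voices sit higher on both axes.
child = rng.normal(loc=[300.0, 3200.0], scale=[30.0, 150.0], size=(50, 2))
adult = rng.normal(loc=[150.0, 2500.0], scale=[30.0, 150.0], size=(50, 2))

def nearest_centroid(sample, class_means):
    """Return the label of the class centroid closest to the sample."""
    dists = {label: np.linalg.norm(sample - mu)
             for label, mu in class_means.items()}
    return min(dists, key=dists.get)

means = {"child": child.mean(axis=0), "adult": adult.mean(axis=0)}
print(nearest_centroid(np.array([290.0, 3100.0]), means))
# -> child
```

A real system, as in the paper, replaces these hand-picked features with HMM-based acoustic scores and the centroid rule with a trained SVM decision boundary.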

Collaboration


Dive into Ryuichi Nisimura's collaboration network.

Top Co-Authors

Akinobu Lee

Nagoya Institute of Technology

Kiyohiro Shikano

Nara Institute of Science and Technology
