Publication


Featured research published by Hirofumi Nakajima.


Advanced Robotics | 2010

Design and Implementation of Robot Audition System 'HARK' — Open Source Software for Listening to Three Simultaneous Speakers

Kazuhiro Nakadai; Toru Takahashi; Hiroshi G. Okuno; Hirofumi Nakajima; Yuji Hasegawa; Hiroshi Tsujino

This paper presents the design and implementation of HARK, a robot audition software system consisting of sound source localization modules, sound source separation modules, and automatic speech recognition modules for the separated speech signals, which works on any robot with any microphone configuration. Since a robot with ears may be deployed in various auditory environments, a robot audition system should provide an easy way to adapt to them. HARK provides a set of modules to cope with various auditory environments by using the open-source middleware FlowDesigner, which reduces the overhead of data transfer between modules. HARK has been open-sourced since April 2008. The resulting implementation of HARK, with MUSIC-based sound source localization, GSS-based sound source separation, and Missing Feature Theory-based automatic speech recognition, runs on Honda ASIMO, SIG2, and Robovie R2, and recognizes three simultaneous utterances with a delay of 1.9 s at a word correct rate of 80-90%.
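The MUSIC localization step named in the abstract can be sketched as follows: eigen-decompose the spatial correlation matrix, take the noise subspace, and score each candidate direction by how orthogonal its steering vector is to that subspace. This is a generic narrowband MUSIC sketch on a toy linear array, not HARK's implementation; the geometry and the `steering_vec` helper are illustrative assumptions.

```python
import numpy as np

def music_spectrum(R, steering, n_sources):
    """Compute the MUSIC pseudo-spectrum for each candidate direction.

    R        : (M, M) spatial correlation matrix of the microphone signals
    steering : (D, M) steering vectors, one per candidate direction
    n_sources: assumed number of active sound sources
    """
    # Eigen-decompose the Hermitian correlation matrix (eigenvalues ascending).
    eigvals, eigvecs = np.linalg.eigh(R)
    # Noise subspace: eigenvectors of the M - n_sources smallest eigenvalues.
    En = eigvecs[:, : R.shape[0] - n_sources]
    # Directions orthogonal to the noise subspace produce sharp peaks.
    num = np.abs(np.sum(steering.conj() * steering, axis=1))
    proj = np.sum(np.abs(steering.conj() @ En) ** 2, axis=1)
    return num / np.maximum(proj, 1e-12)

# Toy example: 4-mic half-wavelength linear array, one source at 30 degrees.
def steering_vec(angles_deg, n_mics=4, spacing=0.5):
    a = np.deg2rad(np.asarray(angles_deg))
    return np.exp(-2j * np.pi * spacing * np.outer(np.sin(a), np.arange(n_mics)))

angles = np.arange(-90, 91)
s = steering_vec([30.0])[0]
R = np.outer(s, s.conj()) + 0.01 * np.eye(4)   # rank-1 source plus a noise floor
spec = music_spectrum(R, steering_vec(angles), n_sources=1)
print(angles[np.argmax(spec)])  # 30
```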


IEEE-RAS International Conference on Humanoid Robots | 2008

An open source software system for robot audition HARK and its evaluation

Kazuhiro Nakadai; Hiroshi G. Okuno; Hirofumi Nakajima; Yuji Hasegawa; Hiroshi Tsujino

A robot's capability of listening to several things at once with its own ears, that is, robot audition, is important for improving human-robot interaction. The critical issue in robot audition is real-time processing in noisy environments, with the flexibility to support various kinds of robots and hardware configurations. This paper presents open-source robot audition software, called "HARK", which includes sound source localization, separation, and automatic speech recognition (ASR). Since separated sounds suffer from spectral distortion due to separation, HARK generates a time-frequency map of reliability, called a "missing feature mask", for the features of separated sounds. Separated sounds are then recognized by a Missing-Feature Theory (MFT) based ASR using these masks. HARK is implemented on the middleware FlowDesigner, which shares intermediate audio data between modules and enables real-time processing. HARK's performance in recognizing noisy and simultaneous speech is demonstrated on three humanoid robots, Honda ASIMO, SIG2 and Robovie, with different microphone layouts.
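The missing-feature-mask idea can be illustrated with a minimal hard mask: time-frequency bins where the separated signal dominates an estimated leakage level are marked reliable (1.0) for the MFT-based ASR, and the rest are masked out. This is a toy sketch; HARK derives its masks from the separation process itself and also supports soft-valued masks.

```python
import numpy as np

def missing_feature_mask(separated, leakage, threshold=3.0):
    """Hard missing-feature mask over a time-frequency power map:
    1.0 where the separated signal dominates the estimated leakage noise
    (reliable), 0.0 elsewhere (unreliable, ignored by the MFT-based ASR)."""
    snr = separated / np.maximum(leakage, 1e-12)
    return (snr > threshold).astype(float)

# Toy 2x2 power map: the first frequency bin is strong, the second is buried.
sep = np.array([[10.0, 0.5],
                [4.0, 0.2]])
leak = np.ones((2, 2))
mask = missing_feature_mask(sep, leak)
print(mask)  # [[1. 0.] [1. 0.]]
```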


International Conference on Acoustics, Speech, and Signal Processing | 2006

Robust Tracking of Multiple Sound Sources by Spatial Integration of Room and Robot Microphone Arrays

Kazuhiro Nakadai; Hirofumi Nakajima; Masamitsu Murase; Satoshi Kaijiri; Kentaro Yamada; Takahiro Nakamura; Yuji Hasegawa; Hiroshi G. Okuno; Hiroshi Tsujino

Sound source tracking is an important function for a robot operating in a daily environment, because the robot should recognize where sound events such as speech, music, and other environmental sounds originate. This paper addresses sound source tracking by integrating a room microphone array with a robot microphone array. The room microphone array consists of 64 microphones attached to the walls; it provides 2D (x-y) sound source localization based on weighted delay-and-sum beamforming. The robot microphone array consists of eight microphones installed on the robot's head and localizes multiple sound sources in azimuth. The localization results are integrated by a particle filter for multiple sound sources. Experimental results show that particle-filter-based integration reduces localization errors and provides accurate, robust 2D sound source tracking.
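The weighted delay-and-sum step can be sketched as: for each candidate position, delay each microphone signal so the hypothesized wavefront aligns, apply per-microphone weights, sum, and score by output power; the candidate with maximal power wins. This is a toy integer-delay, one-parameter sketch with an assumed geometry; the paper's system searches a 2D (x-y) grid with 64 weighted wall microphones.

```python
import numpy as np

def wds_power(frames, delays, weights):
    """Weighted delay-and-sum output power for one candidate source position.

    frames : (M, T) time-domain signal per microphone
    delays : (M,) integer sample delays aligning each mic to the candidate
    weights: (M,) per-microphone weights (e.g. down-weighting distant mics)
    """
    aligned = np.zeros(frames.shape[1])
    for m in range(frames.shape[0]):
        aligned += weights[m] * np.roll(frames[m], -delays[m])
    aligned /= weights.sum()
    return float(np.mean(aligned ** 2))

# Toy grid search: 3 mics on a line; the true delay pattern corresponds to x=3.
rng = np.random.default_rng(0)
sig = rng.standard_normal(1024)
true_delays = np.array([0, 3, 6])                 # samples, mic by mic
frames = np.stack([np.roll(sig, d) for d in true_delays])
candidates = {x: np.array([0, x, 2 * x]) for x in range(6)}  # toy delay model
powers = {x: wds_power(frames, d, np.ones(3)) for x, d in candidates.items()}
best = max(powers, key=powers.get)
print(best)  # 3: coherent summation peaks at the matching candidate
```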


International Conference on Acoustics, Speech, and Signal Processing | 2008

Adaptive step-size parameter control for real-world blind source separation

Hirofumi Nakajima; Kazuhiro Nakadai; Yuji Hasegawa; Hiroshi Tsujino

This paper describes a method to adaptively control the step-size parameter used for updating a separation matrix so as to extract a target sound source accurately in blind source separation (BSS). The design of the step-size parameter is essential when applying BSS to real-world applications such as robot audition systems, because the surrounding environment changes dynamically in the real world. It is common to use a fixed step-size parameter obtained empirically. However, due to environmental changes and noise, the performance of BSS with a fixed step-size parameter deteriorates, and the separation matrix sometimes diverges. We propose a general method for adaptive step-size control. The proposed method is an extension of Newton's method using complex gradient theory and is applicable to any BSS algorithm. We applied it to six types of BSS algorithms for an 8-channel microphone array embedded in Honda ASIMO. Experimental results on the separation and recognition of two simultaneous utterances show that the proposed method improves the performance of all six BSS algorithms.
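The motivation for adaptive step-size control can be illustrated with a generic line-search flavor of the idea: instead of a hand-tuned fixed step, fit a local quadratic model of the cost along the gradient direction each iteration and take the step that minimizes it. The paper's actual method extends Newton's method with complex gradients; this real-valued sketch and its toy demixing cost are illustrative assumptions only.

```python
import numpy as np

def adaptive_step_gradient(cost, grad, W, n_iter=50):
    """Gradient descent whose step size is chosen adaptively each iteration
    by fitting a parabola to the cost along the gradient direction (one
    extra cost evaluation), avoiding a hand-tuned fixed step size."""
    for _ in range(n_iter):
        g = grad(W)
        gn = np.sum(np.abs(g) ** 2)        # squared gradient norm
        if gn < 1e-20:
            break
        c0 = cost(W)
        t = 1e-3                            # small probe step
        c1 = cost(W - t * g)
        # Quadratic model c(mu) ~= c0 - gn*mu + a*mu^2, fitted from the probe.
        a = (c1 - c0 + gn * t) / (t * t)
        mu = gn / (2 * a) if a > 0 else t   # minimizer of the parabola
        W = W - mu * g
    return W

# Toy demixing problem: find W with W @ A ~ I for a known mixing matrix A
# (a stand-in cost; real BSS costs use source statistics, not A itself).
A = np.array([[1.0, 0.5], [0.3, 1.0]])
cost = lambda W: np.sum((W @ A - np.eye(2)) ** 2)
grad = lambda W: 2 * (W @ A - np.eye(2)) @ A.T
W = adaptive_step_gradient(cost, grad, np.eye(2))
print(np.round(W @ A, 3))  # close to the identity matrix
```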


IEEE Transactions on Audio, Speech, and Language Processing | 2010

Blind Source Separation With Parameter-Free Adaptive Step-Size Method for Robot Audition

Hirofumi Nakajima; Kazuhiro Nakadai; Yuji Hasegawa; Hiroshi Tsujino

This paper proposes an adaptive step-size method for blind source separation (BSS) suitable for robot audition systems. The design of the step-size parameter is a critical consideration when applying BSS to real-world applications such as robot audition systems, because the surrounding environment changes dynamically in the real world. It is common to use a fixed step-size parameter obtained empirically. However, because of environmental changes and noise, the performance of BSS with a fixed step-size parameter deteriorates, and the separation matrix sometimes diverges. Several adaptive step-size methods for BSS have been proposed, but they are difficult to apply to robot audition systems, which require, for example, low computational cost and freedom from manual parameter adjustment. We propose an adaptive step-size method suitable for robot audition systems, with the following merits: 1) low computational cost; 2) no parameters to be adjusted manually; and 3) no additional preprocessing requirements. We applied our method to six different BSS algorithms for an eight-channel microphone array embedded in Honda's ASIMO robot. The method improved the performance of all six algorithms in experiments on the separation and recognition of simultaneous speech. Moreover, the method increased the amount of computation by less than 10% compared with the original computation used in most BSS algorithms.


Intelligent Robots and Systems | 2006

Real-Time Tracking of Multiple Sound Sources by Integration of In-Room and Robot-Embedded Microphone Arrays

Kazuhiro Nakadai; Hirofumi Nakajima; Masamitsu Murase; Hiroshi G. Okuno; Yuji Hasegawa; Hiroshi Tsujino

Real-time and robust sound source tracking is an important function for a robot operating in a daily environment, because the robot should recognize where sound events such as speech, music, and other environmental sounds originate. This paper addresses sound source tracking by real-time integration of an in-room microphone array (IRMA) and a robot-embedded microphone array (REMA). The IRMA system consists of 64 microphones attached to the walls and localizes multiple sound sources on a 2D plane by weighted delay-and-sum beamforming. The REMA system localizes multiple sound sources in azimuth using eight microphones attached to a robot's head on a rotating table. The localization results are integrated in real time by a particle filter to track multiple sound sources. Experimental results show that particle-filter-based integration improved accuracy and robustness in multiple sound source tracking, even while the robot's head was rotating.
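The particle-filter integration can be sketched as a standard bootstrap filter over 2D source positions: predict with a motion model, reweight by the likelihood of the fused localization observations, and resample when the effective sample size collapses. All model parameters below are illustrative assumptions, not the paper's values.

```python
import numpy as np

def particle_filter_step(particles, weights, obs, motion_std=0.1, obs_std=0.2,
                         rng=None):
    """One predict/update/resample cycle of a bootstrap particle filter.

    particles: (N, 2) hypothesized 2D (x, y) source positions
    weights  : (N,) particle weights summing to 1
    obs      : (2,) noisy position observation (e.g. fused array localization)
    """
    rng = rng or np.random.default_rng()
    n = len(particles)
    # Predict: random-walk motion model.
    particles = particles + rng.normal(0, motion_std, particles.shape)
    # Update: weight by a Gaussian observation likelihood.
    d2 = np.sum((particles - obs) ** 2, axis=1)
    weights = weights * np.exp(-d2 / (2 * obs_std ** 2))
    weights /= weights.sum()
    # Resample when the effective sample size drops below half the particles.
    if 1.0 / np.sum(weights ** 2) < n / 2:
        idx = rng.choice(n, size=n, p=weights)
        particles, weights = particles[idx], np.full(n, 1.0 / n)
    return particles, weights

# Toy track: a source drifting along x; observations are the path plus noise.
rng = np.random.default_rng(3)
particles = rng.uniform(-1, 1, (500, 2))
weights = np.full(500, 1.0 / 500)
for t in range(40):
    truth = np.array([t * 0.05, 0.5])
    obs = truth + rng.normal(0, 0.1, 2)
    particles, weights = particle_filter_step(particles, weights, obs, rng=rng)
estimate = weights @ particles
print(np.round(estimate, 1))  # roughly the final true position (1.95, 0.5)
```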


Intelligent Robots and Systems | 2005

Sound source tracking with directivity pattern estimation using a 64 ch microphone array

Kazuhiro Nakadai; Hirofumi Nakajima; Kentaro Yamada; Yuji Hasegawa; Takahiro Nakamura; Hiroshi Tsujino

In human-robot communication, a robot should distinguish between voices uttered by a human and those played through a loudspeaker, such as a TV or radio. This paper addresses the detection of actual human voices using a microphone array as an extension of the robot's auditory function, to support the robot's understanding of its environment. We introduce a 64-channel in-room microphone array system and propose a new method based on weighted delay-and-sum beamforming to estimate the directivity pattern of a sound source. The system localizes a sound source and estimates its directivity pattern, which offers two advantages. First, the system can detect whether the sound source is an actual human voice by comparing the estimated directivity pattern with prerecorded patterns. Second, the heading of the sound source is estimated as the angle with the highest power in the directivity pattern. We demonstrated the effectiveness of the microphone array through sound source tracking with orientation and the detection of actual human voices based on directivity pattern estimation.
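The two uses of the estimated directivity pattern (template matching for human-vs-loudspeaker detection, and heading from the peak angle) can be sketched as follows. The patterns and the rotation-invariant correlation matching are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def classify_and_heading(pattern, templates):
    """Match an estimated directivity pattern against prerecorded templates.

    pattern  : (A,) estimated radiated power per angle over a full turn
    templates: dict name -> (A,) reference pattern (e.g. 'human', 'loudspeaker')
    Returns (best-matching template name, heading index of peak power).
    Matching uses normalized cross-correlation maximized over circular shifts,
    so it is independent of which way the source happens to face.
    """
    def norm(v):
        v = v - v.mean()
        return v / (np.linalg.norm(v) + 1e-12)
    p = norm(pattern)
    scores = {}
    for name, t in templates.items():
        t = norm(t)
        scores[name] = max(float(p @ np.roll(t, s)) for s in range(len(t)))
    best = max(scores, key=scores.get)
    return best, int(np.argmax(pattern))

# Toy example: a broad "human" lobe vs a sharp "loudspeaker" lobe; the
# observed source is the human pattern rotated to face 45 degrees.
ang = np.deg2rad(np.arange(360))
human = 1 + np.cos(ang)                      # broad cardioid-like lobe
speaker = np.maximum(np.cos(ang), 0) ** 8    # narrow lobe
observed = np.roll(human, 45)
label, heading = classify_and_heading(observed,
                                      {"human": human, "loudspeaker": speaker})
print(label, heading)  # human 45
```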


International Conference on Acoustics, Speech, and Signal Processing | 2009

Sound source separation of moving speakers for robot audition

Kazuhiro Nakadai; Hirofumi Nakajima; Yuji Hasegawa; Hiroshi Tsujino

This paper addresses sound source separation and speech recognition for moving sound sources. Real-world applications such as robots must cope with both moving and stationary sound sources, yet most studies assume only stationary ones. We introduce two key techniques to cope with moving sources, Adaptive Step-size control (AS) and Optima Controlled Recursive Average (OCRA), to improve blind source separation. We implemented a real-time robot audition system with these techniques for our humanoid robot ASIMO with an 8-channel microphone array, using HARK, our open-source robot audition software. The performance of the system is demonstrated through sound source separation for moving sources and automatic speech recognition of the separated speech.


Intelligent Robots and Systems | 2010

An easily-configurable robot audition system using Histogram-based Recursive Level Estimation

Hirofumi Nakajima; Gökhan Ince; Kazuhiro Nakadai; Yuji Hasegawa

This paper presents an easily configurable robot audition system using the Histogram-based Recursive Level Estimation (HRLE) method. To achieve natural human-robot interaction, a robot should recognize human speech even in the presence of noise and reverberation. Since the precision of automatic speech recognizers (ASRs) is degraded by such interference, many systems applying speech enhancement processes have been reported. However, the performance of most reported systems suffers under changes in the acoustic environment. For example, an enhancement process optimized for steady-state noise, such as fan noise, performs poorly on non-steady-state noise, such as background music. The primary reason is parameter mismatch, because the appropriate parameters change with the acoustic environment. To solve this problem, we propose a robot audition system that optimizes its parameters adaptively and automatically. Our system applies linear and non-linear enhancement sub-processes. For the linear sub-process, we use Geometric Source Separation with the Adaptive Step-size method (GSS-AS), which adjusts its parameters adaptively and has no manually set parameters. For the non-linear sub-process, we apply a spectral-subtraction-based enhancement method with the HRLE method newly introduced in this paper. Since HRLE controls its threshold level implicitly, based on the statistical characteristics of noise and speech levels, our system is highly robust against changes in the acoustic environment. Because all processes in a robot audition system should run in real time, we also propose implementation techniques that make HRLE run in real time and show their effectiveness. We evaluate the performance of our system against conventional systems based on the Minima Controlled Recursive Average (MCRA) and Minimum Mean Square Error (MMSE) methods. The experimental results show that our system outperforms the conventional systems.
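The core HRLE idea, reading the noise level off as a percentile of a recursively updated histogram of observed levels, can be sketched as follows. The bin range, decay factor, and percentile below are illustrative assumptions, not the paper's values.

```python
import numpy as np

class HistogramLevelEstimator:
    """Sketch of a histogram-based recursive level estimator (HRLE-style).

    Maintains an exponentially weighted histogram of observed levels (in dB)
    and reads the noise level off as a fixed percentile of that histogram,
    so no hand-set noise-level threshold is needed."""
    def __init__(self, lo=-100.0, hi=0.0, n_bins=200, decay=0.99, pct=0.3):
        self.edges = np.linspace(lo, hi, n_bins + 1)
        self.hist = np.zeros(n_bins)
        self.decay = decay      # forgetting factor: recent frames dominate
        self.pct = pct          # e.g. 30th percentile tracks the noise floor

    def update(self, level_db):
        self.hist *= self.decay
        b = int(np.clip(np.searchsorted(self.edges, level_db) - 1,
                        0, len(self.hist) - 1))
        self.hist[b] += 1.0

    def noise_level_db(self):
        c = np.cumsum(self.hist)
        if c[-1] == 0:
            return float(self.edges[0])
        idx = int(np.searchsorted(c, self.pct * c[-1]))
        return float(self.edges[idx])

# Toy run: mostly -60 dB noise frames with occasional -20 dB speech bursts.
est = HistogramLevelEstimator()
rng = np.random.default_rng(0)
for i in range(500):
    lvl = -20.0 if i % 10 == 0 else -60.0 + rng.normal(0, 1)
    est.update(lvl)
print(round(est.noise_level_db()))  # near -60: the speech bursts are ignored
```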


International Conference on Signal Processing | 2012

Robot audition for dynamic environments

Kazuhiro Nakadai; Gökhan Ince; Keisuke Nakamura; Hirofumi Nakajima

This paper addresses robot audition for dynamic environments, in which speakers and/or the robot move within a dynamically changing acoustic environment. Robot audition studies so far have assumed only stationary human-robot interaction scenes and thus have difficulty coping with such dynamic environments. We recently developed new techniques that allow a robot to listen to several things simultaneously with its own ears, even in dynamic environments: MUltiple SIgnal Classification based on Generalized Eigen-Value Decomposition (GEVD-MUSIC), Geometrically constrained High-order Decorrelation based Source Separation with Adaptive Step-size control (GHDSS-AS), Histogram-based Recursive Level Estimation (HRLE), and Template-based Ego Noise Suppression (TENS). GEVD-MUSIC provides noise-robust sound source localization. GHDSS-AS is a new sound source separation method that quickly adapts its separation parameters to dynamic changes. HRLE is a practical post-filtering method with a small number of parameters. TENS estimates the robot's motor noise using templates recorded in advance and eliminates it. These methods are implemented as modules for our open-source robot audition software HARK so that they can be easily integrated. We show that each of these methods, and their combinations, effectively copes with dynamic environments through offline experiments and online real-time demonstrations.
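Of the listed techniques, the GEVD-MUSIC idea, whitening by a known noise correlation matrix before the MUSIC subspace split so that a loud known noise does not mask the target, can be sketched as follows. This is a generic sketch of noise-whitened MUSIC; HARK's GEVD-MUSIC differs in detail, and the array geometry is an illustrative assumption.

```python
import numpy as np

def gevd_music_spectrum(R, K, steering, n_sources):
    """MUSIC with noise-correlation whitening (the idea behind GEVD-MUSIC).

    R        : (M, M) correlation matrix of the observed signals
    K        : (M, M) correlation matrix of a known noise source
    steering : (D, M) steering vectors, one per candidate direction
    Whitening by K (equivalent to a generalized eigenvalue decomposition of
    the pair (R, K)) suppresses the known noise before the subspace split."""
    w, U = np.linalg.eigh(K)
    Kinv_half = U @ np.diag(1.0 / np.sqrt(np.maximum(w, 1e-12))) @ U.conj().T
    Rw = Kinv_half @ R @ Kinv_half            # whitened correlation matrix
    _, Ew = np.linalg.eigh(Rw)                # eigenvalues ascending
    En = Ew[:, : R.shape[0] - n_sources]      # noise subspace
    aw = steering @ Kinv_half.conj()          # whitened steering (Kinv_half is Hermitian)
    num = np.sum(np.abs(aw) ** 2, axis=1)
    den = np.sum(np.abs(aw.conj() @ En) ** 2, axis=1)
    return num / np.maximum(den, 1e-12)

# Toy scene: 4-mic array, target at 20 deg, louder directional noise at 60 deg.
def steer(angles_deg, n_mics=4, spacing=0.5):
    a = np.deg2rad(np.asarray(angles_deg))
    return np.exp(-2j * np.pi * spacing
                  * np.outer(np.sin(a), np.arange(n_mics)))

a20, a60 = steer([20.0])[0], steer([60.0])[0]
K = 10 * np.outer(a60, a60.conj()) + 0.01 * np.eye(4)  # known noise at 60 deg
R = np.outer(a20, a20.conj()) + K                      # target plus that noise
angles = np.arange(-90, 91)
spec = gevd_music_spectrum(R, K, steer(angles), n_sources=1)
print(angles[np.argmax(spec)])  # 20: the target, not the louder noise at 60
```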

Collaboration


Dive into Hirofumi Nakajima's collaborations.

Top Co-Authors

Gökhan Ince (Istanbul Technical University)

Jun-ichi Imura (Tokyo Institute of Technology)