Publication

Featured research published by Yoshiaki Bando.


Intelligent Robots and Systems | 2013

Posture estimation of hose-shaped robot using microphone array localization

Yoshiaki Bando; Takeshi Mizumoto; Katsutoshi Itoyama; Kazuhiro Nakadai; Hiroshi G. Okuno

This paper presents a posture estimation method for a hose-shaped robot based on microphone array localization. Hose-shaped robots, a major class of rescue robot, are difficult to navigate because their posture is too flexible for a remote operator to steer them as far as desired. Posture estimation is therefore essential for navigation and mission usability. We developed a posture estimation method using a microphone array and small loudspeakers mounted on the hose-shaped robot. Our method consists of two steps: (1) playing a known sound from each loudspeaker one by one, and (2) estimating the microphone positions on the robot instead of estimating the posture directly. We designed a time difference of arrival (TDOA) estimation method that is robust against directional noise, and implemented a prototype system using a posture model of the hose-shaped robot and an extended Kalman filter (EKF). The validity of our approach is evaluated in experiments with both signals recorded in an anechoic chamber and simulated data.
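The TDOA-from-a-known-sound idea in this abstract can be sketched with a simple cross-correlation estimator (an illustrative approach, not necessarily the paper's noise-robust estimator; all names and parameters below are invented):

```python
import numpy as np

def estimate_tdoa(ref, mic, fs):
    """Estimate the arrival delay of the known signal `ref` in the
    microphone recording `mic` from the peak of their cross-correlation."""
    corr = np.correlate(mic, ref, mode="full")
    lag = np.argmax(corr) - (len(ref) - 1)  # shift of the correlation peak
    return lag / fs                         # delay in seconds

rng = np.random.default_rng(0)
fs = 16000                      # sampling rate [Hz]
ref = rng.standard_normal(400)  # known calibration burst from a loudspeaker
delay = 120                     # true propagation delay in samples
mic = np.zeros(fs)
mic[delay:delay + len(ref)] = ref       # delayed copy at the microphone
mic += 0.05 * rng.standard_normal(fs)   # additive sensor noise

tdoa = estimate_tdoa(ref, mic, fs)
distance = tdoa * 343.0                 # meters, at ~343 m/s speed of sound
```

Repeating this for every loudspeaker-microphone pair yields pairwise distances that constrain the microphone positions, which can then be tracked with a filter such as the EKF mentioned above.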


International Symposium on Safety, Security, and Rescue Robotics | 2015

Human-voice enhancement based on online RPCA for a hose-shaped rescue robot with a microphone array

Yoshiaki Bando; Katsutoshi Itoyama; Masashi Konyo; Satoshi Tadokoro; Kazuhiro Nakadai; Kazuyoshi Yoshii; Hiroshi G. Okuno

This paper presents an online real-time method that enhances human voices in severely noisy audio signals captured by the microphones of a hose-shaped rescue robot. To help a remote operator of such a robot pick up the weak voice of a human buried under rubble, it is crucial to suppress the loud ego-noise caused by the movements of the robot in real time. We tackle this task with online robust principal component analysis (ORPCA), decomposing the spectrogram of an observed noisy signal into the sum of low-rank and sparse spectrograms that are expected to correspond to periodic ego-noise and human voices, respectively. Using a microphone array distributed along the long body of the hose-shaped robot, ego-noise suppression can be further improved by combining the results of ORPCA applied to the signal captured by each microphone. Experiments using a 3-m hose-shaped rescue robot with eight microphones show that the proposed method improves on conventional single-microphone ego-noise suppression by 7.4 dB in SDR and 17.2 dB in SIR.
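The low-rank-plus-sparse decomposition at the heart of this approach can be illustrated with a crude batch alternation (truncated SVD for the low-rank part, soft-thresholding for the sparse part). The paper's ORPCA processes frames online instead, and everything named below is invented for illustration:

```python
import numpy as np

def soft_threshold(X, tau):
    """Elementwise shrinkage toward zero; promotes sparsity."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def rpca(M, rank=1, lam=0.1, n_iter=50):
    """Alternate between a truncated-SVD low-rank fit L and a
    soft-thresholded sparse residual S so that M ≈ L + S."""
    S = np.zeros_like(M)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(M - S, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        S = soft_threshold(M - L, lam)
    return L, S

rng = np.random.default_rng(0)
noise = np.outer(rng.random(20), rng.random(100))  # rank-1 "ego-noise" spectrogram
voice = np.zeros((20, 100))
voice[5, 40:60] = 2.0                              # sparse "voice" burst
L, S = rpca(noise + voice)                         # S should isolate the burst
```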


Intelligent Robots and Systems | 2015

Audio-visual beat tracking based on a state-space model for a music robot dancing with humans

Misato Ohkita; Yoshiaki Bando; Yukara Ikemiya; Katsutoshi Itoyama; Kazuyoshi Yoshii

This paper presents an audio-visual beat-tracking method for an entertainment robot that can dance in synchronization with music and human dancers. Conventional music robots have focused on either music audio signals or dancing movements of humans for detecting and predicting beat times in real time. Since a robot needs to record music audio signals by using its own microphones, however, the signals are severely contaminated with loud environmental noise and reverberant sounds. Moreover, it is difficult to visually detect beat times from real complicated dancing movements that exhibit weaker repetitive characteristics than music audio signals do. To solve these problems, we propose a state-space model that integrates both audio and visual information in a probabilistic manner. At each frame, the method extracts acoustic features (audio tempos and onset likelihoods) from music audio signals and extracts skeleton features from movements of a human dancer. The current tempo and the next beat time are then estimated from those observed features by using a particle filter. Experimental results showed that the proposed multi-modal method using a depth sensor (Kinect) for extracting skeleton features outperformed conventional mono-modal methods by 0.20 (F measure) in terms of beat-tracking accuracy in a noisy and reverberant environment.
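The particle-filter inference step can be sketched in its generic form (predict, weight by an observation likelihood, resample). This toy tracks only a drifting tempo from noisy tempo observations, whereas the paper's state also covers beat times and fuses audio and skeleton features; all numbers below are invented:

```python
import numpy as np

rng = np.random.default_rng(1)

def track_tempo(observations, n_particles=500):
    """Generic particle filter: random-walk prediction, Gaussian
    observation likelihood, multinomial resampling."""
    particles = rng.uniform(60.0, 180.0, n_particles)  # initial tempo guesses [BPM]
    estimates = []
    for obs in observations:
        particles += rng.normal(0.0, 1.0, n_particles)     # predict: tempo drift
        w = np.exp(-0.5 * ((obs - particles) / 5.0) ** 2)  # weight by likelihood
        w /= w.sum()
        particles = particles[rng.choice(n_particles, n_particles, p=w)]  # resample
        estimates.append(particles.mean())                 # posterior mean tempo
    return estimates

true_tempo = 120.0
observations = true_tempo + rng.normal(0.0, 5.0, 30)  # noisy tempo measurements
estimates = track_tempo(observations)
```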


Journal of Robotics and Mechatronics | 2017

Low latency and high quality two-stage human-voice-enhancement system for a hose-shaped rescue robot

Yoshiaki Bando; Hiroshi Saruwatari; Nobutaka Ono; Shoji Makino; Katsutoshi Itoyama; Daichi Kitamura; Masaru Ishimura; Moe Takakusaki; Narumi Mae; Kouei Yamaoka; Yutaro Matsui; Yuichi Ambe; Masashi Konyo; Satoshi Tadokoro; Kazuyoshi Yoshii; Hiroshi G. Okuno



International Workshop on Acoustic Signal Enhancement | 2016

Student's t multichannel nonnegative matrix factorization for blind source separation

Koichi Kitamura; Yoshiaki Bando; Katsutoshi Itoyama; Kazuyoshi Yoshii

This paper presents a robust generalization of multichannel nonnegative matrix factorization (MNMF) for blind source separation of mixture audio signals recorded by a microphone array. In conventional MNMF, the complex spectra of observed mixture signals are assumed to be complex Gaussian distributed and are decomposed into the product of the power spectra, temporal activations, and spatial correlation matrices of individual sources such that the complex Gaussian likelihood is maximized. Since the mixture spectra usually include outliers, we propose MNMF based on the complex Student's t likelihood, called t-MNMF, which includes the original MNMF as a special case. The parameters of t-MNMF can be iteratively optimized with an efficient multiplicative updating algorithm. Experiments showed that t-MNMF with a certain range of degrees of freedom tends to be insensitive to parameter initialization and to outperform conventional MNMF.
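The multiplicative-update idea can be illustrated with the classic single-channel Euclidean NMF updates (Lee-Seung); the paper's t-MNMF applies the same style of update to a much richer multichannel Student's t model. The toy data and names below are invented:

```python
import numpy as np

rng = np.random.default_rng(0)

def nmf(V, k=2, n_iter=200):
    """Multiplicative updates for V ≈ W @ H under Euclidean distance.
    Ratios of nonnegative terms keep W and H nonnegative throughout."""
    F, T = V.shape
    W = rng.random((F, k)) + 0.1
    H = rng.random((k, T)) + 0.1
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-12)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-12)
    return W, H

V = np.abs(rng.standard_normal((32, 64)))       # toy magnitude spectrogram
W, H = nmf(V)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```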


Sensors | 2017

Design of UAV-Embedded Microphone Array System for Sound Source Localization in Outdoor Environments

Kotaro Hoshiba; Kai Washizaki; Mizuho Wakabayashi; Takahiro Ishiki; Makoto Kumon; Yoshiaki Bando; Daniel Gabriel; Kazuhiro Nakadai; Hiroshi G. Okuno

In search and rescue activities, unmanned aerial vehicles (UAVs) should exploit sound information to compensate for poor visual information. This paper describes the design and implementation of a UAV-embedded microphone array system for sound source localization in outdoor environments. Four critical development problems were the water resistance of the microphone array, efficiency of assembly, reliability of wireless communication, and sufficiency of visualization tools for operators. To solve these problems, we developed a spherical microphone array system (SMAS) consisting of a microphone array, a stable wireless network communication system, and intuitive visualization tools. The performance of the SMAS was evaluated with simulated data and a field demonstration. The results confirmed that the SMAS provides highly accurate localization, water resistance, prompt assembly, stable wireless communication, and intuitive information for observers and operators.


European Signal Processing Conference | 2016

Variational Bayesian multi-channel robust NMF for human-voice enhancement with a deformable and partially-occluded microphone array

Yoshiaki Bando; Katsutoshi Itoyama; Masashi Konyo; Satoshi Tadokoro; Kazuhiro Nakadai; Kazuyoshi Yoshii; Hiroshi G. Okuno

This paper presents a human-voice enhancement method for a deformable and partially occluded microphone array. Although microphone arrays distributed along the long bodies of hose-shaped rescue robots are crucial for finding victims under collapsed buildings, the human voices they capture are contaminated by non-stationary actuator and friction noise. Standard blind source separation methods cannot be used because the relative microphone positions change over time and some microphones are occasionally shaded by rubble. To solve these problems, we develop a Bayesian model that separates multichannel amplitude spectrograms into sparse and low-rank components (human voice and noise) without using phase information, which depends on the array layout. The voice level at each microphone is estimated in a time-varying manner to reduce the influence of the shaded microphones. Experiments using a 3-m hose-shaped robot with eight microphones show that our method outperforms conventional methods by 2.7 dB in signal-to-noise ratio.


International Conference on Acoustics, Speech, and Signal Processing | 2015

Challenges in deploying a microphone array to localize and separate sound sources in real auditory scenes

Yoshiaki Bando; Takuma Otsuka; Katsutoshi Itoyama; Kazuyoshi Yoshii; Yoko Sasaki; Satoshi Kagami; Hiroshi G. Okuno

Analyzing the auditory scenes of real environments is challenging, partly because an unknown number and variety of sound sources are observed at the same time, and partly because these sounds reach the microphones at significantly different sound pressure levels. These are difficult problems even for state-of-the-art sound source localization and separation methods. In this paper, we examine two such methods that use a microphone array: (1) Bayesian nonparametric microphone array processing (BNP-MAP), which can separate and localize sound sources when their number is unspecified, and (2) the robot audition software HARK, which separates and localizes sources in real time. Through experimentation, we found that BNP-MAP is more robust to differences in the sound pressure levels of the source signals and in the spatial closeness of the source positions. Experiments analyzing real scenes of human conversations recorded in a large exhibition hall and bird calls recorded in a natural park demonstrate the efficacy and applicability of BNP-MAP.


Intelligent Robots and Systems | 2015

Optimizing the layout of multiple mobile robots for cooperative sound source separation

Kouhei Sekiguchi; Yoshiaki Bando; Katsutoshi Itoyama; Kazuyoshi Yoshii

This paper presents a novel active audition method that enables multiple mobile robots to move to optimal positions for improving the performance of sound source separation. A main advantage of our distributed system is that each robot has its own microphone array, and all mobile robots can collaborate on source separation by treating the set of movable microphone arrays as a single large reconfigurable array. To incrementally optimize the positions of the robots (the layout of this large array) in an active-audition manner, it is necessary to predict the source separation performance of a possible layout at the next time step, even though the true source signals are unknown. To solve this problem, our method simulates delay-and-sum beamforming for a possible layout, theoretically calculating the gain of each frequency component of a source signal in the corresponding separated signal. The robots are moved into the layout with the highest average gain over all sources and the whole frequency range. Experimental results showed that the harmonic mean of signal-to-distortion ratios (SDRs) was improved by 6.0 dB in simulations and by 5.7 dB in a real environment.
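The gain prediction described above can be sketched for a narrowband delay-and-sum beamformer: steer at a target position, then evaluate the response toward another position. This 2-D single-frequency toy only illustrates the principle; positions, frequency, and names are invented:

```python
import numpy as np

C = 343.0  # speed of sound [m/s]

def steering(mics, pos, freq):
    """Narrowband steering vector from the mic-to-source distances."""
    d = np.linalg.norm(mics - pos, axis=1)
    return np.exp(-2j * np.pi * freq * d / C)

def das_response(mics, steer_to, eval_from, freq):
    """Delay-and-sum gain: 1.0 means the source passes unattenuated."""
    w = steering(mics, steer_to, freq) / len(mics)  # steered, normalized weights
    a = steering(mics, eval_from, freq)             # evaluated source direction
    return abs(np.conj(w) @ a)

mics = np.array([[0.0, 0.0], [0.2, 0.0], [0.4, 0.0], [0.6, 0.0]])  # 4-mic line array
target = np.array([1.0, 2.0])
interferer = np.array([-2.0, 1.0])
g_target = das_response(mics, target, target, 1000.0)          # unity gain
g_interferer = das_response(mics, target, interferer, 1000.0)  # suppressed
```

Averaging such gains over candidate robot layouts, sources, and frequencies gives a layout score of the kind the method maximizes.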


Intelligent Robots and Systems | 2015

Microphone-accelerometer based 3D posture estimation for a hose-shaped rescue robot

Yoshiaki Bando; Katsutoshi Itoyama; Masashi Konyo; Satoshi Tadokoro; Kazuhiro Nakadai; Kazuyoshi Yoshii; Hiroshi G. Okuno

3D posture estimation for a hose-shaped robot is critical in rescue activities because of complex physical environments. Conventional sound-based posture estimation assumes rather flat environments and handles only 2D postures, resulting in poor performance in real-world environments with rubble. This paper presents a novel 3D posture estimation method that exploits both microphones and accelerometers. The idea is to compensate for the posture information missing from sound-based time difference of arrival (TDOA) measurements with the tilt information obtained from accelerometers. This fusion is formulated as a nonlinear state-space model and solved with an unscented Kalman filter. Experiments were conducted using a 3-m hose-shaped robot with eight microphone-accelerometer units and seven loudspeaker-vibration-motor units, deployed in a simple 3D structure. The results demonstrate that our method reduces initial-state errors to about 20 cm in 3D space. If the initial-state errors are less than 20%, our method estimates the correct 3D posture in real time.
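The accelerometer side of the fusion reduces to recovering tilt from the sensed gravity vector when the robot is quasi-static; this standard formula is sketched below with invented example values (the paper's filter fuses this cue with TDOA constraints):

```python
import numpy as np

def tilt_from_accel(a):
    """Pitch and roll angles from a static accelerometer reading,
    i.e. from the direction of the gravity vector."""
    ax, ay, az = a
    pitch = np.arctan2(-ax, np.hypot(ay, az))
    roll = np.arctan2(ay, az)
    return pitch, roll

g = 9.81
# a sensor unit at rest, pitched up by 30 degrees
a = np.array([-g * np.sin(np.pi / 6), 0.0, g * np.cos(np.pi / 6)])
pitch, roll = tilt_from_accel(a)
```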
