Publication


Featured research published by Ryu Takeda.


International Conference on Robotics and Automation | 2011

Design and implementation of selectable sound separation on the Texai telepresence system using HARK

Takeshi Mizumoto; Kazuhiro Nakadai; Takami Yoshida; Ryu Takeda; Takuma Otsuka; Toru Takahashi; Hiroshi G. Okuno

This paper presents the design and implementation of selectable sound separation functions on the telepresence system “Texai” using the robot audition software “HARK.” An operator of Texai can “walk” around a faraway office to attend a meeting or talk with people through video-conference instead of meeting in person. With a normal microphone, the operator has difficulty grasping the auditory scene around the Texai, e.g., he/she cannot tell the number and locations of sounds. To solve this problem, we design selectable sound separation functions with 8 microphones in two modes, overview and filter, and implement them using HARK's sound source localization and separation. The overview mode visualizes the direction-of-arrival of surrounding sounds, while the filter mode provides only the sounds that originate from the range of directions he/she specifies. These functions enable the operator to be aware of a sound even if it comes from behind the Texai, and to concentrate on a particular sound. The design and implementation were completed in five days thanks to the portability of HARK. Experimental evaluations with actual and simulated data show that the resulting system localizes sound sources with a tolerance of 5 degrees.
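
To make the two modes concrete, the sketch below illustrates the filter-mode idea in plain Python: given sources already localized and separated (by HARK in the actual system), keep only those whose direction-of-arrival falls in the operator-specified range. The data structure and function names are hypothetical, not HARK's API.

```python
# Minimal sketch (not HARK itself): the "filter mode" idea of keeping only
# separated sources whose direction-of-arrival falls inside the range that
# the operator specifies. Source and function names are hypothetical.
from dataclasses import dataclass
from typing import List

@dataclass
class LocalizedSource:
    azimuth_deg: float      # direction-of-arrival estimated by localization
    samples: List[float]    # separated waveform for this source

def filter_mode(sources: List[LocalizedSource],
                lo_deg: float, hi_deg: float) -> List[float]:
    """Mix only the sources whose DOA lies in [lo_deg, hi_deg]."""
    selected = [s for s in sources if lo_deg <= s.azimuth_deg <= hi_deg]
    if not selected:
        return []
    length = max(len(s.samples) for s in selected)
    mix = [0.0] * length
    for s in selected:
        for i, v in enumerate(s.samples):
            mix[i] += v
    return mix

# Overview mode would instead just report [s.azimuth_deg for s in sources],
# so the operator can see where sounds come from, including behind the robot.
```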


Intelligent Robots and Systems | 2008

A robot uses its own microphone to synchronize its steps to musical beats while scatting and singing

Kazumasa Murata; Kazuhiro Nakadai; Kazuyoshi Yoshii; Ryu Takeda; Toyotaka Torii; Hiroshi G. Okuno; Yuji Hasegawa; Hiroshi Tsujino

Musical beat tracking is one of the key technologies for human-robot interaction such as musical sessions. Since such interaction should take place in various environments in a natural way, musical beat tracking for a robot should cope with noise sources such as environmental noise, its own motor noise, and its own voice, using its own microphone. This paper addresses a musical beat tracking robot which can step, scat, and sing according to musical beats by using its own microphone. To realize such a robot, we propose a robust beat tracking method by introducing two key techniques, namely spectro-temporal pattern matching and echo cancellation. The former realizes robust tempo estimation with a shorter window length, so it can quickly adapt to tempo changes. The latter is effective for canceling self-generated noise such as stepping, scatting, and singing. We implemented the proposed beat tracking method on Honda ASIMO. Experimental results showed ten times faster adaptation to tempo changes and high robustness of beat tracking against stepping, scatting, and singing noise. We also demonstrated that the robot times its steps to musical beats while scatting or singing.
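
The following sketch conveys the flavor of tempo estimation by pattern matching: score candidate beat periods by how strongly the onset envelope repeats at each lag. It is a simplification of the paper's spectro-temporal pattern matching, and the frame rate and tempo range are assumed values.

```python
# A minimal sketch of tempo estimation in the spirit of spectro-temporal
# pattern matching: score candidate beat periods by how well the onset
# envelope repeats at that lag. This is a simplification, not the paper's
# exact method; frame rate and tempo range are assumed values.
import numpy as np

def estimate_tempo(onset_env: np.ndarray, frame_rate: float,
                   bpm_range=(60, 180)) -> float:
    """onset_env: per-frame onset strength (e.g., spectral flux)."""
    best_bpm, best_score = bpm_range[0], -np.inf
    for bpm in range(bpm_range[0], bpm_range[1] + 1):
        lag = int(round(frame_rate * 60.0 / bpm))   # frames per beat
        if lag <= 0 or lag >= len(onset_env):
            continue
        # Correlate the envelope with itself shifted by one beat period.
        score = float(np.dot(onset_env[:-lag], onset_env[lag:]))
        if score > best_score:
            best_bpm, best_score = bpm, score
    return float(best_bpm)

# Example with a synthetic 120-BPM onset train at 100 frames/s:
env = np.zeros(1000)
env[::50] = 1.0          # one onset every 0.5 s
print(estimate_tempo(env, frame_rate=100.0))   # ~120
```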


IEEE Automatic Speech Recognition and Understanding Workshop | 2013

Elastic spectral distortion for low resource speech recognition with deep neural networks

Naoyuki Kanda; Ryu Takeda; Yasunari Obuchi

An acoustic model based on hidden Markov models with deep neural networks (DNN-HMM) has recently been proposed and achieved high recognition accuracy. In this paper, we investigated an elastic spectral distortion method to artificially augment training samples to help DNN-HMMs acquire enough robustness even when there are a limited number of training samples. We investigated three distortion methods - vocal tract length distortion, speech rate distortion, and frequency-axis random distortion - and evaluated those methods with Japanese lecture recordings. In a large vocabulary continuous speech recognition task with only 10 hours of training samples, a DNN-HMM trained with the elastic spectral distortion method achieved a 10.1% relative word error reduction compared with a normally trained DNN-HMM.
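
As a rough illustration of the frequency-axis distortion idea, the sketch below warps the frequency axis of a spectrogram by a random factor and resamples it by interpolation; the warp range is an assumed value, not the paper's setting.

```python
# A minimal sketch of frequency-axis distortion for augmentation: warp the
# frequency axis of each training spectrogram by a smooth random factor and
# resample by linear interpolation. The warp range is assumed, not the
# paper's exact setting.
import numpy as np

def random_freq_warp(spec: np.ndarray, max_warp: float = 0.1, rng=None) -> np.ndarray:
    """spec: (num_freq_bins, num_frames) magnitude spectrogram."""
    rng = rng or np.random.default_rng()
    n_bins, _ = spec.shape
    # One global warp factor per utterance (VTL-style); a frequency-dependent
    # random curve would give the "elastic" random-distortion variant.
    alpha = 1.0 + rng.uniform(-max_warp, max_warp)
    src_bins = np.clip(np.arange(n_bins) * alpha, 0, n_bins - 1)
    warped = np.empty_like(spec)
    for t in range(spec.shape[1]):
        warped[:, t] = np.interp(src_bins, np.arange(n_bins), spec[:, t])
    return warped

# Usage: augment each utterance several times before DNN-HMM training.
spec = np.abs(np.random.randn(257, 300))       # stand-in for a real STFT
augmented = [random_freq_warp(spec) for _ in range(3)]
```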


IEEE Transactions on Audio, Speech, and Language Processing | 2013

Optimized Speech Dereverberation From Probabilistic Perspective for Time Varying Acoustic Transfer Function

Masahito Togami; Yohei Kawaguchi; Ryu Takeda; Yasunari Obuchi; Nobuo Nukaga

A dereverberation technique has been developed that optimally combines multichannel inverse filtering (MIF), beamforming (BF), and non-linear reverberation suppression (NRS). It is robust against acoustic transfer function (ATF) fluctuations and creates less distortion than the NRS alone. The three components are optimally combined from a probabilistic perspective using a unified likelihood function incorporating two probabilistic models. A multichannel probabilistic source model based on a recently proposed local Gaussian model (LGM) provides robustness against ATF fluctuations of the early reflection. A probabilistic reverberant transfer function model (PRTFM) provides robustness against ATF fluctuations of the late reverberation. The MIF and multichannel under-determined source separation (MUSS) are optimized in an iterative manner. The MIF is designed to reduce the time-invariant part of the late reverberation by using optimal time-weighting with reference to the PRTFM and the LGM. The MUSS separates the dereverberated speech signal and the residual reverberation after the MIF, which can be interpreted as an optimized combination of the BF and the NRS. The parameters of the PRTFM and the LGM are optimized based on the MUSS output. Experimental results show that the proposed method is robust against the ATF fluctuations under both single and multiple source conditions.
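
The combined method is involved, but the sketch below shows the general idea behind the non-linear reverberation suppression component alone: estimate late-reverberation power from exponentially weighted past frames and attenuate each time-frequency bin accordingly. The decay and delay constants are assumed, and this is not the paper's optimized combination.

```python
# Not the paper's combined MIF/BF/NRS method; just a generic single-channel
# sketch of non-linear reverberation suppression: estimate late-reverberation
# power from exponentially weighted past frames and subtract it per
# time-frequency bin. Decay, delay, and floor constants are assumed values.
import numpy as np

def suppress_late_reverb(power: np.ndarray, delay: int = 4,
                         decay: float = 0.5, floor: float = 0.05) -> np.ndarray:
    """power: (num_bins, num_frames) power spectrogram; returns cleaned power."""
    n_bins, n_frames = power.shape
    out = power.copy()
    reverb_est = np.zeros(n_bins)
    for t in range(n_frames):
        if t >= delay:
            # Recursive estimate of late reverberation from older frames.
            reverb_est = decay * reverb_est + (1.0 - decay) * power[:, t - delay]
        gain = np.maximum(1.0 - reverb_est / np.maximum(power[:, t], 1e-12), floor)
        out[:, t] = gain * power[:, t]
    return out
```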


Intelligent Robots and Systems | 2007

Exploiting known sound source signals to improve ICA-based robot audition in speech separation and recognition

Ryu Takeda; Kazuhiro Nakadai; Kazunori Komatani; Tetsuya Ogata; Hiroshi G. Okuno

This paper describes a new semi-blind source separation (semi-BSS) technique with independent component analysis (ICA) for enhancing a target source of interest and suppressing other known interference sources. The semi-BSS technique is necessary for double-talk-free robot audition systems in order to utilize known sound source signals, such as the robot's own speech, music, or TV sound, obtained through a line-in connection or a ubiquitous network. Unlike conventional semi-BSS with ICA, we use a time-frequency-domain convolution model to describe sound reflections and a new mixing process for ICA. In other words, we treat sounds reflected after some delay as signals different from the original, and ICA then separates the reflections as additional interference sources. The model eliminates the frame-size limitation of frequency-domain ICA, so ICA can separate the known sources even in a highly reverberant environment. Experimental results show that our method outperformed conventional semi-BSS using ICA under simulated normal and highly reverberant environments.
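
For orientation, the sketch below shows the basic frequency-domain ICA machinery (natural-gradient updates at a single frequency bin) that semi-BSS builds on; the paper's time-frequency convolution model and its use of the known source signal are not included, and the step size is an assumption.

```python
# A minimal sketch of frequency-domain ICA at a single frequency bin with
# natural-gradient updates, i.e., the separation machinery that the semi-BSS
# method builds on. The paper's time-frequency convolution model and the use
# of known source signals are NOT included here.
import numpy as np

def ica_one_bin(X: np.ndarray, n_iter: int = 200, mu: float = 0.1) -> np.ndarray:
    """X: (n_channels, n_frames) complex STFT observations at one bin.
    Returns the (n_channels, n_channels) demixing matrix W with Y = W @ X."""
    n_ch = X.shape[0]
    W = np.eye(n_ch, dtype=complex)
    for _ in range(n_iter):
        Y = W @ X
        # Polar nonlinearity commonly used for complex-valued speech ICA.
        Phi = Y / np.maximum(np.abs(Y), 1e-12)
        R = (Phi @ Y.conj().T) / X.shape[1]
        W = W + mu * (np.eye(n_ch) - R) @ W
    return W
```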


IEEE-RAS International Conference on Humanoid Robots | 2008

A beat-tracking robot for human-robot interaction and its evaluation

Kazumasa Murata; Kazuhiro Nakadai; Ryu Takeda; Hiroshi G. Okuno; Toyotaka Torii; Yuji Hasegawa; Hiroshi Tsujino

Human-robot interaction through music in real environments is essential for humanoids because it makes the interaction enjoyable for people. We thus developed a beat-tracking robot which steps, sings, and scats according to musical beats predicted using a robot-embedded microphone, as a first step toward a robot that can hold a musical session with people. This paper first describes the beat-tracking robot and then evaluates it in detail on the following three points: adaptation to tempo changes, robustness against environmental noise including the periodic noise generated by stepping, singing, and scatting, and human-robot interaction using a clapping sound. The results showed that our beat-tracking robot drastically improved noise robustness and adaptation to tempo changes, so that it can hold a simple sound session with people.


Journal of Comparative Physiology A: Neuroethology, Sensory, Neural, and Behavioral Physiology | 2011

Sound imaging of nocturnal animal calls in their natural habitat

Takeshi Mizumoto; Ikkyu Aihara; Takuma Otsuka; Ryu Takeda; Kazuyuki Aihara; Hiroshi G. Okuno

We present a novel method for imaging acoustic communication between nocturnal animals. Investigating the spatio-temporal calling behavior of nocturnal animals, e.g., frogs and crickets, has been difficult because of the need to distinguish many animals' calls in noisy environments without being able to see them. Our method visualizes the spatial and temporal dynamics using dozens of sound-to-light conversion devices (called “Firefly”) and an off-the-shelf video camera. The Firefly, which consists of a microphone and a light-emitting diode, emits light when it captures nearby sound. Deploying dozens of Fireflies in a target area, we record the calls of multiple individuals through the video camera. We conducted two experiments, one indoors and the other in the field, using Japanese tree frogs (Hyla japonica). The indoor experiment demonstrates that our method correctly visualizes the calling behavior of Japanese tree frogs; it confirmed the known behavior that two frogs call synchronously or in anti-phase synchronization. The field experiment (in a rice paddy where Japanese tree frogs live) also visualizes the same calling behavior, confirming anti-phase synchronization in the field. These results confirm that our method can visualize the calling behavior of nocturnal animals in their natural habitat.
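
The sound-to-light conversion itself is simple; the sketch below mimics it by turning a Firefly's LED on whenever the short-term amplitude of its microphone signal exceeds a threshold. The frame length and threshold are assumed values rather than the device's actual behavior.

```python
# A minimal sketch of the sound-to-light idea behind the "Firefly" device:
# light the LED whenever the short-term amplitude of the nearby microphone
# signal exceeds a threshold. Frame length and threshold are assumed values,
# not the actual circuit behavior.
import numpy as np

def led_frames(audio: np.ndarray, sample_rate: int,
               frame_ms: float = 20.0, threshold: float = 0.1) -> np.ndarray:
    """Return one boolean per frame: True = LED on (call detected nearby)."""
    frame_len = int(sample_rate * frame_ms / 1000.0)
    n_frames = len(audio) // frame_len
    frames = audio[:n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    return rms > threshold

# A video camera then records dozens of such LEDs at once, turning the
# spatio-temporal calling pattern into something visible frame by frame.
```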


Intelligent Robots and Systems | 2008

A robot listens to music and counts its beats aloud by separating music from counting voice

Takeshi Mizumoto; Ryu Takeda; Kazuyoshi Yoshii; Kazunori Komatani; Tetsuya Ogata; Hiroshi G. Okuno

This paper presents a beat-counting robot that can count musical beats aloud, i.e., speak “one, two, three, four, one, two, ...” along with the music, while listening to it with its own ears. Music-understanding robots that interact with humans should be able not only to recognize music internally, but also to express their own internal states. To develop our beat-counting robot, we tackled three issues: (1) recognition of hierarchical beat structures, (2) expression of these structures by counting beats, and (3) suppression of the counting voice (self-generated sound) in sound mixtures recorded by the ears. The main issue is (3), because the counting voice interferes with the music and degrades beat recognition accuracy. We therefore designed an architecture for a music-understanding robot that can deal with self-generated sounds. To solve these issues, we took the following approaches: (1) beat structure prediction based on musical knowledge of chords and drums, (2) speed control of the counting voice according to the music tempo via a vocoder called STRAIGHT, and (3) semi-blind separation of sound mixtures into music and counting voice via an adaptive filter based on ICA (independent component analysis) that uses the waveform of the counting voice as prior knowledge. Experimental results showed that suppressing the robot's own voice improved its music recognition capability.
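
The sketch below illustrates the semi-blind idea behind approach (3) with a plain NLMS adaptive filter that uses the known counting-voice waveform as its reference; the paper's filter is ICA-based, so this is a simplified stand-in, and the filter length and step size are assumed.

```python
# A minimal sketch of self-voice suppression with an NLMS adaptive filter that
# uses the known counting-voice waveform as its reference. The paper uses an
# ICA-based adaptive filter; NLMS is a simpler stand-in to illustrate the
# semi-blind idea, and the filter length and step size are assumed values.
import numpy as np

def cancel_known_voice(mic: np.ndarray, voice: np.ndarray,
                       taps: int = 256, mu: float = 0.5) -> np.ndarray:
    """mic: microphone mixture (music + robot's own voice); voice: the known
    counting-voice signal the robot played. Returns the residual (music)."""
    w = np.zeros(taps)
    out = np.zeros(len(mic))
    for n in range(taps, len(mic)):
        x = voice[n - taps:n][::-1]          # most recent reference samples
        est = np.dot(w, x)                   # estimated voice at the mic
        e = mic[n] - est                     # residual = music estimate
        w += mu * e * x / (np.dot(x, x) + 1e-8)
        out[n] = e
    return out
```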


Intelligent Robots and Systems | 2006

Missing-Feature based Speech Recognition for Two Simultaneous Speech Signals Separated by ICA with a pair of Humanoid Ears

Ryu Takeda; Shun’ichi Yamamoto; Kazunori Komatani; Tetsuya Ogata; Hiroshi G. Okuno

Robot audition is a critical technology for making robots symbiotic with people. Since we hear a mixture of sounds in our daily lives, sound source localization, sound source separation, and recognition of separated sounds are three essential capabilities. Sound source localization has recently been studied well for robots, while the other capabilities still need extensive study. This paper reports a robot audition system with a pair of omni-directional microphones embedded in a humanoid that recognizes two simultaneous talkers. It first separates the sound sources by independent component analysis (ICA) with a single-input multiple-output (SIMO) model. Then, the spectral distortion of the separated sounds is estimated to identify reliable and unreliable components of the spectrogram. This estimation generates the missing-feature masks as spectrographic masks. These masks are then used to avoid the influence of spectral distortion in automatic speech recognition based on the missing-feature method. The novelty of our system lies in estimating spectral distortion in the time-frequency domain in terms of feature vectors. In addition, we point out that voice-activity detection (VAD) is effective in overcoming ICA's weakness against a changing number of talkers. The resulting system outperformed the baseline robot audition system by 15%.
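
A minimal sketch of the masking step follows: mark a time-frequency component as reliable when the separated speech clearly dominates the estimated distortion. The threshold and the distortion proxy used here are assumptions, not the paper's exact estimate.

```python
# A minimal sketch of building a binary missing-feature mask: a time-frequency
# component is reliable when the separated speech clearly dominates the
# estimated distortion (here approximated by the other separated source's
# power). The threshold is an assumed value, not the paper's exact estimate.
import numpy as np

def missing_feature_mask(separated: np.ndarray, interference: np.ndarray,
                         snr_threshold_db: float = 0.0) -> np.ndarray:
    """separated, interference: (num_bins, num_frames) power spectrograms.
    Returns a boolean mask; True = reliable, used by the MFT-based recognizer."""
    snr_db = 10.0 * np.log10(
        np.maximum(separated, 1e-12) / np.maximum(interference, 1e-12))
    return snr_db > snr_threshold_db
```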


International Conference on Acoustics, Speech, and Signal Processing | 2016

Sound source localization based on deep neural networks with directional activate function exploiting phase information

Ryu Takeda; Kazunori Komatani

This paper describes sound source localization (SSL) based on deep neural networks (DNNs) using discriminative training. A naive DNN for SSL can be configured as follows: the input is the frequency-domain feature used in other SSL methods, and the network is a fully connected one operating on real numbers. Such training fails because this network structure loses two important properties, namely the orthogonality of sub-bands and the intensity and time information carried by complex numbers. We solved these two problems by 1) integrating directional information at each sub-band hierarchically, and 2) designing a directional activator that can treat the complex numbers at each sub-band. Our experiments indicated that our method outperformed the naive DNN-based SSL by 20 points in terms of block-level accuracy.
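
As a rough illustration of a directional activator, the sketch below correlates the complex multichannel observation at one sub-band with steering vectors for candidate directions and outputs magnitudes, so phase information is used before the real-valued upper layers. It assumes a 2-microphone free-field model, not the paper's exact layer or array geometry.

```python
# A minimal sketch of a "directional activation" at one sub-band: correlate
# the observed complex multichannel feature with a steering vector for each
# candidate direction and output the magnitude. This is an illustration under
# assumed geometry, not the paper's exact layer.
import numpy as np

SPEED_OF_SOUND = 343.0   # m/s

def steering_vectors(freq_hz: float, mic_spacing_m: float,
                     azimuths_deg: np.ndarray) -> np.ndarray:
    """(n_directions, 2) complex steering vectors for a 2-mic linear array."""
    tau = mic_spacing_m * np.sin(np.deg2rad(azimuths_deg)) / SPEED_OF_SOUND
    return np.stack([np.ones_like(tau, dtype=complex),
                     np.exp(-2j * np.pi * freq_hz * tau)], axis=1)

def directional_activation(x: np.ndarray, W: np.ndarray) -> np.ndarray:
    """x: (2,) complex observation at one sub-band; W: (n_directions, 2)
    steering vectors. Returns real-valued per-direction activations."""
    return np.abs(W.conj() @ x)

# Usage: per-sub-band activations are then combined hierarchically by the DNN.
azimuths = np.linspace(-90, 90, 37)
W = steering_vectors(freq_hz=1000.0, mic_spacing_m=0.1, azimuths_deg=azimuths)
x = W[18] * np.exp(1j * 0.3)          # a source roughly at 0 degrees
print(azimuths[np.argmax(directional_activation(x, W))])   # ~0.0
```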
