Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Takuma Otsuka is active.

Publication


Featured research published by Takuma Otsuka.


International Conference on Robotics and Automation | 2011

Design and implementation of selectable sound separation on the Texai telepresence system using HARK

Takeshi Mizumoto; Kazuhiro Nakadai; Takami Yoshida; Ryu Takeda; Takuma Otsuka; Toru Takahashi; Hiroshi G. Okuno

This paper presents the design and implementation of selectable sound separation functions on the telepresence system “Texai” using the robot audition software “HARK.” An operator of Texai can “walk” around a faraway office to attend a meeting or talk with people through video conference instead of meeting in person. With a normal microphone, the operator has difficulty recognizing the auditory scene around the Texai; for example, he/she cannot know the number and locations of sounds. To solve this problem, we design selectable sound separation functions with 8 microphones in two modes, overview and filter modes, and implement them using HARK's sound source localization and separation. The overview mode visualizes the direction-of-arrival of surrounding sounds, while the filter mode provides sounds that originate from the range of directions he/she specifies. These functions enable the operator to be aware of a sound even if it comes from behind the Texai, and to concentrate on a particular sound. The design and implementation were completed in five days thanks to the portability of HARK. Experimental evaluations with actual and simulated data show that the resulting system localizes sound sources with a tolerance of 5 degrees.
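
As a rough illustration of the filter mode, the sketch below keeps only sound arriving from a chosen direction via delay-and-sum beamforming over an assumed uniform circular 8-microphone array. This is not HARK's actual separation algorithm, and the array geometry, constants, and function names are illustrative assumptions.

```python
# Hypothetical sketch: "filter mode" as delay-and-sum beamforming on a
# uniform circular 8-mic array (NOT HARK's actual separation method).
import numpy as np

C = 343.0      # speed of sound [m/s]
FS = 16000     # sampling rate [Hz]
RADIUS = 0.1   # assumed array radius [m]
N_MICS = 8
MIC_ANGLES = 2 * np.pi * np.arange(N_MICS) / N_MICS

def filter_direction(signals, target_deg):
    """signals: (N_MICS, n_samples) array; target_deg: listening direction."""
    theta = np.deg2rad(target_deg)
    # far-field time advance of each microphone for a source at theta
    delays = RADIUS * np.cos(theta - MIC_ANGLES) / C
    shifts = np.round(delays * FS).astype(int)
    aligned = [np.roll(sig, -s) for sig, s in zip(signals, shifts)]
    return np.mean(aligned, axis=0)  # in-phase sum emphasizes that direction
```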


IEEE Transactions on Audio, Speech, and Language Processing | 2014

Bayesian Nonparametrics for Microphone Array Processing

Takuma Otsuka; Katsuhiko Ishiguro; Hiroshi Sawada; Hiroshi G. Okuno

Sound source localization and separation from a mixture of sounds are essential functions for computational auditory scene analysis. The main challenges are designing a unified framework for joint optimization and estimating the sound sources under auditory uncertainties such as reverberation or an unknown number of sounds. Since sound source localization and separation are mutually dependent, their simultaneous estimation is required for better and more robust performance. A unified model is presented for sound source localization and separation based on Bayesian nonparametrics. Experiments using simulated and recorded audio mixtures show that a method based on this model achieves state-of-the-art sound source separation quality and more robust performance in source number estimation under reverberant environments.
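
The key modeling device here, Bayesian nonparametrics, can be glimpsed in the stick-breaking construction of a Dirichlet-process prior, which places weights on an unbounded number of candidate sources so the active number can be inferred rather than fixed in advance. The sketch below shows only this prior, not the paper's full model; the truncation level and concentration parameter are illustrative.

```python
# Stick-breaking construction of Dirichlet-process mixture weights
# (prior only; the paper couples such a prior with spatial likelihoods).
import numpy as np

def stick_breaking(alpha, truncation, rng):
    """Draw mixture weights from a truncated stick-breaking process."""
    betas = rng.beta(1.0, alpha, size=truncation)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas)[:-1]))
    return betas * remaining

rng = np.random.default_rng(0)
weights = stick_breaking(alpha=2.0, truncation=20, rng=rng)
print(weights.round(3))  # a few dominant weights: the "active" sources
```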


Intelligent Robots and Systems | 2013

Noise correlation matrix estimation for improving sound source localization by multirotor UAV

Koutarou Furukawa; Keita Okutani; Kohei Nagira; Takuma Otsuka; Katsutoshi Itoyama; Kazuhiro Nakadai; Hiroshi G. Okuno

A method has been developed for improving sound source localization (SSL) using a microphone array on an unmanned aerial vehicle with multiple rotors, a “multirotor UAV”. One of the main problems in SSL from a multirotor UAV is that the ego noise of the rotors interferes with the audio observation and degrades the SSL performance. We employ a generalized eigenvalue decomposition-based multiple signal classification (GEVD-MUSIC) algorithm to reduce the effect of ego noise. While the GEVD-MUSIC algorithm requires a noise correlation matrix corresponding to the auto-correlation of the multichannel observation of the rotor noise, the noise correlation is nonstationary due to the aerodynamic control of the UAV. We therefore need an adaptive estimation method for the noise correlation matrix to achieve robust SSL with the GEVD-MUSIC algorithm. Our method uses Gaussian process regression to estimate the noise correlation matrix in each time period from the measurements of self-monitoring sensors attached to the UAV, such as the pitch-roll-yaw tilt angles, xyz speeds, and motor control values. Experiments compare our method with existing SSL methods in terms of the precision and recall of SSL. The results demonstrate that our method outperforms existing methods, especially under high signal-to-noise-ratio conditions.
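
For orientation, the sketch below computes the GEVD-MUSIC spatial spectrum given a noise correlation matrix K; the paper's contribution is estimating K adaptively from UAV telemetry via Gaussian process regression, which is not reproduced here. Steering vectors and the assumed source count are illustrative.

```python
# GEVD-MUSIC spatial spectrum, assuming the noise correlation K is known.
import numpy as np
from scipy.linalg import eigh

def gevd_music_spectrum(R, K, steering, n_sources):
    """R: observed spatial correlation (M x M, Hermitian);
    K: rotor-noise correlation (M x M, positive definite);
    steering: (n_angles, M) steering vectors. Returns power per angle."""
    # Generalized eigenproblem R v = lambda K v; eigenvalues ascend, so
    # the first M - n_sources eigenvectors span the noise subspace.
    _, vecs = eigh(R, K)
    noise = vecs[:, : R.shape[0] - n_sources]
    power = np.empty(len(steering))
    for i, a in enumerate(steering):
        power[i] = np.real(a.conj() @ a) / (np.linalg.norm(noise.conj().T @ a) ** 2)
    return power  # peaks indicate likely source directions
```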


Scientific Reports | 2015

Spatio-Temporal Dynamics in Collective Frog Choruses Examined by Mathematical Modeling and Field Observations

Ikkyu Aihara; Takeshi Mizumoto; Takuma Otsuka; Hiromitsu Awano; Kohei Nagira; Hiroshi G. Okuno; Kazuyuki Aihara

This paper reports theoretical and experimental studies on the spatio-temporal dynamics in choruses of male Japanese tree frogs. First, we theoretically model their calling times and positions as a system of coupled mobile oscillators. Numerical simulation of the model, as well as calculation of the order parameters, shows that the spatio-temporal dynamics exhibits bistability between two-cluster antisynchronization and wavy antisynchronization, under the assumption that the frogs are attracted to the edge of a simple circular breeding site. Second, we change the shape of the breeding site from the circle to rectangles, including a straight line, and evaluate the stability of two-cluster and wavy antisynchronization. Numerical simulation shows that two-cluster antisynchronization is more frequently observed than wavy antisynchronization. Finally, we recorded frog choruses in an actual paddy field using our sound-imaging method. Analysis of the video demonstrated a result consistent with the aforementioned simulation: namely, two-cluster antisynchronization was more frequently realized.
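
A minimal sketch of the anti-phase mechanism in such coupled oscillator models: with a pi-shifted sinusoidal interaction, two phase oscillators settle into anti-phase calling. The coupling gain, step size, and initial phases are illustrative, and the paper's model additionally lets the oscillators move in space.

```python
# Two coupled phase oscillators converging to anti-phase synchronization.
import numpy as np

K, DT = 0.5, 0.01                # coupling gain, time step [s]
omega = 2 * np.pi                # intrinsic call rate [rad/s]
phases = np.array([0.2, 0.3])    # initial phases of two frogs [rad]

for _ in range(20000):
    diff = phases[::-1] - phases          # other frog's phase minus own
    # the pi shift makes in-phase calling unstable and anti-phase stable
    phases = phases + DT * (omega + K * np.sin(diff + np.pi))

print((phases[0] - phases[1]) % (2 * np.pi))  # converges near pi
```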


Intelligent Robots and Systems | 2010

Human-robot ensemble between robot thereminist and human percussionist using coupled oscillator model

Takeshi Mizumoto; Takuma Otsuka; Kazuhiro Nakadai; Toru Takahashi; Kazunori Komatani; Tetsuya Ogata; Hiroshi G. Okuno

This paper presents a novel synchronization method for a human-robot ensemble using coupled oscillators. We define an ensemble as a synchronized performance produced through interactions between independent players. To attain a better synchronized performance, the robot should predict the human's behavior to reduce the difference between the human's and the robot's onset timings. Existing studies on such synchronization adapt only to onset intervals and thus need considerable time to synchronize. We use a coupled oscillator model to predict the human's behavior. Experimental results show that our method reduces the average onset time error; with a metronome, a tempo-varying metronome, or a human drummer, errors are reduced by 38%, 10%, and 14% on average, respectively. These results indicate that predicting the human's behavior is effective for synchronized performance.
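
A minimal sketch of the prediction idea under stated assumptions: instead of averaging past onset intervals, the robot's internal oscillator is pulled toward each observed human onset, so the predicted next onset adapts within a few observations. The class, gains, and defaults below are hypothetical, not the paper's implementation.

```python
# Hypothetical phase-coupled onset prediction (not the paper's code).
class OnsetPredictor:
    """Predict a co-player's next onset by coupling to observed onsets."""

    def __init__(self, period=0.5, coupling=0.3):
        self.period = period      # current beat period estimate [s]
        self.coupling = coupling  # how strongly to follow the human
        self.predicted = 0.0      # last predicted onset time [s]

    def observe(self, human_onset):
        """Nudge the period by the observed timing error, then predict."""
        error = human_onset - self.predicted
        self.period += self.coupling * error
        self.predicted = human_onset + self.period

    def next_onset(self):
        return self.predicted
```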


Journal of Comparative Physiology A: Neuroethology, Sensory, Neural, and Behavioral Physiology | 2011

Sound imaging of nocturnal animal calls in their natural habitat

Takeshi Mizumoto; Ikkyu Aihara; Takuma Otsuka; Ryu Takeda; Kazuyuki Aihara; Hiroshi G. Okuno

We present a novel method for imaging acoustic communication between nocturnal animals. Investigating the spatio-temporal calling behavior of nocturnal animals, e.g., frogs and crickets, has been difficult because of the need to distinguish many animals' calls in noisy environments without being able to see them. Our method visualizes the spatial and temporal dynamics using dozens of sound-to-light conversion devices (called “Fireflies”) and an off-the-shelf video camera. A Firefly, which consists of a microphone and a light-emitting diode, emits light when it captures nearby sound. Deploying dozens of Fireflies in a target area, we record the calls of multiple individuals through the video camera. We conducted two experiments, one indoors and the other in the field, using Japanese tree frogs (Hyla japonica). The indoor experiment demonstrates that our method correctly visualizes the calling behavior of Japanese tree frogs, confirming the known behavior that two frogs call synchronously or in anti-phase synchronization. The field experiment (in a rice paddy where Japanese tree frogs live) also visualized the same calling behavior, confirming anti-phase synchronization in the field. These results confirm that our method can visualize the calling behavior of nocturnal animals in their natural habitat.
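
A Firefly's behavior can be summarized in a few lines: light the LED while the short-time energy of the microphone signal exceeds a threshold. The digital sketch below mimics what the actual device does in analog hardware; the frame length and threshold are illustrative.

```python
# Toy digital analogue of a "Firefly" sound-to-light unit.
import numpy as np

def led_states(signal, frame=256, threshold=0.1):
    """signal: mono waveform in [-1, 1]; returns one on/off flag per frame."""
    n_frames = len(signal) // frame
    frames = signal[: n_frames * frame].reshape(n_frames, frame)
    rms = np.sqrt((frames ** 2).mean(axis=1))  # short-time energy
    return rms > threshold                     # True -> LED lit
```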


Intelligent Robots and Systems | 2009

Incremental polyphonic audio to score alignment using beat tracking for singer robots

Takuma Otsuka; Toru Takahashi; Hiroshi G. Okuno; Kazunori Komatani; Tetsuya Ogata; Kazumasa Murata; Kazuhiro Nakadai

We aim at developing a singer robot capable of listening to music with its own “ears” and interacting with a human's musical performance. Such a singer robot requires at least three functions: listening to the music, understanding what position in the music is being performed, and generating a singing voice. In this paper, we focus on the second function, that is, the capability to align an audio signal to its musical score represented symbolically. The issues underlying the score alignment problem are: (1) diversity in the sounds of various musical instruments, (2) differences between the audio signal and the musical score, and (3) fluctuation in the tempo of the musical performance. Our solutions to these issues are as follows: (1) the design of features based on a chroma vector in the 12-tone model and the onset of the sound, (2) the definition of a rareness for each tone based on the idea that a scarcely used tone is salient in the audio signal, and (3) the use of a switching Kalman filter for robust tempo estimation. The experimental results show that our score alignment method improves the average cumulative absolute error by 29% on 100 popular music tunes compared to beat tracking without score alignment.
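
A minimal sketch of solution (1), the chroma feature: fold the spectral energy of a frame onto the 12 pitch classes. The FFT windowing, frequency range, and A4 = 440 Hz tuning reference are conventional assumptions, not details taken from the paper.

```python
# Chroma (12-tone) vector of one audio frame.
import numpy as np

def chroma_vector(frame, fs=16000, fmin=55.0, fmax=4000.0):
    """frame: 1-D waveform snippet; returns a normalized 12-bin chroma."""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
    chroma = np.zeros(12)
    for f, p in zip(freqs, spec):
        if fmin <= f <= fmax:
            midi = 69 + 12 * np.log2(f / 440.0)  # Hz -> MIDI note number
            chroma[int(round(midi)) % 12] += p   # fold onto pitch class
    return chroma / (chroma.sum() + 1e-12)
```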


International Workshop on Security | 2013

Solving Google's Continuous Audio CAPTCHA with HMM-Based Automatic Speech Recognition

Shotaro Sano; Takuma Otsuka; Hiroshi G. Okuno

CAPTCHAs play a critical role in maintaining the security of various Web services by distinguishing humans from automated programs and preventing Web services from being abused. CAPTCHAs are designed to block automated programs by presenting questions that are easy for humans but difficult for computers, e.g., recognition of visual digits or audio utterances. Recent audio CAPTCHAs, such as Google's audio reCAPTCHA, present overlapping and distorted target voices with stationary background noise. We investigate the security of overlapping audio CAPTCHAs by developing an audio reCAPTCHA solver. Our solver is constructed using speech recognition techniques based on hidden Markov models (HMMs) and is implemented with the off-the-shelf HMM Toolkit library. Our experiments revealed vulnerabilities in the current version of audio reCAPTCHA, with the solver cracking 52% of the questions. We further show that the stationary background noise did not enhance security against our solver.
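
At the core of any HMM-based recognizer, including one built with the HMM Toolkit, is Viterbi decoding of the most likely state sequence. The log-domain sketch below uses toy parameters; in practice the toolkit trains the transition and emission models from labeled audio.

```python
# Log-domain Viterbi decoding for an HMM with S states over T frames.
import numpy as np

def viterbi(log_A, log_B, log_pi):
    """log_A: (S, S) transition log-probs; log_B: (T, S) per-frame state
    log-likelihoods; log_pi: (S,) initial log-probs. Returns best path."""
    T, S = log_B.shape
    delta = log_pi + log_B[0]          # best score ending in each state
    back = np.zeros((T, S), dtype=int) # backpointers
    for t in range(1, T):
        scores = delta[:, None] + log_A   # (from_state, to_state)
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_B[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):      # trace backpointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```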


IEEE Transactions on Audio, Speech, and Language Processing | 2014

Multichannel sound source dereverberation and separation for arbitrary number of sources based on Bayesian nonparametrics

Takuma Otsuka; Katsuhiko Ishiguro; Takuya Yoshioka; Hiroshi Sawada; Hiroshi G. Okuno

Multichannel signal processing using a microphone array provides fundamental functions for coping with multi-source situations, such as sound source localization and separation, that are needed to extract the auditory information for each source. Auditory uncertainties about the degree of reverberation and the number of sources are known to degrade performance or limit the practical application of microphone array processing. Such uncertainties must therefore be overcome to realize general and robust microphone array processing. These uncertainty issues have been only partly addressed: existing methods focus on either the source number uncertainty or the reverberation issue, and joint separation and dereverberation has been achieved only for overdetermined conditions. This paper presents an all-round method that achieves source separation and dereverberation for an arbitrary number of sources, including underdetermined conditions. Our method uses Bayesian nonparametrics, whose infinitely extensible modeling flexibility bypasses the model selection problem in separation and dereverberation caused by the source number uncertainty. Evaluation using a dereverberation and separation task with various numbers of sources, including underdetermined conditions, demonstrates that (1) our method is applicable to the separation and dereverberation of underdetermined mixtures, and (2) its source extraction performance is comparable to that of a state-of-the-art method suitable only for overdetermined conditions.
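
To give a feel for the dereverberation sub-problem, the sketch below is a toy single-channel, magnitude-domain analogue of linear-prediction dereverberation: the late reverberant part of each sample is predicted from delayed past samples and subtracted. The paper instead solves multichannel dereverberation jointly with separation under a Bayesian nonparametric prior; the filter order and prediction delay here are illustrative.

```python
# Toy linear-prediction dereverberation on a 1-D sequence.
import numpy as np

def dereverb_lp(x, order=10, delay=2):
    """x: 1-D sequence (e.g., magnitudes of one STFT bin over time).
    Predict x[t] from x[t-delay-order+1 .. t-delay] and subtract."""
    T = len(x)
    rows = [x[t - delay - order + 1 : t - delay + 1]
            for t in range(delay + order - 1, T)]
    X = np.array(rows)
    y = x[delay + order - 1 :]
    w, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares filter
    late = X @ w                               # predicted late reverb
    return np.concatenate([x[: delay + order - 1], y - late])
```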


Intelligent Robots and Systems | 2011

Particle-filter based audio-visual beat-tracking for music robot ensemble with human guitarist

Tatsuhiko Itohara; Takuma Otsuka; Takeshi Mizumoto; Tetsuya Ogata; Hiroshi G. Okuno

This paper presents an audio-visual beat-tracking method for ensemble robots performing with a human guitarist. Beat tracking, i.e., estimation of the tempo and beat times of music, is critical to high-quality musical ensemble performance. Since a human guitarist plays with back beats and syncopation, the main problems in beat-tracking of human guitar playing are twofold: tempo changes and varying note lengths. Most conventional methods have not addressed human guitar playing and therefore fail to adapt to one or both of these problems. To solve both problems simultaneously, our method uses not only audio but also visual features. We extract audio features with Spectro-Temporal Pattern Matching (STPM) and visual features with optical flow, mean shift, and the Hough transform. Our beat-tracking method estimates the tempo and beat times using a particle filter; both the acoustic features of guitar sounds and the visual features of arm motions are represented as particles. The particles are determined based on prior distributions of the audio and visual features, respectively. Experimental results confirm that our integrated audio-visual approach is robust against tempo changes and varying note lengths. They also show that the estimation convergence rate depends only slightly on the number of particles. The real-time factor is 0.88 with 200 particles, which shows that our method works in real time.
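
A minimal sketch of the particle-filter core under stated assumptions: each particle carries a (tempo, beat phase) hypothesis, is propagated with noise, and is reweighted by how well predicted beats match an observed onset strength. The toy observation model below stands in for the paper's fused STPM audio features and arm-motion visual features; all constants are illustrative.

```python
# Toy particle filter for tempo / beat-phase tracking.
import numpy as np

rng = np.random.default_rng(0)
N = 200
tempo = rng.uniform(60, 180, N)   # beats per minute
phase = rng.uniform(0, 1, N)      # position within the current beat
w = np.full(N, 1.0 / N)           # particle weights

def step(onset_strength, dt=0.05):
    """Advance particles by dt seconds, reweight, and resample."""
    global tempo, phase, w
    tempo += rng.normal(0, 1.0, N)                    # tempo drift
    phase = (phase + dt * tempo / 60.0) % 1.0
    near_beat = np.minimum(phase, 1.0 - phase) < 0.1  # close to a beat
    w *= np.where(near_beat, 0.5 + onset_strength, 0.5)
    w /= w.sum()
    idx = rng.choice(N, N, p=w)                       # resample
    tempo, phase, w = tempo[idx], phase[idx], np.full(N, 1.0 / N)
```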

Collaboration


Dive into Takuma Otsuka's collaborations.

Top Co-Authors

Hiroshi Sawada

Nippon Telegraph and Telephone

Katsuhiko Ishiguro

Nippon Telegraph and Telephone
