
Publication


Featured research published by Hyun Don Kim.


Intelligent Robots and Systems (IROS) | 2007

Auditory and visual integration based localization and tracking of humans in daily-life environments

Hyun Don Kim; Kazunori Komatani; Tetsuya Ogata; Hiroshi G. Okuno

The purpose of this research is to develop techniques that enable robots to choose and track a desired person for interaction in daily-life environments. Localizing multiple moving sounds and human faces is therefore necessary so that robots can locate a desired person. For sound source localization, we used a cross-power spectrum phase analysis (CSP) method and showed that CSP can localize sound sources using only two microphones, without impulse response data. An expectation-maximization (EM) algorithm was shown to enable a robot to cope with multiple moving sound sources. For face localization, we developed a method that can reliably detect several faces using skin color classification obtained with the EM algorithm. To deal with changes in color caused by illumination conditions and the variety of human skin tones, the robot can obtain new skin color features from faces detected by OpenCV, an open vision library for detecting human faces. Finally, we developed a probability-based method to integrate auditory and visual information and produce a reliable tracking path in real time. The developed system chose and tracked people while dealing with various background noises, including loud ones, in daily-life environments.
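
For readers who want to experiment with the two-microphone localization the abstract describes, here is a minimal sketch of cross-power spectrum phase (CSP, also known as GCC-PHAT) analysis in Python/NumPy. The microphone spacing, sampling rate, and delay-to-azimuth mapping are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def csp_azimuth(left, right, fs, mic_distance=0.3, c=343.0):
    """Two-microphone sound source localization via cross-power spectrum
    phase (CSP / GCC-PHAT). Returns an azimuth in degrees (0 = front).
    mic_distance and the delay-to-angle mapping are illustrative."""
    n = len(left)
    cross = np.fft.rfft(left, n) * np.conj(np.fft.rfft(right, n))
    cross /= np.abs(cross) + 1e-12            # keep phase only (PHAT weighting)
    corr = np.fft.irfft(cross, n)
    max_lag = int(fs * mic_distance / c)      # physically possible delays only
    lags = np.concatenate([np.arange(max_lag + 1), np.arange(n - max_lag, n)])
    best = lags[np.argmax(corr[lags])]
    delay = best if best <= max_lag else best - n
    sin_theta = np.clip(delay / fs * c / mic_distance, -1.0, 1.0)
    return np.degrees(np.arcsin(sin_theta))
```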


Intelligent Robots and Systems (IROS) | 2008

Target speech detection and separation for humanoid robots in sparse dialogue with noisy home environments

Hyun Don Kim; Jinsung Kim; Kazunori Komatani; Tetsuya Ogata; Hiroshi G. Okuno

In normal human communication, people face the speaker when listening and usually pay attention to the speaker's face. Therefore, in robot audition, recognition of the front talker is critical for smooth interactions. This paper presents an enhanced speech detection method for a humanoid robot that can separate and recognize speech signals originating from the front, even in noisy home environments. The robot audition system consists of a new type of voice activity detection (VAD) based on the complex spectrum circle centroid (CSCC) method and a maximum signal-to-noise ratio (max-SNR) beamformer. The CSCC-based VAD can classify speech signals that arrive at the frontal region of two microphones embedded on the robot. The system works in real time, without training filter coefficients given in advance, even in a noisy environment (SNR > 0 dB). It can cope with speech noise generated from televisions and audio devices that does not originate from the center. Experiments using a humanoid robot, SIG2, with two microphones showed that our system enhanced extracted target speech signals by more than 12 dB (SNR), and the success rate of automatic speech recognition for Japanese words increased by about 17 points.
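
The CSCC formulation itself is not spelled out in the abstract, so the sketch below only captures the underlying cue: speech from the front arrives at both microphones nearly in phase. The frame length, hop size, and thresholds are hypothetical; this is a stand-in for, not a reimplementation of, the paper's CSCC-based VAD.

```python
import numpy as np

def frontal_vad(left, right, frame=512, hop=256,
                max_phase=0.35, energy_floor=1e-6):
    """Hypothetical frontal-speech detector: flag frames whose energy is
    above a floor and whose energy-weighted inter-channel phase difference
    is small (a frontal source reaches both microphones nearly in phase)."""
    win = np.hanning(frame)
    flags = []
    for s in range(0, len(left) - frame + 1, hop):
        l = np.fft.rfft(win * left[s:s + frame])
        r = np.fft.rfft(win * right[s:s + frame])
        cross = l * np.conj(r)
        w = np.abs(cross)
        mean_phase = abs(np.sum(np.angle(cross) * w) / (np.sum(w) + 1e-12))
        energy = np.mean(np.abs(l) ** 2)
        flags.append(energy > energy_floor and mean_phase < max_phase)
    return np.array(flags)
```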


14th International Symposium on Robotics Research (ISRR 2009) | 2011

Robot audition: Missing feature theory approach and active audition

Hiroshi G. Okuno; Kazuhiro Nakadai; Hyun Don Kim

A robot's capability of listening to several things at once with its own ears, that is, robot audition, is important in improving interaction and symbiosis between humans and robots. The critical issues in robot audition are real-time processing and robustness against noisy environments, with enough flexibility to support various kinds of robots and hardware configurations. This paper presents two important aspects of robot audition: the missing-feature-theory (MFT) approach and active audition. HARK, open-source robot audition software, incorporates the MFT approach to recognize speech signals that are localized and separated from a mixture of sounds captured by an 8-channel microphone array. HARK has been ported to four robots, Honda ASIMO, SIG2, Robovie-R2 and HRP-2, with different microphone configurations, and recognizes three simultaneous utterances with 1.9 s latency. In binaural hearing, the most famous problem is front-back confusion of sound sources. Active binaural robot audition implemented on SIG2 resolves this ambiguity by rotating its head with pitching. This active audition improves localization for the periphery.
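
A minimal sketch of what the missing-feature-theory approach does with spectral features: bins whose estimated local SNR is low are marked unreliable and excluded from recognition. HARK's actual mask generation and missing-feature decoding are more elaborate; the threshold and the zero fill value here are assumptions.

```python
import numpy as np

def missing_feature_mask(speech_power, noise_power, snr_threshold_db=0.0):
    """Binary reliability mask in the missing-feature-theory spirit:
    a time-frequency bin counts as reliable when its estimated local SNR
    exceeds a threshold."""
    snr_db = 10.0 * np.log10((speech_power + 1e-12) / (noise_power + 1e-12))
    return snr_db > snr_threshold_db  # True = reliable bin

def apply_mask(features, mask, fill_value=0.0):
    """A missing-feature recognizer ignores or marginalizes unreliable bins;
    zero-filling here is only a placeholder for that step."""
    return np.where(mask, features, fill_value)
```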


Advanced Robotics | 2009

Human tracking system integrating sound and face localization using an expectation-maximization algorithm in real environments

Hyun Don Kim; Kazunori Komatani; Tetsuya Ogata; Hiroshi G. Okuno

We have developed a human tracking system for use by robots that integrates sound and face localization. Conventional systems usually require many microphones and/or prior information to localize several sound sources, and they are incapable of coping with various types of background noise. In our system, cross-power spectrum phase analysis of sound signals obtained with only two microphones is used to localize the sound source without prior information such as impulse response data. An expectation-maximization (EM) algorithm helps the system cope with several moving sound sources. The problem of distinguishing whether sounds are coming from the front or back is also solved with only two microphones by rotating the robot's head. A method that uses facial skin colors classified by another EM algorithm enables the system to detect faces in various poses. It can compensate for errors in sound localization for a speaker and also identify noise signals entering from undesired directions by detecting a human face. A probability-based method integrates the auditory and visual information to produce a reliable tracking path in real time. Experiments using a robot showed that our system can localize two sounds at the same time and track a communication partner while dealing with various types of background noise.
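
The abstract's EM algorithm operates on localization results; a generic way to picture its role is fitting a one-dimensional Gaussian mixture to frame-wise azimuth estimates, so that estimates from several simultaneous sources separate into clusters. The update below is textbook GMM EM, not the authors' exact formulation.

```python
import numpy as np

def em_doa_clusters(azimuths, n_sources=2, n_iter=50):
    """Fit a 1-D Gaussian mixture to frame-wise azimuth estimates (degrees)
    with EM, so estimates from several simultaneous sources separate into
    clusters. Returns per-source means, variances, and mixture weights."""
    az = np.asarray(azimuths, dtype=float)
    means = np.random.choice(az, n_sources, replace=False)
    variances = np.full(n_sources, np.var(az) + 1e-6)
    weights = np.full(n_sources, 1.0 / n_sources)
    for _ in range(n_iter):
        # E-step: responsibility of each source for each azimuth estimate
        diff = az[:, None] - means[None, :]
        dens = np.exp(-0.5 * diff**2 / variances) / np.sqrt(2 * np.pi * variances)
        resp = weights * dens
        resp /= resp.sum(axis=1, keepdims=True) + 1e-12
        # M-step: re-estimate mixture means, variances, and weights
        nk = resp.sum(axis=0) + 1e-12
        means = (resp * az[:, None]).sum(axis=0) / nk
        diff = az[:, None] - means[None, :]
        variances = (resp * diff**2).sum(axis=0) / nk + 1e-6
        weights = nk / len(az)
    return means, variances, weights
```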


Intelligent Robots and Systems (IROS) | 2008

Design and evaluation of two-channel-based sound source localization over entire azimuth range for moving talkers

Hyun Don Kim; Kazunori Komatani; Tetsuya Ogata; Hiroshi G. Okuno

We propose a way to evaluate various sound localization systems for moving sounds under the same conditions. To construct a database of moving sounds, we developed a moving-sound creation tool using the API library developed by the ARINIS Company. We developed a two-channel sound source localization system integrating cross-power spectrum phase (CSP) analysis and an EM algorithm. The CSP of sound signals obtained with only two microphones is used to localize the sound source without prior information such as impulse response data. The EM algorithm helps the system cope with several moving sound sources and reduces localization error. We evaluated our sound localization method using artificial moving sounds and confirmed that it can localize moving sounds slower than 1.125 rad/s well. Finally, we solve the problem of distinguishing whether sounds are coming from the front or back by rotating a robot's head equipped with only two microphones. Our system was applied to a humanoid robot called SIG2, and we confirmed its ability to localize sounds over the entire azimuth range.
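
The front-back disambiguation step can be sketched from geometry alone: a source at azimuth a in front and its mirror image behind produce the same inter-microphone delay, but after the head turns by r degrees, the apparent azimuth of a front source shifts by -r and that of a rear source by +r. The helper below is hypothetical code illustrating that decision rule.

```python
def resolve_front_back(azimuth_before, azimuth_after, head_rotation):
    """Decide front vs. back from a small head rotation (all in degrees).
    After the head rotates by head_rotation, a front source's apparent
    azimuth shifts by -head_rotation, a rear source's by +head_rotation."""
    front_err = abs(azimuth_after - (azimuth_before - head_rotation))
    back_err = abs(azimuth_after - (azimuth_before + head_rotation))
    return "front" if front_err <= back_err else "back"
```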


International Conference on Robotics and Automation (ICRA) | 2008

Two-channel-based voice activity detection for humanoid robots in noisy home environments

Hyun Don Kim; Kazunori Komatani; Tetsuya Ogata; Hiroshi G. Okuno

The purpose of this research is to accurately classify speech signals originating from the front, even in noisy home environments. This ability can help robots improve speech recognition and spot keywords. We therefore developed a new voice activity detection (VAD) method based on the complex spectrum circle centroid (CSCC) method. It can classify the speech signals received at the front of two microphones by comparing the spectral energy of observed signals with that of target signals estimated by CSCC. It works in real time, without training filter coefficients beforehand, even in noisy environments (SNR > 0 dB), and can cope with speech noise generated by audio-visual equipment such as televisions and audio devices. Since the CSCC method requires the directions of the noise signals, we also developed a sound source localization system integrating cross-power spectrum phase (CSP) analysis and an expectation-maximization (EM) algorithm. This system was demonstrated to enable a robot to cope with multiple sound sources using two microphones.
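
As a hedged stand-in for the energy comparison the abstract describes (observed spectral energy versus CSCC-estimated target energy), the sketch below estimates the frontal component as the in-phase sum of the two channels and off-center noise as their difference, then thresholds the energy ratio per frame. The CSCC estimator itself is not reproduced; all parameters are assumptions.

```python
import numpy as np

def frontal_energy_vad(left, right, frame=512, hop=256, ratio_db=3.0):
    """A frontal source arrives in phase at both microphones, so the
    in-phase (sum) component approximates the target energy and the
    out-of-phase (difference) component approximates off-center noise.
    Flag frames where their energy ratio exceeds a threshold."""
    win = np.hanning(frame)
    flags = []
    for s in range(0, len(left) - frame + 1, hop):
        l = win * left[s:s + frame]
        r = win * right[s:s + frame]
        target = 0.5 * (l + r)   # in-phase (frontal) estimate
        noise = 0.5 * (l - r)    # out-of-phase (off-center) estimate
        ratio = 10 * np.log10((np.sum(target**2) + 1e-12) /
                              (np.sum(noise**2) + 1e-12))
        flags.append(ratio > ratio_db)
    return np.array(flags)
```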


International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems (IEA/AIE) | 2007

Real-time auditory and visual talker tracking through integrating EM algorithm and particle filter

Hyun Don Kim; Kazunori Komatani; Tetsuya Ogata; Hiroshi G. Okuno

This paper presents techniques that enable talker tracking for effective human-robot interaction. We propose a new way of integrating an EM algorithm and a particle filter to select an appropriate path for tracking the talker. Our system can easily adapt to new kinds of information for tracking the talker because it estimates the position of the desired talker through means, variances, and weights calculated from EM training, regardless of the number or kind of information sources. In addition, to enhance a robot's ability to track a talker in real-world environments, we applied the particle filter to talker tracking after executing the EM algorithm. We also integrated a variety of auditory and visual information regarding sound localization, face localization, and the detection of lip movement. Moreover, we applied a sound classification function that allows our system to distinguish voice, music, and noise. We also developed a vision module that can locate moving objects.
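
To make the particle-filter stage concrete, here is a minimal one-dimensional filter over talker azimuth: predict with a random-walk motion model, weight particles by a Gaussian likelihood around the fused observation, resample, and report the weighted mean. The paper fuses EM-derived means, variances, and weights rather than the single scalar observation assumed here; all parameters are illustrative.

```python
import numpy as np

def particle_filter_track(observations, n_particles=200,
                          motion_std=5.0, obs_std=10.0, seed=0):
    """Minimal 1-D particle filter over talker azimuth (degrees):
    random-walk prediction, Gaussian observation likelihood,
    resampling, and mean-state output per time step."""
    rng = np.random.default_rng(seed)
    particles = rng.uniform(-90.0, 90.0, n_particles)
    track = []
    for z in observations:
        particles = particles + rng.normal(0.0, motion_std, n_particles)  # predict
        w = np.exp(-0.5 * ((particles - z) / obs_std) ** 2) + 1e-12       # weight
        w /= w.sum()
        particles = particles[rng.choice(n_particles, n_particles, p=w)]  # resample
        track.append(particles.mean())
    return np.array(track)
```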


Applied Bionics and Biomechanics | 2009

Binaural active audition for humanoid robots to localise speech over entire azimuth range

Hyun Don Kim; Kazunori Komatani; Tetsuya Ogata; Hiroshi G. Okuno

We applied motion theory to robot audition to improve its inadequate performance; motion is critical for overcoming the ambiguity and sparseness of information obtained by two microphones. To realise this, we first designed a sound source localisation system integrating cross-power spectrum phase (CSP) analysis and an expectation-maximisation (EM) algorithm. The CSP of sound signals obtained with only two microphones was used to localise the sound source without having to measure impulse response data. The EM algorithm helped the system to cope with several moving sound sources and reduced localisation errors. We then proposed a way of constructing a database of moving sounds to evaluate binaural sound source localisation. We evaluated our sound localisation method using artificial moving sounds and confirmed that it could effectively localise moving sounds slower than 1.125 rad/s. Consequently, we solved the problem of distinguishing whether sounds were coming from the front or rear by rotating and/or tipping the robot's head, which was equipped with only two microphones. Our system was applied to a humanoid robot called SIG2, and we confirmed its ability to localise sounds over the entire azimuth range: the success rates for sound localisation in the front and rear areas were 97.6% and 75.6%, respectively.


Robot and Human Interactive Communication (RO-MAN) | 2007

Auditory and Visual Integration based Localization and Tracking of Multiple Moving Sounds in Daily-life Environments

Hyun Don Kim; Kazunori Komatani; Tetsuya Ogata; Hiroshi G. Okuno

This paper presents techniques that enable talker tracking for effective human-robot interaction. To track moving people in daily-life environments, localizing multiple moving sounds is necessary so that robots can locate talkers. However, the conventional method requires an array of microphones and impulse response data. We therefore propose a way to integrate a cross-power spectrum phase analysis (CSP) method and an expectation-maximization (EM) algorithm. CSP can localize sound sources using only two microphones and does not need impulse response data, and the EM algorithm increases the system's effectiveness and allows it to cope with multiple sound sources. We confirmed that the proposed method performs better than the conventional method. In addition, we added a particle filter to the tracking process to produce a reliable tracking path; the particle filter is able to integrate audio-visual information effectively and to track people while dealing with various noises, even loud ones, in daily-life environments.


Advanced Robotics | 2009

Target speech detection and separation for communication with humanoid robots in noisy home environments

Hyun Don Kim; Jinsung Kim; Kazunori Komatani; Tetsuya Ogata; Hiroshi G. Okuno

People usually talk face to face when they communicate with their partner. Therefore, in robot audition, recognition of the front talker is critical for smooth interactions. This paper presents an enhanced speech detection method for a humanoid robot that can separate and recognize speech signals originating from the front, even in noisy home environments. The robot audition system consists of a new type of voice activity detection (VAD) based on the complex spectrum circle centroid (CSCC) method and a maximum signal-to-noise ratio (SNR) beamformer. The CSCC-based VAD can classify speech signals that arrive at the frontal region of two microphones embedded on the robot. The system works in real time, without training filter coefficients given in advance, even in a noisy environment (SNR > 0 dB). It can cope with speech noise generated from televisions and audio devices that does not originate from the center. Experiments using a humanoid robot, SIG2, with two microphones showed that our system enhanced extracted target speech signals by more than 12 dB (SNR) and the success rate of automatic speech recognition for Japanese words was increased by about 17 points.

Collaboration


Dive into Hyun Don Kim's collaboration.

Top Co-Authors

Jinsung Kim

Korea Institute of Science and Technology
