Publication


Featured research published by Jae Sam Yoon.


IEEE Transactions on Consumer Electronics | 2008

A name recognition based call-and-come service for home robots

Yoo Rhee Oh; Jae Sam Yoon; Ji Hun Park; Mina Kim; Hong Kook Kim

In this paper, we propose and implement an efficient robot name registration and recognition system in order to provide a call-and-come service for home robots. The service is designed to enable a robot to come to a user when its name is called correctly by the user. Therefore, techniques such as voice detection, robust speech recognition, and detection of the location of the user are required. For efficient robot name registration by voice, the proposed method first restricts the search space for name registration by using monophone-based acoustic models. Then, the registration of robot names is completed using triphone-based acoustic models in the restricted search space. In order to provide a reliable service, the parameters for robot name verification are calculated to reduce the acceptance rate of false calls. In addition, acoustic models are adapted by using a distant speech database to improve the performance of distant speech recognition. Moreover, the location of the user is estimated by using a microphone array. The experimental results for the registration and recognition of robot names show that the average word error rate (WER) of speech recognition is 1.7% in home environments, which is an acceptable WER for a real call-and-come service.
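
As a rough illustration of the user-localization step, the sketch below estimates a talker's bearing from a two-microphone array with GCC-PHAT, a standard time-difference-of-arrival method. The mic spacing, sample rate, and simulated delay are assumptions for the demo, not details from the paper.

```python
# Illustrative sketch (not the authors' implementation): locating a talker
# with a two-microphone array via GCC-PHAT time-difference-of-arrival
# estimation. Mic spacing, sample rate, and the simulated delay are assumed.
import numpy as np

def gcc_phat(x, y, fs, max_tau):
    """Estimate the delay of x relative to y in seconds (positive: x lags)."""
    n = len(x) + len(y)
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    R = X * np.conj(Y)
    R /= np.abs(R) + 1e-12                 # PHAT weighting: keep phase only
    cc = np.fft.irfft(R, n=n)
    max_shift = int(fs * max_tau)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs

fs = 16000
mic_spacing = 0.1                          # assumed 10 cm between microphones
speed_of_sound = 343.0                     # m/s
max_tau = mic_spacing / speed_of_sound     # physical bound on the delay

rng = np.random.default_rng(0)
sig = rng.standard_normal(fs)              # broadband toy source, 1 s
x = np.roll(sig, 2)                        # mic 1 hears the source 2 samples late
y = sig
tau = gcc_phat(x, y, fs, max_tau)
angle = np.degrees(np.arcsin(np.clip(tau * speed_of_sound / mic_spacing, -1, 1)))
print(f"estimated delay: {tau * 1e6:.1f} us, bearing: {angle:.1f} deg")
```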


IEICE Transactions on Information and Systems | 2008

HMM-Based Mask Estimation for a Speech Recognition Front-End Using Computational Auditory Scene Analysis

Ji Hun Park; Jae Sam Yoon; Hong Kook Kim

In this paper, we propose a new mask estimation method for the computational auditory scene analysis (CASA) of speech using two microphones. The proposed method is based on a hidden Markov model (HMM) in order to incorporate the observation that mask information should be correlated over contiguous analysis frames. In other words, an HMM is used to estimate the mask information, represented as the interaural time difference (ITD) and the interaural level difference (ILD) of the two channel signals, and the estimated mask information is finally employed in the separation of the desired speech from noisy speech. To show the effectiveness of the proposed mask estimation, we then compare the performance of the proposed method with that of a Gaussian kernel-based estimation method in terms of speech recognition performance. As a result, the proposed HMM-based mask estimation method provided an average word error rate reduction of 69.14% when compared with the Gaussian kernel-based mask estimation method.
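
The core idea, that mask decisions are correlated across contiguous frames and so benefit from HMM smoothing, can be sketched with a two-state Viterbi pass over frame-wise binaural evidence. The ILD values, Gaussian emission models, and transition probabilities below are illustrative assumptions, not the authors' trained models.

```python
# Minimal sketch: per-frame binaural cues (here just ILD) give a noisy
# frame-wise speech/noise decision; a 2-state HMM with sticky transitions
# smooths it over contiguous frames via Viterbi decoding.
import numpy as np

def viterbi_2state(loglik, log_trans, log_init):
    """loglik: (T,2) per-frame log-likelihoods; returns the best state path."""
    T = loglik.shape[0]
    delta = np.zeros((T, 2))
    psi = np.zeros((T, 2), dtype=int)
    delta[0] = log_init + loglik[0]
    for t in range(1, T):
        for j in range(2):
            scores = delta[t - 1] + log_trans[:, j]
            psi[t, j] = np.argmax(scores)
            delta[t, j] = scores[psi[t, j]] + loglik[t, j]
    path = np.zeros(T, dtype=int)
    path[-1] = np.argmax(delta[-1])
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path

# Toy per-frame ILD evidence (dB). Assumed emissions: speech-dominant units
# have ILD ~ N(0,1) (frontal target), noise-dominant units ILD ~ N(6,1).
ild = np.array([0.2, 0.1, 3.5, 0.3, 0.2, 5.8, 6.1, 5.9, 0.4, 0.1])
loglik = np.stack([-0.5 * (ild - 0.0) ** 2, -0.5 * (ild - 6.0) ** 2], axis=1)
log_trans = np.log(np.array([[0.9, 0.1], [0.1, 0.9]]))  # sticky states
path = viterbi_2state(loglik, log_trans, np.log([0.5, 0.5]))
print("frame-wise argmax:", (loglik[:, 1] > loglik[:, 0]).astype(int))
print("HMM-smoothed mask:", path)  # the isolated ambiguous frame is flipped
```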


IEEE Transactions on Consumer Electronics | 2009

A voice-driven scene-mode recommendation service for portable digital imaging devices

Yoo Rhee Oh; Jae Sam Yoon; Hong Kook Kim; Myung Bo Kim; Sang Ryong Kim

In this paper, we propose a voice-driven scene-mode recommendation service in order to more easily select scene-modes on portable digital imaging devices such as digital cameras and camcorders. In other words, the proposed service is designed to recommend or automatically change the scene-mode by recognizing a user's voice command regarding scene or scene-related words. To realize such a service, we implement a system which is mainly composed of voice activity detection, automatic speech recognition (ASR), utterance verification, and word-to-scene-mode mapping. However, several optimization methods should be applied since portable digital imaging devices operate on embedded systems with limited resources. In addition, a speech adaptation database for acoustic models is developed such that the ASR system can adjust to the characteristics of the microphones and operating environments. Finally, the performance of the voice-driven scene-mode recommendation system is measured in terms of processing time and scene-mode recognition accuracy (SMRA). It is shown from the experiments that the average processing time and the average SMRA are around 500 ms and 98.0% for 50 scene-related words, respectively, and 1200 ms and 96.8% for 200 scene-related words.
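
A minimal sketch of the final word-to-scene-mode mapping stage might look like the following; the vocabulary, scene-mode names, and verification threshold are hypothetical, since the paper does not publish its mapping table.

```python
# Hypothetical sketch of the word-to-scene-mode mapping stage. The words,
# scene modes, and confidence cutoff are illustrative assumptions.
WORD_TO_SCENE = {
    "beach": "Landscape", "mountain": "Landscape", "sunset": "Sunset",
    "face": "Portrait", "baby": "Portrait", "fireworks": "Night",
    "candle": "Night", "flower": "Macro",
}
CONFIDENCE_THRESHOLD = 0.7  # assumed utterance-verification cutoff

def recommend_scene_mode(asr_word: str, confidence: float) -> str | None:
    """Map a verified ASR hypothesis to a scene mode, or reject it."""
    if confidence < CONFIDENCE_THRESHOLD:
        return None               # utterance verification rejects the command
    return WORD_TO_SCENE.get(asr_word.lower())

print(recommend_scene_mode("Sunset", 0.91))   # -> Sunset
print(recommend_scene_mode("sunset", 0.42))   # -> None (rejected)
```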


International Conference on Acoustics, Speech, and Signal Processing | 2009

Acoustic model combination to compensate for residual noise in multi-channel source separation

Jae Sam Yoon; Ji Hun Park; Hong Kook Kim

In this paper, we propose an acoustic model combination technique for reducing a mismatch in a multi-channel noisy environment. To this end, we first apply a mask-based multi-channel source separation method, namely computational auditory scene analysis (CASA), to separate the speech source from noise. However, a certain degree of noise remains in the separated speech source, especially under low signal-to-noise ratio (SNR) conditions, since the estimated mask is not ideal. Thus, the performance of automatic speech recognition (ASR) is limited. To improve ASR performance, the remaining noise can be further compensated in the acoustic model domain under a framework of parallel model combination (PMC). In particular, a noise model for PMC is estimated from the noise remaining after application of the mask-based source separation, and the SNR for PMC is also estimated based on the average of the relative magnitude of the mask along the utterance. It is shown from the experiments that the proposed acoustic model combination method reduces the word error rate by a relative 52.14% compared to mask-based source separation alone.
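
The idea that the separation mask itself indicates how much noise survives can be sketched as follows: treat the power the mask retains as speech and the power it suppresses as the noise estimate, then derive an utterance-level SNR for PMC. The exact estimator is an assumption; the paper only states that the SNR is based on the mask's average relative magnitude.

```python
# Rough sketch (assumed estimator, not the paper's exact formula): derive
# an utterance-level SNR for model combination from a soft separation mask.
import numpy as np

def estimate_snr_from_mask(mask, mix_power):
    """mask: (T,F) soft mask in [0,1]; mix_power: (T,F) mixture power |X|^2."""
    speech_power = np.sum(mask * mix_power)         # power retained by the mask
    noise_power = np.sum((1.0 - mask) * mix_power)  # suppressed power, taken as
                                                    # the noise estimate
    return 10.0 * np.log10(speech_power / (noise_power + 1e-12))

rng = np.random.default_rng(1)
mask = rng.uniform(0.6, 1.0, size=(100, 64))   # mostly speech-dominant units
mix_power = rng.exponential(1.0, size=(100, 64))
print(f"estimated SNR for PMC: {estimate_snr_from_mask(mask, mix_power):.1f} dB")
```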


International Conference on Future Generation Communication and Networking | 2010

Duration Model-Based Post-processing for the Performance Improvement of a Keyword Spotting System

Min Ji Lee; Jae Sam Yoon; Yoo Rhee Oh; Hong Kook Kim; Song Ha Choi; Ji Woon Kim; Myeong Bo Kim

In this paper, we propose a post-processing method based on a duration model to improve the performance of a keyword spotting system. The proposed duration model-based post-processing method is performed after detecting a keyword. To detect the keyword, we first combine a keyword model, a non-keyword model, and a silence model. Using the information on the detected keyword, the proposed post-processing method is then applied to determine whether or not the correct keyword is detected. To this end, we generate the duration model using a Gaussian distribution in order to accommodate the different duration characteristics of each phoneme. Comparing the performance of the proposed method with those of conventional anti-keyword scoring methods, it is shown that both the false acceptance and the false rejection rates are reduced.
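
A sketch of such Gaussian duration scoring is given below: each phoneme's duration from the decoder's alignment is scored against an assumed Gaussian duration model, and keyword detections with implausible durations are rejected. The phoneme statistics and the threshold are invented for illustration.

```python
# Sketch of Gaussian duration-model scoring for keyword verification.
# Per-phoneme duration means/stds would be trained from forced alignments;
# the numbers and the rejection threshold here are made up.
import math

# Assumed per-phoneme duration statistics in frames: (mean, std deviation).
DURATION_MODEL = {"k": (8.0, 2.0), "ao": (12.0, 3.0), "l": (9.0, 2.5)}
THRESHOLD = -3.0  # assumed average log-likelihood cutoff per phoneme

def duration_log_likelihood(alignment):
    """alignment: list of (phoneme, duration_in_frames) from the decoder."""
    total = 0.0
    for phone, dur in alignment:
        mu, sigma = DURATION_MODEL[phone]
        total += (-0.5 * ((dur - mu) / sigma) ** 2
                  - math.log(sigma * math.sqrt(2 * math.pi)))
    return total / len(alignment)     # length-normalised score

def accept_keyword(alignment):
    """Post-process a detected keyword: reject implausible durations."""
    return duration_log_likelihood(alignment) > THRESHOLD

print(accept_keyword([("k", 7), ("ao", 13), ("l", 9)]))   # plausible -> True
print(accept_keyword([("k", 2), ("ao", 40), ("l", 1)]))   # implausible -> False
```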


IEEE Journal of Selected Topics in Signal Processing | 2010

Acoustic Model Combination Incorporated With Mask-Based Multi-Channel Source Separation for Automatic Speech Recognition

Jae Sam Yoon; Ji Hun Park; Hong Kook Kim; Hoirin Kim

In this paper, we propose an acoustic model combination (AMC) technique for reducing a mismatch between training and testing conditions of an automatic speech recognition (ASR) system in a multi-channel noisy environment. In our previous work, we proposed a hidden Markov model (HMM)-based mask estimation method for multi-channel source separation using two microphones, where HMMs were adopted for mask estimation in order to incorporate the observation that mask information should be correlated over contiguous analysis frames. However, it was observed that a certain degree of noise still remained in the separated speech source, especially under low signal-to-noise ratio (SNR) conditions. This was because the estimated mask was not ideal, which limited the improvement of ASR performance. To mitigate this problem, the remaining noise can be further compensated in the acoustic model domain under a framework of parallel model combination (PMC). In particular, a noise model and a weighting factor for the proposed AMC can be estimated from the remaining noise and the average of the relative magnitude of the mask, respectively. It is shown from the experiments that an ASR system employing the proposed AMC technique achieves a relative average word error rate (WER) reduction of 56.91% when compared to a system using the mask-based source separation alone. In addition, compared to a conventional PMC implemented with a log-normal approximation, the proposed AMC reduces WER by a relative 43.64%.
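
For reference, the log-normal-approximation step of conventional PMC, the baseline the paper compares against, can be sketched as below: log-spectral-domain Gaussians for speech and noise are mapped to the linear domain, added with a gain set by the estimated SNR, and mapped back. All numerical values are illustrative assumptions.

```python
# Sketch of the log-normal approximation used in classical PMC (the paper's
# baseline): combine clean-speech and noise Gaussians, assumed to live in
# the log-spectral domain, via the linear domain.
import numpy as np

def pmc_lognormal(mu_s, var_s, mu_n, var_n, g=1.0):
    """Element-wise combination of log-domain speech/noise Gaussians."""
    # Log-normal moments: if log X ~ N(mu, var), then E[X] = exp(mu + var/2)
    # and Var[X] = E[X]^2 * (exp(var) - 1).
    m_s = np.exp(mu_s + var_s / 2.0)
    v_s = m_s**2 * (np.exp(var_s) - 1.0)
    m_n = np.exp(mu_n + var_n / 2.0)
    v_n = m_n**2 * (np.exp(var_n) - 1.0)
    m = m_s + g * m_n                  # powers add in the linear domain;
    v = v_s + g**2 * v_n               # g is the SNR-derived noise gain
    var_c = np.log(v / m**2 + 1.0)     # map the combined moments back
    mu_c = np.log(m) - var_c / 2.0
    return mu_c, var_c

mu_s, var_s = np.array([2.0, 1.5]), np.array([0.3, 0.2])
mu_n, var_n = np.array([0.5, 0.8]), np.array([0.1, 0.1])
print(pmc_lognormal(mu_s, var_s, mu_n, var_n, g=0.5))
```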


Archive | 2007

Acoustic model adaptation methods based on pronunciation variability analysis for enhancing the recognition of voice of non-native speaker and apparatus thereof

Hong Kook Kim; Yoo Rhee Oh; Jae Sam Yoon


Archive | 2006

Apparatus and method for extracting noise-robust speech recognition vector by sharing preprocessing step used in speech coding

Chang-Sun Ryu; Jae-In Kim; Hong Kook Kim; Jae Sam Yoon; Yoo Rhee Oh


Conference of the International Speech Communication Association | 2010

SNR-based mask compensation for computational auditory scene analysis applied to speech recognition in a car environment

Ji Hun Park; Seon Man Kim; Jae Sam Yoon; Hong Kook Kim; Sung Joo Lee; Yunkeun Lee


Conference of the International Speech Communication Association | 2008

Mask estimation incorporating time-frequency trajectories for a CASA-based ASR front-end

Ji Hun Park; Jae Sam Yoon; Hong Kook Kim

Collaboration


Dive into Jae Sam Yoon's collaborations.

Top Co-Authors

Hong Kook Kim
Gwangju Institute of Science and Technology

Ji Hun Park
Gwangju Institute of Science and Technology

Yoo Rhee Oh
Gwangju Institute of Science and Technology

Min Ji Lee
Gwangju Institute of Science and Technology

Mina Kim
Gwangju Institute of Science and Technology