Dongsuk Yook
Korea University
Publication
Featured research published by Dongsuk Yook.
IEEE Transactions on Consumer Electronics | 2008
In Chul Yoo; Dongsuk Yook
We present a wearable sound recognition system to assist the hearing impaired. Traditionally, hearing aid dogs are specially trained to facilitate the daily life of the hearing impaired. However, since training hearing aid dogs is costly and time-consuming, it would be desirable to replace them with an automatic sound recognition system using speech recognition technologies. Because the sound recognition system will be used in home environments with high levels of background noise and reverberation, conventional speech recognition techniques are not directly applicable, since their performance drops off rapidly in these environments. In this paper, we introduce a new sound recognition algorithm that is optimized for mechanical sounds such as doorbells. The new algorithm uses a new distance measure called the normalized peak domination ratio (NPDR), which is based on the characteristic spectral peaks of these sounds. The proposed algorithm showed a sound recognition accuracy of 99.7% and a noise rejection accuracy of 99.7%.
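The abstract does not spell out the NPDR formula, but the idea of scoring a frame by how much of its spectral energy falls at a target sound's characteristic peak bins can be sketched as follows; the bin positions, threshold, and function names are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def npdr_score(frame, peak_bins, n_fft=512):
    """Hypothetical normalized peak domination ratio (NPDR) sketch.

    Measures how strongly the spectrum of `frame` is dominated by the
    characteristic peak bins of a target mechanical sound (e.g., a doorbell).
    The exact definition in the paper may differ; this only illustrates
    the idea of a peak-energy-to-total-energy ratio.
    """
    spectrum = np.abs(np.fft.rfft(frame, n_fft)) ** 2
    peak_energy = spectrum[peak_bins].sum()
    total_energy = spectrum.sum() + 1e-12   # avoid division by zero
    return peak_energy / total_energy

# Example: decide "doorbell" if the score exceeds a tuned threshold.
frame = np.random.randn(512)                # stand-in for one audio frame
doorbell_peaks = np.array([40, 80, 120])    # illustrative peak bins
is_doorbell = npdr_score(frame, doorbell_peaks) > 0.6
```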
IEEE Transactions on Consumer Electronics | 2009
Youngkyu Cho; Dongsuk Yook; Sukmoon Chang; Hyunsoo Kim
Sound source localization (SSL) is a major function of robot auditory systems for intelligent home robots. The steered response power-phase transform (SRP-PHAT) is a widely used method for robust SSL. However, it is too slow to run in real time, since SRP-PHAT searches a large number of candidate sound source locations. This paper proposes a search space clustering method designed to speed up the SRP-PHAT based sound source localization algorithm for intelligent home robots equipped with small-scale microphone arrays. The proposed method reduces the number of candidate sound source locations by 30.6% and achieves 46.7% error reduction compared to conventional methods.
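For context, a minimal sketch of the exhaustive SRP-PHAT search that the clustering method accelerates is shown below; the microphone geometry, FFT size, and function names are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def gcc_phat(x, y, n_fft=1024):
    """GCC-PHAT cross-correlation between two microphone signals."""
    X = np.fft.rfft(x, n_fft)
    Y = np.fft.rfft(y, n_fft)
    cross = X * np.conj(Y)
    cross /= np.abs(cross) + 1e-12          # PHAT weighting
    return np.fft.irfft(cross, n_fft)

def srp_phat(signals, mic_pos, candidates, fs=16000, c=343.0):
    """Exhaustive SRP-PHAT baseline: sum the GCC-PHAT values at the TDOA
    implied by each candidate location and keep the maximum. The paper's
    contribution is to prune this candidate set by clustering."""
    n_mics = len(signals)
    powers = np.zeros(len(candidates))
    for i in range(n_mics):
        for j in range(i + 1, n_mics):
            cc = gcc_phat(signals[i], signals[j])
            for k, loc in enumerate(candidates):
                tdoa = (np.linalg.norm(loc - mic_pos[i]) -
                        np.linalg.norm(loc - mic_pos[j])) / c
                lag = int(round(tdoa * fs)) % len(cc)   # wrap negative lags
                powers[k] += cc[lag]
    return candidates[np.argmax(powers)]
```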
IEEE Transactions on Consumer Electronics | 2009
Hyeopwoo Lee; Sukmoon Chang; Dongsuk Yook; Yong-Serk Kim
Voice activity detection plays an important role in providing an efficient voice interface between humans and mobile devices, since it can be used as a trigger to activate the automatic speech recognition module of a mobile device. If the input speech signal can be recognized as a predefined magic word spoken by a legitimate user, it can be utilized as a trigger. In this paper, we propose a voice trigger system using a keyword-dependent speaker recognition technique. The voice trigger must be able to perform keyword recognition, as well as speaker recognition, without using computationally demanding speech recognizers, so that it can properly trigger a mobile device with low computational power consumption. We propose a template based method and a hidden Markov model (HMM) based method for the voice trigger to solve this problem. Experiments using a Korean word corpus show that the template based method ran 4.1 times faster than the HMM based method, while the HMM based method reduced the recognition error by 27.8% relative to the template based method. The proposed methods are complementary and can be used selectively depending on the device of interest.
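One common way to realize a template based trigger like this is dynamic time warping (DTW) over acoustic feature sequences; the sketch below assumes MFCC-like feature matrices and an illustrative threshold, and is not taken from the paper.

```python
import numpy as np

def dtw_distance(template, utterance):
    """Dynamic time warping distance between two feature sequences
    (frames x dims). A small DTW like this is one way to realize the
    template based trigger; the paper's exact setup may differ."""
    n, m = len(template), len(utterance)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(template[i - 1] - utterance[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m] / (n + m)

def is_trigger(enrolled_templates, features, threshold=1.0):
    """Fire only when the input is close to an enrolled magic-word template,
    which implicitly checks both the keyword and the enrolled speaker."""
    return min(dtw_distance(t, features) for t in enrolled_templates) < threshold
```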
pacific rim international conference on artificial intelligence | 2002
Soonkyu Lee; Dongsuk Yook
We describe audio-to-visual conversion techniques for efficient multimedia communications. The audio signals are automatically converted to visual images of mouth shape. The visual speech can be represented as a sequence of visemes, which are the generic face images corresponding to particular sounds. Visual images synchronized with audio signals can provide a user-friendly interface for man-machine interaction. They can also be used to help people with impaired hearing. We use HMMs (hidden Markov models) to convert audio signals to a sequence of visemes. In this paper, we compare two approaches to using HMMs. In the first approach, an HMM is trained for each viseme, and the audio signals are directly recognized as a sequence of visemes. In the second approach, each phoneme is modeled with an HMM, and a general phoneme recognizer is utilized to produce a phoneme sequence from the audio signals. The phoneme sequence is then converted to a viseme sequence. We implemented the two approaches and tested them on the TIMIT speech corpus. The viseme recognizer shows a 33.9% error rate, and the phoneme-based approach exhibits a 29.7% viseme recognition error rate. When similar viseme classes are merged, we have found that the error rates can be reduced to 20.5% and 13.9%, respectively.
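The final step of the second approach, collapsing a recognized phoneme sequence into visemes, can be illustrated with a small mapping table; the viseme classes and phoneme symbols below are assumptions for illustration, not the paper's mapping.

```python
# Illustrative phoneme-to-viseme mapping (not the paper's exact table).
PHONEME_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "iy": "spread_vowel", "ih": "spread_vowel",
    "aa": "open_vowel", "ae": "open_vowel",
    "uw": "rounded_vowel", "ow": "rounded_vowel",
}

def phonemes_to_visemes(phoneme_sequence):
    """Map a phoneme sequence from the recognizer to a viseme sequence,
    merging consecutive duplicates so each mouth shape appears once."""
    visemes = []
    for ph in phoneme_sequence:
        v = PHONEME_TO_VISEME.get(ph, "neutral")
        if not visemes or visemes[-1] != v:
            visemes.append(v)
    return visemes

print(phonemes_to_visemes(["m", "aa", "m", "iy"]))
# ['bilabial', 'open_vowel', 'bilabial', 'spread_vowel']
```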
IEEE Transactions on Audio, Speech, and Language Processing | 2015
In Chul Yoo; Hyeontaek Lim; Dongsuk Yook
Voice activity detection (VAD) can be used to distinguish human speech from other sounds, and various applications can benefit from VAD, including speech coding and speech recognition. To accurately detect voice activity, the algorithm must take into account the characteristic features of human speech and/or background noise. In many real-life applications, noise frequently occurs in an unexpected manner, and in such situations it is difficult to determine the characteristics of the noise with sufficient accuracy. As a result, robust VAD algorithms that depend less on making correct noise estimates are desirable for real-life applications. Formants are the major spectral peaks of the human voice and are highly useful for distinguishing vowel sounds. These spectral peaks are likely to survive in a signal even after severe corruption by noise, which makes formants attractive features for voice activity detection under low signal-to-noise ratio (SNR) conditions. However, it is difficult to accurately extract formants from noisy signals when background noise introduces unrelated spectral peaks. Therefore, this paper proposes a simple formant-based VAD algorithm to overcome the problem of detecting formants under severe noise conditions. The proposed method achieves a much faster processing time and outperforms standard VAD algorithms under various noise conditions. The proposed method is robust against various types of noise and imposes a light computational load, so it is suitable for use in various applications.
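A minimal sketch of the underlying idea, declaring speech when a prominent peak survives in the typical formant band, might look like the following; the band limits, window, and prominence threshold are assumptions, not the paper's decision rule.

```python
import numpy as np

def formant_peak_vad(frame, fs=16000, n_fft=512, prominence_db=10.0):
    """Toy formant-peak VAD: mark a frame as speech if the strongest
    spectral peak inside the typical formant band (roughly 300-3500 Hz)
    stands well above the band's median level. This sketches the idea of
    relying on spectral peaks that survive noise, not the paper's rule."""
    windowed = frame * np.hanning(len(frame))
    spectrum_db = 20 * np.log10(np.abs(np.fft.rfft(windowed, n_fft)) + 1e-12)
    freqs = np.fft.rfftfreq(n_fft, 1.0 / fs)
    band = (freqs >= 300) & (freqs <= 3500)
    peak = spectrum_db[band].max()
    floor = np.median(spectrum_db[band])
    return (peak - floor) > prominence_db
```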
IEEE Signal Processing Letters | 2007
Dong-Hyun Kim; Dongsuk Yook
This paper presents a transformation-based rapid adaptation technique for robust speech recognition using a linear spectral transformation (LST) and a maximum mutual information (MMI) criterion. Previously, a maximum likelihood linear spectral transformation (ML-LST) algorithm was proposed for fast adaptation in unknown environments. Since the MMI estimation method does not require evenly distributed training data and increases the a posteriori probability of the word sequences of the training data, we combine the linear spectral transformation method and the MMI estimation technique in order to achieve extremely rapid adaptation using only one word of adaptation data. The proposed algorithm, called MMI-LST, was implemented using the extended Baum-Welch algorithm and phonetic lattices, and evaluated on the TIMIT and FFMTIMIT corpora. It provides a relative reduction in the speech recognition error rate of 11.1% using only 0.25 s of adaptation data.
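The sketch below only shows where a linear spectral transformation sits in the recognition front end; estimating the transform parameters with the MMI criterion via the extended Baum-Welch algorithm and phonetic lattices is the paper's contribution and is not reproduced here. Array shapes and names are illustrative assumptions.

```python
import numpy as np

def apply_lst(spectral_frames, A, b):
    """Apply a linear spectral transformation x' = A x + b to each
    spectral (or log-filterbank) frame before recognition. How A and b
    are estimated (ML in ML-LST, MMI in MMI-LST) is outside this sketch."""
    return spectral_frames @ A.T + b

# Illustrative usage: adapt 24-band log-filterbank features.
frames = np.random.randn(100, 24)       # stand-in adaptation data
A = np.eye(24)                          # identity = no adaptation yet
b = np.zeros(24)
adapted = apply_lst(frames, A, b)
```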
IEEE Transactions on Systems, Man, and Cybernetics | 2016
Dongsuk Yook; Taewoo Lee; Youngkyu Cho
Steered response power phase transform (SRP-PHAT) is a method that is widely used for robust sound source localization (SSL). However, since SRP-PHAT searches over a large number of candidate locations, it is too slow to run in real time for large-scale microphone array systems. In this paper, we propose a robust two-level search space clustering method to speed up SRP-PHAT-based SSL. The proposed method divides the candidate locations of the sound source into a set of groups and finds a small number of groups that are likely to contain the maximum power location. By searching only within this small number of groups, the computational costs are reduced by 61.8% compared to a previously proposed method without loss of accuracy.
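A coarse-to-fine search in this spirit can be sketched as follows; grouping candidates around randomly chosen centroids and keeping a fixed number of top groups are illustrative assumptions, not necessarily the paper's clustering scheme.

```python
import numpy as np

def two_level_srp_search(power_fn, candidates, n_groups=32, top_groups=3):
    """Coarse-to-fine SRP-PHAT search sketch. `power_fn(loc)` returns the
    steered response power at one candidate location; `candidates` is an
    (N, 3) array of positions."""
    rng = np.random.default_rng(0)
    centroids = candidates[rng.choice(len(candidates), n_groups, replace=False)]
    # Assign every candidate to its nearest centroid (the grouping step).
    labels = np.argmin(
        np.linalg.norm(candidates[:, None, :] - centroids[None, :, :], axis=2),
        axis=1)
    # Level 1: score only the group representatives.
    rep_scores = np.array([power_fn(c) for c in centroids])
    best_groups = np.argsort(rep_scores)[-top_groups:]
    # Level 2: full search inside the selected groups only.
    fine = candidates[np.isin(labels, best_groups)]
    fine_scores = np.array([power_fn(c) for c in fine])
    return fine[np.argmax(fine_scores)]
```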
intelligent data engineering and automated learning | 2002
Soonkyu Lee; Dongsuk Yook
Visual images synchronized with audio signals can provide a user-friendly interface for man-machine interaction. The visual speech can be represented as a sequence of visemes, which are the generic face images corresponding to particular sounds. We use HMMs (hidden Markov models) to convert audio signals to a sequence of visemes. In this paper, we compare two approaches to using HMMs. In the first approach, an HMM is trained for each triviseme, which is a viseme with its left and right context, and the audio signals are directly recognized as a sequence of trivisemes. In the second approach, each triphone is modeled with an HMM, and a general triphone recognizer is used to produce a triphone sequence from the audio signals. The triviseme or triphone sequence is then converted to a viseme sequence. The performances of the two viseme recognition systems are evaluated on the TIMIT speech corpus.
IEEE Transactions on Consumer Electronics | 2013
Sunhyung Lee; Dongsuk Yook; Sukmoon Chang
The conventional audio fingerprinting system by Haitsma uses a lookup table to identify candidate songs in a database containing the sub-fingerprints of the songs, and then searches the candidates to find the song with the lowest bit error rate. However, this approach has the drawback that the number of database accesses increases dramatically, especially when the database contains a large number of songs or when a matching sub-fingerprint is not found in the lookup table because of a heavily degraded input signal. In this paper, a novel search method is proposed to overcome these difficulties. The proposed method partitions each song found via the lookup table into blocks, assigns a weight to each block, and uses the weight as a search priority to speed up the search process while reducing the number of database accesses. Experimental results show a significant improvement in search speed while maintaining search accuracy comparable to the conventional method.
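A minimal sketch of the baseline lookup-table-plus-bit-error-rate search is given below; the vote-count ordering used as a search priority is only a stand-in for the paper's block weights, and all names and thresholds are assumptions.

```python
import numpy as np

def bit_error_rate(query_fp, db_fp):
    """Fraction of differing bits between two fingerprint segments,
    stored here as uint32 sub-fingerprint arrays of equal length."""
    xor = np.bitwise_xor(query_fp, db_fp)
    diff_bits = np.unpackbits(xor.view(np.uint8)).sum()
    return diff_bits / (32 * len(query_fp))

def search(query_fp, lookup_table, db_fingerprints, ber_threshold=0.35):
    """Haitsma-style search: each 32-bit sub-fingerprint of the query
    indexes the lookup table to fetch candidate (song, offset) pairs,
    which are then verified by bit error rate. The paper additionally
    partitions songs into weighted blocks and searches high-weight blocks
    first; here, vote counts from the lookup table play that role."""
    votes = {}
    for pos, sub_fp in enumerate(query_fp):
        for song_id, offset in lookup_table.get(int(sub_fp), []):
            key = (song_id, offset - pos)
            votes[key] = votes.get(key, 0) + 1
    for (song_id, start), _ in sorted(votes.items(), key=lambda kv: -kv[1]):
        if start < 0:
            continue
        db_fp = db_fingerprints[song_id][start:start + len(query_fp)]
        if len(db_fp) == len(query_fp) and bit_error_rate(query_fp, db_fp) < ber_threshold:
            return song_id
    return None
```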
IEEE Transactions on Consumer Electronics | 2009
Hyeopwoo Lee; Dongsuk Yook
When speech-based interfaces are used on small handheld devices such as cellular phones and personal digital assistants in mobile environments with unknown noises and surrounding talkers, all signals except the legitimate user's voice must be rejected by the system as noise. This paper proposes a new algorithm that detects the user's voice in the spatial and temporal domains using directional and spectral information. It rejects undesirable signals that originate from noise sources or surrounding talkers. Experimental results indicate that the proposed algorithm reduces the voice activity detection error rate by 34.3% relative to conventional methods.
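A toy gate that combines a directional check with a spectral check, in the spirit of the proposed spatial and spectral detection, might look like the following; the two-microphone setup, thresholds, and band limits are assumptions for illustration, not the paper's method.

```python
import numpy as np

def spatial_spectral_vad(frame_left, frame_right, fs=16000,
                         max_user_tdoa=1e-4, energy_threshold=1e-3):
    """Toy two-stage gate: accept a frame only if (1) its estimated TDOA
    between two microphones is small enough to lie in the user's expected
    direction, and (2) its energy in the speech band is high enough."""
    # Spatial cue: TDOA from plain cross-correlation.
    cc = np.correlate(frame_left, frame_right, mode="full")
    lag = np.argmax(cc) - (len(frame_right) - 1)
    tdoa = lag / fs
    in_user_direction = abs(tdoa) <= max_user_tdoa

    # Spectral cue: mean energy between 300 Hz and 3400 Hz.
    spectrum = np.abs(np.fft.rfft(frame_left)) ** 2
    freqs = np.fft.rfftfreq(len(frame_left), 1.0 / fs)
    speech_energy = spectrum[(freqs >= 300) & (freqs <= 3400)].mean()
    return in_user_direction and speech_energy > energy_threshold
```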