
Publication


Featured research published by Myoung-Wan Koo.


International Journal of Imaging Systems and Technology | 2013

Two‐pass search strategy using accumulated band energy histogram for HMM‐based identification of perceptually identical music

Jinbok Myung; Kwang-Ho Kim; Jeong-Sik Park; Myoung-Wan Koo; Ji-Hwan Kim

In this article, we present an efficient two-pass search strategy for implementing a Hidden Markov Model (HMM)-based music identification system. In our previous work, we demonstrated a single-pass HMM-based music identification system, targeting music copyright protection. That system was highly robust to signal-level variations between perceptually identical music files, but its search was computationally heavy. The proposed method extends the conventional single-pass search to two passes. In pass 1, the queried music file produces an accumulated band energy histogram, a set of normalized sums of band energies for each frequency bin. This histogram is compared against the histograms of all registered music files, and the system generates a short list of the most probable candidates. In pass 2, HMM-based search is applied only to the candidate music files selected in pass 1. Using this two-pass strategy, we implemented an HMM-based music identification system that maintains the same robustness to signal-level variations between perceptually identical music files while producing the identification result much more quickly.
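A minimal sketch of the pass-1 shortlist step described above, assuming per-frame band energies have already been extracted; the bin count, Euclidean distance metric, and candidate-list size are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def band_energy_histogram(frames):
    """Accumulated band energy histogram: sum each frequency bin's
    energy over all frames, then normalize to unit sum."""
    # frames: (n_frames, n_bins) array of per-frame band energies
    acc = frames.sum(axis=0)
    return acc / acc.sum()

def shortlist(query_hist, registered, k=10):
    """Pass 1: rank registered music files by histogram distance to the
    query and keep the k most probable candidates for the HMM pass."""
    dists = {name: np.linalg.norm(query_hist - h)
             for name, h in registered.items()}
    return sorted(dists, key=dists.get)[:k]
```

Pass 2 would then run the full HMM-based search only over the files returned by `shortlist`.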


international conference on big data and smart computing | 2017

Convolutional Neural Network using a threshold predictor for multi-label speech act classification

Guanghao Xu; Hyun Jung Lee; Myoung-Wan Koo; Jungyun Seo

Regarding the spoken language understanding (SLU) pilot task of the Dialog State Tracking Challenge 5 (DSTC5), the goal is to classify sets of speech-act labels on human-to-human dialogues. In this paper, we propose a multi-label classification model based on an algorithm adaptation method. Specifically, a Convolutional Neural Network (CNN) on top of pre-trained word vectors is adapted to the multi-label classification task through a threshold learning mechanism. To evaluate the proposed model, we conduct comparative experiments on the DSTC5 dialogue datasets. Experimental results show that the proposed model outperforms most of the models submitted to DSTC5 in terms of F1-score. Without any manually designed features, our model has the advantage of handling the multi-label SLU task using only publicly available pre-trained word vectors.
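The threshold-learning idea can be sketched as follows; the linear form of the threshold predictor here is an assumption for illustration (the abstract only states that a threshold mechanism is learned on top of the CNN's per-label scores):

```python
import numpy as np

def select_labels(scores, w, b):
    """Algorithm-adaptation step: a learned predictor maps the CNN's
    per-label score vector to a single utterance-level cutoff
    t(x) = w . scores + b, and every speech-act label whose score
    reaches the cutoff is emitted, yielding a multi-label prediction."""
    t = float(scores @ w + b)
    return [i for i, s in enumerate(scores) if s >= t]
```

With `w` all zeros and `b = 0.5`, for example, the predictor degenerates to a fixed 0.5 cutoff over the label scores.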


Advances in Science, Technology and Engineering Systems Journal | 2017

Retrieving Dialogue History in Deep Neural Networks for Spoken Language Understanding

Myoung-Wan Koo; Guanghao Xu; Hyun Jung Lee; Jungyun Seo

Article history: Received 30 May 2017; Accepted 13 August 2017; Online 15 September 2017.

In this paper, we propose a revised version of the semantic decoder for the multi-label classification task in the spoken language understanding (SLU) pilot task of the Dialog State Tracking Challenge 5 (DSTC5). Our model concatenates two deep neural networks, a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN), to detect the semantic meaning of an incoming utterance with the assistance of an algorithm adaptation method. To evaluate the robustness of the proposed models, we conduct comparative experiments on the DSTC5 dialogue datasets. Experimental results show that the proposed models outperform most of the models submitted to DSTC5 in terms of F1-score. Without any manually designed features or delexicalization, our model proves effective at the multi-label SLU task using only publicly available pre-trained word vectors. Our model can retrieve the dialogue history and thereby build a concise concept structure that captures the pragmatic intention as well as the semantic meaning of utterances. The architecture of our semantic decoder is potentially applicable to a variety of other human-to-human dialogues for SLU.


broadband and wireless computing, communication and applications | 2014

Development of Small Footprint Korean Large Vocabulary Speech Recognition for Commanding a Standalone Robot

Donghyun Lee; Minkyu Lim; Myoung-Wan Koo; Jungyun Seo; Gil-Jin Jang; Ji-Hwan Kim; Jeong-Sik Park

The work in this paper concerns a small-footprint Acoustic Model (AM) and its use in implementing a Large Vocabulary Isolated Speech Recognition (LVISR) system for commanding a robot in Korean, requiring about 500 KB of memory. Tree-based state clustering was applied to reduce the total number of unique states while preserving the original performance. A decision tree induction method was developed for this clustering, for which a binary question set, a measurement function, and a stopping criterion were devised. A phoneme set of 38 phonemes was defined for the small-footprint Korean LVISR. A further reduction in memory requirements was achieved through integer arithmetic, and the best multiplication factor was determined for this operation. As a result, we successfully developed a small-footprint Korean LVISR system that requires about 500 KB of memory.
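The integer-arithmetic step can be sketched as fixed-point quantization with a multiplication factor; the factor value and rounding scheme below are illustrative assumptions rather than the values the authors determined:

```python
import numpy as np

def quantize(params, factor=2 ** 12):
    """Scale floating-point AM parameters (e.g., log-likelihood terms)
    by a fixed multiplication factor and round to 32-bit integers, so
    decoding can run in integer arithmetic on a constrained device."""
    return np.round(params * factor).astype(np.int32)

def dequantize(iparams, factor=2 ** 12):
    """Recover an approximation of the original float values."""
    return iparams.astype(np.float64) / factor
```

The rounding error per parameter is bounded by half the quantization step, i.e. 0.5 / factor, which is the trade-off the paper tunes when choosing the best multiplication factor.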


asia-pacific signal and information processing association annual summit and conference | 2013

An experimental study on structural-MAP approaches to implementing very large vocabulary speech recognition systems for real-world tasks

I-Fan Chen; Sabato Marco Siniscalchi; Seokyong Moon; Daejin Shin; Myoung-Wan Koo; Minhwa Chung; Chin-Hui Lee

In this paper we present an experimental study exploiting structural Bayesian adaptation to handle potential mismatches between training and test conditions in real-world applications, realized in our multilingual very large vocabulary speech recognition (VLVSR) system project sponsored by MOTIE (the Ministry of Trade, Industry and Energy), Republic of Korea. The goal of the project is to build a nationwide VLVSR cloud service platform for mobile applications. Beyond system architecture design, at such a large scale, robustness problems caused by mismatches in speakers, tasks, environments, and domains must also be handled very carefully. We adopt adaptation techniques, especially structural MAP, to reduce the accuracy degradation caused by these mismatches. As part of this ongoing project, we describe how structural MAP approaches can be used to adapt both the acoustic and language models of our VLVSR systems, and provide convincing experimental results demonstrating how adaptation can bridge the performance gap between current state-of-the-art and deployable VLVSR systems.
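For reference, structural MAP builds on the standard per-Gaussian MAP mean update, in which the adapted mean interpolates between the prior mean and the statistics of the adaptation data; the notation below is the textbook form, not copied from the paper (tau is the prior weight, gamma the occupation probability of Gaussian m at frame t):

```latex
\hat{\mu}_m = \frac{\tau \, \mu_m^{0} + \sum_{t} \gamma_m(t) \, x_t}
                   {\tau + \sum_{t} \gamma_m(t)}
```

With little adaptation data the sums are small and the estimate stays near the prior mean; with ample data it approaches the data mean. The structural variant propagates such priors down a tree of Gaussian clusters so that sparsely observed Gaussians borrow statistics from their parents.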


Archive | 2013

Implementation of a Large-Scale Language Model in a Cloud Environment for Human–Robot Interaction

Dae-Young Jung; Hyuk-Jun Lee; Sungyong Park; Myoung-Wan Koo; Ji-Hwan Kim; Jeong-Sik Park; Hyung-Bae Jeon; Yunkeun Lee

This paper presents a large-scale language model for daily-generated, large-size text corpora, built with Hadoop in a cloud environment to improve the performance of a human-robot interaction system. Our large-scale trigram language model, consisting of 800 million trigram counts, was successfully implemented through a new approach using a representative cloud service (Amazon EC2) and a representative distributed processing framework (Hadoop). We extract trigram counts with Hadoop MapReduce to adapt the large-scale language model. Extracting trigram counts from a 200-million-word Twitter corpus, roughly the volume of Twitter text generated per day, is estimated to take three hours on six servers.
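The trigram-count extraction can be sketched in Hadoop-streaming style as a mapper/reducer pair; the function names and the in-process grouping below are illustrative, not the authors' code:

```python
from itertools import groupby

def mapper(lines):
    """Map step: emit (trigram, 1) for every three-word window
    in each input line of the text corpus."""
    for line in lines:
        words = line.split()
        for i in range(len(words) - 2):
            yield " ".join(words[i:i + 3]), 1

def reducer(pairs):
    """Reduce step: pairs arrive sorted by key (Hadoop's shuffle
    guarantees this); sum the counts per trigram."""
    for trigram, group in groupby(pairs, key=lambda kv: kv[0]):
        yield trigram, sum(c for _, c in group)
```

In an actual Hadoop streaming job, the mapper and reducer would read from stdin and write tab-separated key/value lines, with the framework performing the sort between the two stages.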


Archive | 2013

Performance Analysis of Noise Robust Audio Hashing in Music Identification for Entertainment Robot

Namhyun Cho; Donghoon Shin; Donghyun Lee; Kwang-Ho Kim; Jeong-Sik Park; Myoung-Wan Koo; Ji-Hwan Kim

Many technical papers have been published on music identification, but most focus on describing their algorithms and overall performance. When music identification is applied to embedded devices, performance is affected by the level of frame boundary desynchronization, environmental noise, and channel noise. This paper presents an empirical performance analysis of music identification in terms of its Peak Point Hit Ratio (PPHR). In theory, music identification systems achieve a 100% PPHR between the queried music and its reference. However, PPHR falls to 40.8% when the frame boundary is desynchronized by half the frame shift. In addition, due to environmental noise, PPHR decreases to 69.6%, 59.4%, 46.1%, and 24.3% at SNRs of 15 dB, 10 dB, 5 dB, and 0 dB, respectively. For music clips recorded in an office environment, PPHR is 58.7% due to combined environmental and channel noise.
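A minimal sketch of the PPHR metric as we read it from the abstract; the exact fingerprint definition is not given there, so the hash sequences below are placeholders for whatever audio-hash values the system extracts:

```python
def peak_point_hit_ratio(query_hashes, reference_hashes):
    """PPHR (as inferred): the percentage of the query clip's
    fingerprint hash values that also occur in the reference
    recording's hash set."""
    ref = set(reference_hashes)
    hits = sum(1 for h in query_hashes if h in ref)
    return 100.0 * hits / len(query_hashes)
```

Under this reading, a clean, frame-aligned query reproduces the reference hashes exactly (PPHR 100%), while desynchronization and noise perturb individual hash values and drive the ratio down, as in the figures the paper reports.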


Computer Speech & Language | 2019

A Bi-LSTM memory network for end-to-end goal-oriented dialog learning

Byoungjae Kim; KyungTae Chung; Jeongpil Lee; Jungyun Seo; Myoung-Wan Koo


international conference on big data and smart computing | 2018

Optimizing Policy via Deep Reinforcement Learning for Dialogue Management

Guanghao Xu; Hyungjung Lee; Myoung-Wan Koo; Jungyun Seo


IEEE Conference Proceedings | 2017

Convolutional Neural Network using a threshold predictor for multi-label speech act classification (translated from Japanese)

Guanghao Xu; Hyun Jung Lee; Myoung-Wan Koo; Jungyun Seo

Collaboration


Dive into Myoung-Wan Koo's collaboration.

Top Co-Authors


Hyung-Bae Jeon

Electronics and Telecommunications Research Institute
