Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Minkyu Lim is active.

Publication


Featured research published by Minkyu Lim.


IEEE International Conference on Fuzzy Systems | 2009

Music copyright protection system using fuzzy similarity measure for music phoneme segmentation

Kwang-Ho Kim; Minkyu Lim; Ji-Hwan Kim

In this paper, we propose a rejection method based on a fuzzy similarity measure and devise a music copyright protection system that combines this rejection method with our previous HMM-based music identification system. We implement the system for 1,100 registered music files. It achieves 100% identification for these registered files and robust identification performance for all signal-level variations of these files. With the proposed rejection method, the system rejects 495 of 500 unregistered music files while its false acceptance rate remains 0%. Notably, the system achieves an additional improvement in music identification when this rejection method is employed.
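The rejection rule described above can be sketched as follows. This is a minimal illustration assuming a common min/max fuzzy-set similarity measure and an illustrative acceptance threshold; the paper does not specify either, so `fuzzy_similarity`, `identify_or_reject`, and `threshold=0.8` are hypothetical names and values:

```python
def fuzzy_similarity(a, b):
    """Fuzzy-set similarity: sum of element-wise minima over sum of maxima."""
    num = sum(min(x, y) for x, y in zip(a, b))
    den = sum(max(x, y) for x, y in zip(a, b))
    return num / den if den else 0.0

def identify_or_reject(query, registered, threshold=0.8):
    """Return the best-matching registered id, or None (reject) if the best
    similarity falls below the threshold."""
    best_id, best_score = None, 0.0
    for music_id, feats in registered.items():
        score = fuzzy_similarity(query, feats)
        if score > best_score:
            best_id, best_score = music_id, score
    return best_id if best_score >= threshold else None
```

A query whose features match no registered file closely enough is rejected rather than mapped to the nearest entry, which is what keeps the false acceptance rate low.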


Phonetics and Speech Sciences | 2012

Music Recognition Using Audio Fingerprint: A Survey

Donghyun Lee; Minkyu Lim; Ji-Hwan Kim

Interest in music recognition has grown dramatically since NHN and Daum released their mobile music recognition applications in 2010. Audio-analysis-based music recognition methods fall into two categories: music recognition using audio fingerprints and Query-by-Singing/Humming (QBSH). While fingerprint-based recognition takes recorded music as its input, QBSH takes a user-hummed melody. This paper surveys research trends in fingerprint-based music recognition, focusing on two methods: one generates fingerprints from the energy difference between consecutive bands, and the other generates hash keys from pairs of spectral peak points. Details from the representative papers of each method are introduced.
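The first fingerprinting approach mentioned above, energy differences between consecutive bands, can be sketched as follows. This assumes the well-known Haitsma-Kalker-style bit derivation, where each bit records whether the band-to-band energy difference increased from the previous frame; the function name and input layout are illustrative:

```python
def fingerprint_bits(energies):
    """energies: a list of frames, each a list of band energies.
    Bit(n, m) = 1 if the energy difference between bands m and m+1
    increases from frame n-1 to frame n, else 0."""
    bits = []
    for prev, cur in zip(energies, energies[1:]):
        frame_bits = []
        for m in range(len(cur) - 1):
            d_cur = cur[m] - cur[m + 1]    # band difference in this frame
            d_prev = prev[m] - prev[m + 1]  # band difference in previous frame
            frame_bits.append(1 if d_cur - d_prev > 0 else 0)
        bits.append(frame_bits)
    return bits
```

Because only the signs of differences are kept, the resulting bit patterns are robust to the signal-level variations (volume, mild filtering) that fingerprint systems must tolerate.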


Journal of Digital Contents Society | 2017

LSTM RNN-based Korean Speech Recognition System Using CTC

Donghyun Lee; Minkyu Lim; Hosung Park; Ji-Hwan Kim

The hybrid method using Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNNs) has greatly improved speech recognition accuracy. However, training an acoustic model based on the hybrid method requires a force-aligned HMM state sequence obtained from a Gaussian Mixture Model (GMM)-Hidden Markov Model (HMM), and training the GMM-HMM itself demands a large amount of computation time. In this paper, to improve training speed, we propose an end-to-end method for LSTM RNN-based Korean speech recognition, implemented with the Connectionist Temporal Classification (CTC) algorithm. The proposed method achieved a recognition rate comparable to that of the conventional method.


China Communications | 2017

Long short-term memory recurrent neural network-based acoustic model using connectionist temporal classification on a large-scale training corpus

Donghyun Lee; Minkyu Lim; Hosung Park; Yoseb Kang; Jeong-Sik Park; Gil-Jin Jang; Ji-Hwan Kim

A Long Short-Term Memory (LSTM) Recurrent Neural Network (RNN) has driven tremendous improvements over acoustic models based on the Gaussian Mixture Model (GMM). However, hybrid models of this kind require a force-aligned Hidden Markov Model (HMM) state sequence obtained from a GMM-based acoustic model, so training demands a long computation time for both the GMM-based and the deep-learning-based acoustic model. To solve this problem, an acoustic model using the Connectionist Temporal Classification (CTC) algorithm is proposed. The CTC algorithm does not require a GMM-based acoustic model because it does not use a force-aligned HMM state sequence. However, previous work on LSTM RNN-based acoustic models using CTC used only small-scale training corpora. In this paper, an LSTM RNN-based acoustic model using CTC is trained on a large-scale training corpus and its performance is evaluated. The implemented acoustic model achieves a Word Error Rate (WER) of 6.18% for clean speech and 15.01% for noisy speech, similar to the performance of the acoustic model based on the hybrid method.
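As a small illustration of why CTC removes the need for forced alignment: the network emits a label (including a blank symbol) for every frame, and the per-frame output is collapsed into a transcription by merging repeated labels and then dropping blanks. A minimal sketch of that collapse rule, i.e. greedy CTC decoding (names are illustrative):

```python
BLANK = "_"  # the extra CTC blank symbol

def ctc_collapse(frame_labels):
    """Greedy CTC decoding: merge consecutive repeats, then drop blanks."""
    out = []
    prev = None
    for lab in frame_labels:
        if lab != prev and lab != BLANK:
            out.append(lab)
        prev = lab
    return "".join(out)
```

Many different frame-level label sequences collapse to the same transcription, which is exactly how CTC sidesteps the per-frame HMM state alignment that hybrid training needs.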


Phonetics and Speech Sciences | 2015

Audio Event Classification Using Deep Neural Networks

Minkyu Lim; Donghyun Lee; Kwang-Ho Kim; Ji-Hwan Kim

This paper proposes an audio event classification method using Deep Neural Networks (DNN). The proposed method applies a Feed Forward Neural Network (FFNN) to generate, for each frame, event probabilities for ten audio events (dog barks, engine idling, and so on). The input vector for each frame consists of the mel-scale filter bank features of its consecutive frames. The event probabilities are accumulated across frames, and the classification result is the event with the highest accumulated probability. On the same dataset, the best accuracy in previous studies was about 70%, obtained with a Support Vector Machine (SVM). The proposed method achieves 79.23% accuracy on the UrbanSound8K dataset when 80 mel-scale filter bank features from each of 7 consecutive frames (560 in total) are used as the input vector for an FFNN with two hidden layers of 2,000 neurons each, with the rectified linear unit as the activation function.


Natural Language Dialog Systems and Intelligent Assistants | 2015

A Voice QR Code for Mobile Devices

Donghyun Lee; Minkyu Lim; Minho Ryang; Kwang-Ho Kim; Gil-Jin Jang; Jeong-Sik Park; Ji-Hwan Kim

This paper proposes a voice QR code for mobile devices. The QR code provides strong error correction, recovering decoding errors caused by a skewed image angle or poor luminosity. To correct an image shot of a QR code symbol, a complex error code and data map must be generated. In addition, an efficient QR code format and an audio codec are needed for a voice interface. This paper presents the method for generating the complex error code and data map in the voice QR code, and suggests an efficient QR code format and an audio codec for the voice interface.


Natural Language Dialog Systems and Intelligent Assistants | 2015

Performance Analysis of FFNN-Based Language Model in Contrast with n-Gram

Kwang-Ho Kim; Donghyun Lee; Minkyu Lim; Minho Ryang; Gil-Jin Jang; Jeong-Sik Park; Ji-Hwan Kim

In this paper, we analyze the performance of a feed forward neural network (FFNN)-based language model in contrast with an n-gram model. The probabilities of the n-gram language model were estimated from the statistics of word sequences. The FFNN-based language model was structured with three hidden layers, 500 hidden units per hidden layer, and 30-dimensional word embeddings. The FFNN-based language model outperforms the n-gram model by 1.5% in terms of WER on the English WSJ domain.
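The count-based estimation used by the n-gram baseline can be sketched as a maximum-likelihood bigram model. Names are illustrative, and the paper's actual n-gram order and smoothing are not specified here:

```python
from collections import Counter

def bigram_prob(corpus):
    """Build an unsmoothed MLE bigram model from a token list:
    P(w2 | w1) = count(w1 w2) / count(w1)."""
    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))
    def p(w2, w1):
        return bigrams[(w1, w2)] / unigrams[w1] if unigrams[w1] else 0.0
    return p
```

A neural language model replaces these sparse count ratios with probabilities computed from learned word embeddings, which is what lets it generalize to word sequences never seen in training.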


Broadband and Wireless Computing, Communication and Applications | 2014

Development of Small Footprint Korean Large Vocabulary Speech Recognition for Commanding a Standalone Robot

Donghyun Lee; Minkyu Lim; Myoung-Wan Koo; Jungyun Seo; Gil-Jin Jang; Ji-Hwan Kim; Jeong-Sik Park

This paper concerns a small footprint Acoustic Model (AM) and its use in implementing a Large Vocabulary Isolated Speech Recognition (LVISR) system, requiring about 500 KB of memory, for commanding a robot in the Korean language. Tree-based state clustering was applied to reduce the total number of unique states while preserving the original performance. A decision tree induction method was developed for the tree-based state clustering; for this method, a binary question set, a measurement function, and a stopping criterion were devised. A phoneme set of 38 phonemes was defined for the small footprint Korean LVISR. A further reduction in memory requirements was achieved through integer arithmetic, for which the best multiplication factor was determined. As a result, we successfully developed a small footprint Korean LVISR that requires about 500 KB of memory.
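The integer-arithmetic trick mentioned above, scaling floating-point values by a multiplication factor so that all operations run on integers, can be sketched as simple fixed-point arithmetic. The factor `1 << 12` is an illustrative choice, not the value determined in the paper:

```python
FACTOR = 1 << 12  # illustrative multiplication factor (4096)

def to_fixed(x, factor=FACTOR):
    """Quantize a float to a scaled integer."""
    return int(round(x * factor))

def fixed_mul(a, b, factor=FACTOR):
    """Multiply two fixed-point values, rescaling back to the same factor."""
    return (a * b) // factor
```

On embedded hardware without a fast floating-point unit, replacing float multiplies in likelihood computation with integer multiplies like this trades a small quantization error for speed and memory savings; picking the factor is the accuracy/overflow trade-off the paper tunes.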


IEEE Transactions on Consumer Electronics | 2009

Domain corpus independent vocabulary generation for embedded continuous speech recognition

Minkyu Lim; Kwang-Ho Kim; Ji-Hwan Kim

This paper proposes a domain corpus independent vocabulary generation algorithm to improve vocabulary coverage for embedded continuous speech recognition (CSR). A CSR vocabulary is normally derived from a word frequency list, so its coverage depends on the domain corpus. We present an improved method of vocabulary generation using a part-of-speech (POS) tagged corpus and a knowledge base. We investigate the 152 POS tags defined in a POS tagged corpus and the word-POS tag pairs. We analyze all words paired with 101 of the 152 POS tags and decide on a set of words that must be included in vocabularies of any size. The other 51 POS tags fall mainly into noun-related, named entity (NE)-related, and verb-related categories. We introduce a domain corpus independent word inclusion method for noun-, verb-, and NE-related POS tags using the knowledge base. For noun-related POS tags, we generate synonym groups and analyze their relative importance using Google search. We then categorize verbs by lemma and analyze the relative importance of each lemma from pre-analyzed verb statistics. We determine the inclusion order of NEs through Google search. The proposed method shows at least a 28.6% relative improvement in coverage on an SMS text corpus for vocabulary sizes of 5 K, 10 K, 15 K, and 20 K. In particular, the coverage of the 15 K vocabulary generated by the proposed method reaches 97.8%, a relative improvement of 44.2%.
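The frequency-list baseline and the coverage metric that the paper improves on can be sketched as follows (function names are illustrative):

```python
from collections import Counter

def top_k_vocab(corpus_tokens, k):
    """Baseline vocabulary: the k most frequent words of a corpus."""
    return {w for w, _ in Counter(corpus_tokens).most_common(k)}

def coverage(vocab, corpus_tokens):
    """Fraction of corpus tokens covered by the vocabulary
    (1.0 minus the out-of-vocabulary rate)."""
    covered = sum(1 for w in corpus_tokens if w in vocab)
    return covered / len(corpus_tokens)
```

The weakness of `top_k_vocab` is that its coverage is only high on text from the same domain as the counting corpus; the paper's POS- and knowledge-base-driven inclusion rules aim to keep `coverage` high on unseen domains such as SMS text.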


arXiv: Computation and Language | 2018

A Fast-Converged Acoustic Modeling for Korean Speech Recognition: A Preliminary Study on Time Delay Neural Network.

Hosung Park; Donghyun Lee; Minkyu Lim; Yoseb Kang; Juneseok Oh; Ji-Hwan Kim

Collaboration


Dive into Minkyu Lim's collaborations.

Top Co-Authors


Gil-Jin Jang

Kyungpook National University
