Seongkyu Mun
Korea University
Publication
Featured research published by Seongkyu Mun.
International Conference on Acoustics, Speech, and Signal Processing | 2017
Seongkyu Mun; Suwon Shon; Woo-Il Kim; David K. Han; Hanseok Ko
Deep Neural Network (DNN) based transfer learning has been shown to be effective in Visual Object Classification (VOC), where it compensates for a shortage of target-domain training samples by adapting classifiers that have been pre-trained on other large-scale databases. Although acoustic data are abundant, datasets of specific acoustic scenes remain sparse for training Acoustic Scene Classification (ASC) models. By exploiting the ability of VOC DNNs to learn beyond their pre-trained environments, this paper proposes DNN-based transfer learning for ASC. The effectiveness of the proposed method is demonstrated through representative experiments on the IEEE DCASE Challenge 2016 Task 1 database and in a home surveillance environment. Its improved performance is verified by comparison with prominent conventional methods.
Conference of the International Speech Communication Association | 2016
Seongkyu Mun; Suwon Shon; Woo-Il Kim; Hanseok Ko
Bottleneck features have been shown to be effective in improving the accuracy of speaker recognition, language identification and automatic speech recognition. However, few works have focused on bottleneck features for acoustic event recognition. This paper proposes a novel acoustic event recognition framework using bottleneck features derived from a Deep Neural Network (DNN). In addition to conventional features (MFCC, Mel-spectrum, etc.), this paper employs rhythm, timbre, and spectrum-statistics features to extract acoustic characteristics from audio signals effectively. The effectiveness of the proposed method is demonstrated through experiments on a database of real-life recordings, and its robust performance is verified by comparison with conventional methods.
Advanced Video and Signal Based Surveillance | 2015
Suwon Shon; Seongkyu Mun; David K. Han; Hanseok Ko
This paper analyzes heteroscedasticity in i-vectors for a robust forensics and surveillance speaker recognition system. Linear Discriminant Analysis (LDA), a widely used linear dimension reduction technique, assumes that classes are homoscedastic, that is, that they share the same covariance. In this paper it is assumed that general speech utterances contain both homoscedastic and heteroscedastic elements. We show the validity of this assumption through several analyses and also demonstrate that dimension reduction using principal components is feasible. To handle the presence of both heteroscedastic and homoscedastic elements effectively, we propose a fusion approach that applies both LDA and Heteroscedastic LDA (HLDA). Experiments are conducted on the telephone database of the National Institute of Standards and Technology (NIST) Speaker Recognition Evaluation (SRE) 2010 extended to show its effectiveness and to compare it with other methods.
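The homoscedasticity assumption mentioned in this abstract can be made concrete with a minimal sketch of textbook LDA, written here in NumPy. This is not the paper's method or its HLDA fusion: the function name `lda_projection` and the toy two-class data are hypothetical, chosen only to show where the single pooled within-class scatter matrix enters the computation.

```python
import numpy as np

def lda_projection(X, y, n_components):
    """Return an LDA projection matrix of shape (n_features, n_components).

    Textbook LDA: all classes are pooled into one within-class scatter
    matrix Sw, which is where the shared-covariance (homoscedastic)
    assumption enters.
    """
    classes = np.unique(y)
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # pooled within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Directions that maximise between-class over within-class scatter
    # are the leading eigenvectors of Sw^{-1} Sb.
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(evals.real)[::-1][:n_components]
    return evecs.real[:, order]

# Hypothetical toy data: two classes in 3-D, separated along the first axis.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(50, 3)),
               rng.normal(size=(50, 3)) + np.array([4.0, 0.0, 0.0])])
y = np.array([0] * 50 + [1] * 50)

A = lda_projection(X, y, n_components=1)
z = X @ A  # one-dimensional projected features
```

Because `Sw` is a single matrix summed over all classes, any class whose covariance differs from the others is modelled imperfectly, which is the situation a heteroscedastic variant is meant to address.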
International Conference on Consumer Electronics | 2015
Seongkyu Mun; Suwon Shon; Woo-Il Kim; Hanseok Ko
This paper proposes a robust speaker direction estimation method based on a microphone array for voice-based interaction with a smart TV. The proposed method uses the speech basis and associated weights of non-negative matrix factorization to find the speaker-independent utterance direction from a noisy input signal. Experimental results for speaker direction estimation in a real acoustic environment validate the effectiveness of the proposed algorithm against conventional methods in terms of representative performance measures.
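For readers unfamiliar with the "basis and associated weights" that non-negative matrix factorization produces, the following is a minimal sketch of generic NMF with the standard multiplicative updates for the Frobenius-norm objective. It is not the direction estimation algorithm of the paper: the function `nmf`, its parameters, and the random toy matrix (standing in for a magnitude spectrogram with rows as frequency bins and columns as frames) are assumptions made for illustration.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (m x n) as W @ H with W, H >= 0.

    W (m x rank) holds the basis vectors; H (rank x n) holds the
    per-column weights. Uses multiplicative updates, which keep both
    factors non-negative at every step.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical toy input: a non-negative 6 x 40 matrix of exact rank 2.
rng = np.random.default_rng(1)
V = rng.random((6, 2)) @ rng.random((2, 40))

W, H = nmf(V, rank=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Each column of `W` plays the role of a basis vector and each row of `H` gives its weight over time; `rel_err` reports how closely the product reconstructs the input.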
Electronics Letters | 2015
Suwon Shon; Seongkyu Mun; David K. Han; Hanseok Ko
A novel non-negative matrix factorisation (NMF)-based subband decomposition in the frequency–spatial domain for acoustic source localisation using a microphone array is introduced. The proposed method decomposes the source and noise subbands and emphasises source-dominant frequency bins for more accurate source representation. By employing NMF, delay basis vectors and their subband information in the frequency–spatial domain are extracted for each frame. The proposed algorithm is evaluated in both simulated noise and real noise with a speech corpus database. Experimental results indicate that the algorithm performs more accurately than other conventional algorithms in both reverberant and noisy acoustic environments.
arXiv: Sound | 2018
Sangwook Park; Seongkyu Mun; Younglo Lee; David K. Han; Hanseok Ko
Conference of the International Speech Communication Association | 2017
Suwon Shon; Seongkyu Mun; Hanseok Ko
Conference of the International Speech Communication Association | 2017
Suwon Shon; Seongkyu Mun; Woo-Il Kim; Hanseok Ko
IEICE Transactions on Information and Systems | 2017
Seongkyu Mun; Suwon Shon; Woo-Il Kim; David K. Han; Hanseok Ko
IEICE Transactions on Information and Systems | 2017
Seongkyu Mun; Minkyu Shin; Suwon Shon; Woo-Il Kim; David K. Han; Hanseok Ko