Hyoung-Gook Kim
Technical University of Berlin
Publications
Featured research published by Hyoung-Gook Kim.
IEEE Transactions on Circuits and Systems for Video Technology | 2004
Hyoung-Gook Kim; Nicolas Moreau; Thomas Sikora
In this paper, we present an MPEG-7-based audio classification and retrieval technique targeted for analysis of film material. The technique consists of low-level descriptors and high-level description schemes. For the low-level descriptors, low-dimensional features such as the audio spectrum projection based on audio spectrum basis descriptors are produced in order to find a balanced trade-off between reducing dimensionality and retaining maximum information content. High-level description schemes are used to describe the modeling of the reduced-dimension features and the procedures of audio classification and retrieval. A classifier based on continuous hidden Markov models is applied. The sound model state path, selected according to the maximum-likelihood model, is stored in an MPEG-7 sound database and used as an index for query applications. Various experiments are presented in which the speaker- and sound-recognition rates are compared for different feature extraction methods. Using independent component analysis, we achieved better results than with the normalized audio spectrum envelope or principal component analysis in a speaker recognition system. In audio classification experiments, sounds are classified into selected sound classes in real time with an accuracy of 96%.
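A minimal sketch of the basis-projection step described above, assuming the normalized spectrum envelope is already available as a NumPy matrix; the function name, array shapes, and the use of scikit-learn's FastICA are illustrative choices, not taken from the paper.

```python
import numpy as np
from sklearn.decomposition import FastICA

def asp_features(nase, n_components=10, method="pca"):
    """Project a NASE matrix (n_frames x n_bands) onto a reduced spectral basis."""
    if method == "ica":
        # independent component analysis, compared against PCA/NASE in the experiments
        ica = FastICA(n_components=n_components, random_state=0)
        return ica.fit_transform(nase)                 # (n_frames, n_components)
    # PCA via SVD: the leading right singular vectors act as the spectral basis
    centered = nase - nase.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components].T                        # (n_bands, n_components)
    return centered @ basis                            # (n_frames, n_components)
```

The projected frames would then feed the continuous-HMM classifier, with the maximum-likelihood model and its state path serving as the retrieval index.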
IEEE Transactions on Consumer Electronics | 2006
Xuan Zhu; Yuan-Yuan Shi; Hyoung-Gook Kim; Kiwan Eom
In this paper, an integrated music recommendation system is proposed, which contains the functions of automatic music genre classification, automatic music emotion classification, and music similarity query. A novel tempo feature, named log-scale modulation frequency coefficients, is presented. With the AdaBoost algorithm, the proposed tempo feature is combined with timbre features and improves the performance of music genre and emotion classification. Compared with conventional methods based on timbre features, the precision of five-genre classification is enhanced from 86.8% to 92.2% and the accuracy of four-emotion classification is increased from 86.0% to 90.5%. Based on the results of music genre/emotion classification, we design a similarity query scheme that speeds up the similarity query process without decreasing precision. Furthermore, all the features employed in this paper are extracted from partially decoded MP3 data, which significantly reduces the feature extraction time.
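A minimal sketch of the fusion step, under the assumption that timbre and tempo (LMFC) feature vectors have already been extracted per clip; the function and variable names are placeholders, and scikit-learn's AdaBoost is used as a stand-in for the paper's boosting setup.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def train_music_classifier(timbre_feats, tempo_feats, labels):
    """timbre_feats: (n_clips, d1); tempo_feats: (n_clips, d2) LMFC vectors."""
    X = np.hstack([timbre_feats, tempo_feats])   # concatenate timbre and tempo features
    clf = AdaBoostClassifier(n_estimators=200)   # boosted decision stumps by default
    return clf.fit(X, labels)
```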
International Conference on Acoustics, Speech, and Signal Processing | 2004
Hyoung-Gook Kim; Thomas Sikora
We evaluate the MPEG-7 audio spectrum projection (ASP) features for general sound recognition against the well-established MFCC features. The recognition tasks of interest are speaker recognition, sound classification, and segmentation of audio using sound/speaker identification. For sound classification we use three approaches: a direct approach, a hierarchical approach without hints, and a hierarchical approach with hints. For audio segmentation, the MPEG-7 ASP features and MFCCs are used to train hidden Markov models (HMMs) for individual speakers and sounds. The trained sound/speaker models are then used to segment conversational speech involving a given subset of people in panel-discussion television programs. Results show that the MFCC approach yields a sound/speaker recognition rate superior to the MPEG-7 implementations.
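A minimal sketch of the MFCC/HMM side of the comparison, assuming librosa and hmmlearn are available: one HMM is trained per sound or speaker class, and a clip is assigned to the model with the highest log-likelihood.

```python
import numpy as np
import librosa
from hmmlearn.hmm import GaussianHMM

def extract_mfcc(path, n_mfcc=13):
    y, sr = librosa.load(path, sr=None)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T   # (n_frames, n_mfcc)

def train_models(class_to_clips, n_states=5):
    """class_to_clips: dict mapping a class label to a list of audio file paths."""
    models = {}
    for label, clips in class_to_clips.items():
        frames = [extract_mfcc(p) for p in clips]
        X = np.vstack(frames)
        lengths = [f.shape[0] for f in frames]
        models[label] = GaussianHMM(n_components=n_states).fit(X, lengths)
    return models

def classify(path, models):
    feats = extract_mfcc(path)
    return max(models, key=lambda label: models[label].score(feats))
```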
International Conference on Acoustics, Speech, and Signal Processing | 2005
Hyoung-Gook Kim; Daniel Ertelt; Thomas Sikora
We present a hybrid speaker-based segmentation method that combines metric-based and model-based techniques. Without a priori information about the number of speakers or their identities, the speech stream is segmented in three stages: (1) the most likely speaker changes are detected; (2) to group segments of identical speakers, a two-level clustering algorithm is performed using the Bayesian information criterion (BIC) and HMM model scores, where every cluster is assumed to contain only one speaker; (3) HMM speaker models are re-estimated from each cluster. Finally, a resegmentation step performs a more refined segmentation using these speaker models. To measure performance, we compare the segmentation results of the proposed hybrid method against metric-based segmentation. Results show that the hybrid approach using two-level clustering significantly outperforms direct metric-based segmentation.
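A minimal sketch of the ΔBIC test commonly used for metric-based change detection in stage (1): a window of feature frames is modelled by one full-covariance Gaussian versus two Gaussians split at a candidate change point, and a positive score suggests a speaker change. The penalty weight lambda_ is a tunable parameter; the formulation follows the standard BIC criterion rather than the paper's exact implementation.

```python
import numpy as np

def delta_bic(X, split, lambda_=1.0):
    """X: (n_frames, d) feature window; split: candidate change index."""
    n, d = X.shape
    X1, X2 = X[:split], X[split:]

    def logdet_cov(Z):
        return np.linalg.slogdet(np.cov(Z, rowvar=False))[1]

    # model-complexity penalty for splitting one Gaussian into two
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    return (0.5 * n * logdet_cov(X)
            - 0.5 * len(X1) * logdet_cov(X1)
            - 0.5 * len(X2) * logdet_cov(X2)
            - lambda_ * penalty)
```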
International Conference on Multimedia and Expo | 2006
Yuan-Yuan Shi; Xuan Zhu; Hyoung-Gook Kim; Kiwan Eom
This paper proposes a tempo feature extraction method based on long-term modulation spectrum analysis. To transform the modulation spectrum into a condensed feature vector, log-scale modulation frequency coefficients are introduced. The idea is to average the modulation-frequency energy via constant-Q filter banks. Furthermore, the feature can be extracted directly from the perceptually compressed data of digital music archives. To verify the effectiveness of the feature and its utility for music applications, the feature vector is used in a music emotion classification system. The system consists of two layers of AdaBoost classifiers. The first layer employs conventional timbre features; adding the tempo feature in the second layer improves the classification precision dramatically. In this way, the discriminative power of the given features is fully exploited. The system obtains high classification precision on a small corpus, demonstrating that the proposed feature is effective and computationally efficient for characterizing the tempo information of music.
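A minimal sketch of the tempo feature itself, assuming sub-band energy envelopes have already been computed over a long analysis window; the band edges and coefficient count are illustrative, and the log-spaced bins stand in for the constant-Q filter banks described above.

```python
import numpy as np

def log_scale_modulation_coeffs(envelopes, frame_rate, n_coeffs=8,
                                f_lo=0.5, f_hi=20.0):
    """envelopes: (n_frames, n_subbands) sub-band energy envelopes."""
    n_frames = envelopes.shape[0]
    mod_spec = np.abs(np.fft.rfft(envelopes, axis=0))          # modulation spectrum
    mod_freqs = np.fft.rfftfreq(n_frames, d=1.0 / frame_rate)  # in Hz
    edges = np.geomspace(f_lo, f_hi, n_coeffs + 1)             # log-spaced band edges
    coeffs = np.zeros(n_coeffs)
    for k in range(n_coeffs):
        band = (mod_freqs >= edges[k]) & (mod_freqs < edges[k + 1])
        if band.any():
            coeffs[k] = mod_spec[band].mean()   # average energy in this modulation band
    return coeffs
```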
International Conference on Multimedia and Expo | 2006
Martin Haller; Hyoung-Gook Kim; Thomas Sikora
This paper presents a content-based audiovisual video analysis technique for anchorperson detection in broadcast news. Topic-oriented navigation in newscasts requires a segmentation of topic boundaries. As the anchorperson gives a strong indication of such boundaries, the presented technique automatically derives this high-level information for video indexing from MPEG-2 videos and stores the results in an MPEG-7-conformant format. The multimodal analysis is carried out separately in the auditory and visual modalities, and decision fusion forms the final anchorperson segments.
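A minimal sketch of one possible late-fusion step, assuming each modality yields anchorperson segments as (start, end) pairs in seconds and that only sufficiently long overlaps are kept; the paper's actual fusion rule is not specified here.

```python
def fuse_segments(audio_segs, visual_segs, min_len=1.0):
    """Intersect anchorperson segments detected in the audio and visual streams."""
    fused = []
    for a_start, a_end in audio_segs:
        for v_start, v_end in visual_segs:
            start, end = max(a_start, v_start), min(a_end, v_end)
            if end - start >= min_len:      # keep sufficiently long overlaps only
                fused.append((start, end))
    return fused
```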
Asilomar Conference on Signals, Systems and Computers | 2003
M. Schwab; Hyoung-Gook Kim; Wiryadi; P. Noll
In this paper, we present a robust noise estimation method for speech enhancement algorithms. The robust noise estimate, based on a modified minima-controlled recursive averaging noise estimator, was applied to different speech estimators: spectral subtraction (SS), the log-spectral amplitude estimator (LSA), and the optimally modified log-spectral amplitude estimator (OM-LSA). The performance of the different algorithms was measured both by the signal-to-noise ratio (SNR) and by the recognition accuracy of automatic speech recognition (ASR).
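A minimal sketch of the overall scheme: a recursively smoothed noise estimate (a simplified stand-in for minima-controlled recursive averaging) drives a spectral-subtraction rule with a small spectral floor. The smoothing constants and the speech-absence test are illustrative, not the paper's parameters.

```python
import numpy as np

def enhance(power_spec, alpha=0.95, beta=0.02):
    """power_spec: (n_frames, n_bins) noisy power spectrogram."""
    noise = power_spec[0].copy()             # initialise noise from the first frame
    enhanced = np.empty_like(power_spec)
    for t, frame in enumerate(power_spec):
        # recursive averaging: update the noise estimate only where the frame
        # stays close to the current noise floor (crude speech-absence test)
        speech_absent = frame < 2.0 * noise
        noise = np.where(speech_absent, alpha * noise + (1 - alpha) * frame, noise)
        # spectral subtraction with a spectral floor of beta * frame
        enhanced[t] = np.maximum(frame - noise, beta * frame)
    return enhanced
```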
Electronic Imaging | 2003
Hyoung-Gook Kim; Thomas Sikora
In this paper, dimension-reduced, decorrelated spectral features for general sound recognition are applied to segment conversational speech from both broadcast news audio and panel-discussion television programs. Without a priori information about the number of speakers, the audio stream is segmented by a hybrid metric-based and model-based segmentation algorithm. To measure performance, we compare the segmentation results of the hybrid method against metric-based segmentation, using both the MPEG-7 standardized features and Mel-scale Frequency Cepstral Coefficients (MFCC). Results show that the MFCC features yield better performance than the MPEG-7 features. The hybrid approach significantly outperforms direct metric-based segmentation.
Electronic Imaging | 2003
Hyoung-Gook Kim; Thomas Sikora
In this paper, we present a classification and retrieval technique targeted at the retrieval of home video abstracts using dimension-reduced, decorrelated spectral features of the audio content. The feature extraction based on MPEG-7 descriptors consists of three main stages: the normalized audio spectrum envelope (NASE), a basis decomposition algorithm, and basis projection, obtained by multiplying the NASE with a set of extracted basis functions. A classifier based on continuous hidden Markov models is applied. For accurate retrieval, the system uses a two-level hierarchy combining speech recognition and sound classification. To measure performance, we compare the classification results of the MPEG-7 standardized features versus Mel-scale Frequency Cepstral Coefficients (MFCC). Results show that the MFCC features yield better performance than the MPEG-7 features.
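A minimal sketch of the NASE stage, read simply as a log-band spectrum envelope with per-frame RMS normalization; the band layout and dB conversion details are illustrative rather than the exact MPEG-7 definition.

```python
import numpy as np

def nase(power_spec, freqs_hz, band_edges_hz, eps=1e-10):
    """power_spec: (n_frames, n_bins); freqs_hz: centre frequency of each FFT bin."""
    n_bands = len(band_edges_hz) - 1
    bands = np.zeros((power_spec.shape[0], n_bands))
    for b in range(n_bands):
        sel = (freqs_hz >= band_edges_hz[b]) & (freqs_hz < band_edges_hz[b + 1])
        bands[:, b] = power_spec[:, sel].sum(axis=1)      # energy per log-frequency band
    log_env = 10.0 * np.log10(bands + eps)                # spectrum envelope in dB
    rms = np.sqrt((log_env ** 2).sum(axis=1, keepdims=True)) + eps
    return log_env / rms                                  # each frame normalised by its RMS
```

The resulting NASE frames would then pass through the basis decomposition and projection stages before HMM classification.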
Archive | 2005
Hyoung-Gook Kim; Nicolas Moreau; Thomas Sikora