Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Kyogu Lee is active.

Publication


Featured research published by Kyogu Lee.


IEEE Transactions on Audio, Speech, and Language Processing | 2008

Acoustic Chord Transcription and Key Extraction From Audio Using Key-Dependent HMMs Trained on Synthesized Audio

Kyogu Lee; Malcolm Slaney

We describe an acoustic chord transcription system that uses symbolic data to train hidden Markov models and gives best-of-class frame-level recognition results. We avoid the extremely laborious task of human annotation of chord names and boundaries, which must be done to provide machine learning models with ground truth, by performing automatic harmony analysis on symbolic music files. In parallel, we synthesize audio from the same symbolic files and extract acoustic feature vectors that are in perfect alignment with the labels. We therefore generate a large set of labeled training data with a minimal amount of human labor, which allows for richer models. Thus, we build 24 key-dependent HMMs, one for each key, using the key information derived from the symbolic data. Each key model defines a unique state-transition characteristic and helps avoid confusions in the observation vectors. Given acoustic input, we identify the musical key by choosing the key model with the maximum likelihood, and we obtain the chord sequence from the optimal state path of the corresponding key model, both of which are returned by a Viterbi decoder. This not only increases chord recognition accuracy but also yields key information. Experimental results show that the models trained on synthesized data perform very well on real recordings, even though the labels automatically generated from symbolic data are not 100% accurate. We also demonstrate the robustness of the tonal centroid feature, which outperforms the conventional chroma feature.
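The decoding step the abstract describes, choosing the maximum-likelihood key model and reading the chord sequence off its optimal state path, is standard Viterbi decoding. The sketch below is a minimal illustration, not the authors' implementation; the model parameters (`pi`, `A`, `B`) are hypothetical random placeholders standing in for the trained key-dependent HMMs.

```python
import numpy as np

def viterbi_log(log_init, log_trans, log_emit):
    """Return the optimal state path and its log-likelihood.

    log_init:  (S,)    log initial state probabilities
    log_trans: (S, S)  log transition probabilities
    log_emit:  (T, S)  per-frame log emission probabilities
    """
    T, S = log_emit.shape
    delta = log_init + log_emit[0]           # best score ending in each state
    psi = np.zeros((T, S), dtype=int)        # backpointers
    for t in range(1, T):
        scores = delta[:, None] + log_trans  # (from_state, to_state)
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emit[t]
    path = np.empty(T, dtype=int)
    path[-1] = delta.argmax()
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path, delta.max()

# Tiny runnable demo with random (hypothetical) parameters for one key model.
rng = np.random.default_rng(0)
S, T = 4, 10
logp = lambda x: np.log(x / x.sum(axis=-1, keepdims=True))
pi = logp(rng.random(S))            # initial distribution
A = logp(rng.random((S, S)))        # transition matrix
B = np.log(rng.random((T, S)))      # per-frame emission scores
path, ll = viterbi_log(pi, A, B)
print(path, ll)
# With 24 key models, one would run the decoder per key, keep the key
# with the highest log-likelihood, and read chords off its state path.
```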


Expert Systems With Applications | 2014

Music recommendation using text analysis on song requests to radio stations

Ziwon Hyung; Kibeom Lee; Kyogu Lee

Recommending appropriate music to users has always been a difficult task. In this paper, we propose a novel method for recommending music by analyzing the textual input of users. To this end, we mine a large corpus of documents from a Korean radio station's online bulletin board. Each document, written by a listener, consists of a song request accompanied by a brief personal story. We assume that such stories are closely related to the background of the song requests, so our system performs text analysis to recommend songs that were requested alongside other similar stories. We evaluate our system using conventional metrics along with a user evaluation test. Results show a close correlation between document similarity and song similarity, indicating the potential of using text as a source for recommending music.
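The core idea, recommending songs whose accompanying stories resemble the query text, can be sketched with a plain bag-of-words similarity search. The paper does not specify its exact text-analysis pipeline; the TF-IDF plus cosine-similarity setup below is an assumed stand-in, and the stories and song titles are invented examples.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical corpus: each document is a listener's story, paired with
# the song that the listener requested alongside it.
stories = [
    "my first trip abroad with old friends",
    "rainy night, missing someone far away",
    "starting a new job tomorrow, nervous and excited",
]
requested_songs = ["Song A", "Song B", "Song C"]

def recommend(query_story, k=2):
    """Recommend songs requested alongside the most similar stories."""
    vectorizer = TfidfVectorizer()
    doc_vectors = vectorizer.fit_transform(stories)
    query_vector = vectorizer.transform([query_story])
    sims = cosine_similarity(query_vector, doc_vectors).ravel()
    top = np.argsort(sims)[::-1][:k]
    return [(requested_songs[i], float(sims[i])) for i in top]

print(recommend("feeling nervous about my new job"))
```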


Proceedings of the 1st ACM Workshop on Audio and Music Computing Multimedia | 2006

Automatic chord recognition from audio using a supervised HMM trained with audio-from-symbolic data

Kyogu Lee; Malcolm Slaney

A novel approach for obtaining labeled training data is presented to directly estimate the model parameters in a supervised learning algorithm for automatic chord recognition from raw audio. To this end, harmonic analysis is first performed on symbolic data to generate label files. In parallel, we synthesize audio data from the same symbolic data, which are then provided to a machine learning algorithm along with the label files to estimate model parameters. Experimental results show higher performance in frame-level chord recognition than previous approaches.
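The audio-from-symbolic idea above can be illustrated in a few lines: synthesize audio from a symbolic (MIDI) file and extract frame-level features, so that labels derived from the same file line up with the feature frames by construction. This is a sketch under the assumption that `pretty_midi` and `librosa` are acceptable stand-ins for the authors' tools; "song.mid" is a placeholder path, and the harmony analysis that produces the labels is not shown.

```python
import numpy as np
import librosa
import pretty_midi

midi = pretty_midi.PrettyMIDI("song.mid")   # placeholder symbolic file
sr = 22050
audio = midi.synthesize(fs=sr)              # simple waveform synthesis
hop = 512
chroma = librosa.feature.chroma_stft(y=audio, sr=sr, hop_length=hop)
frame_times = librosa.frames_to_time(
    np.arange(chroma.shape[1]), sr=sr, hop_length=hop
)
# Because audio and labels come from the same symbolic source, each
# frame time can be matched exactly against the chord-segment
# boundaries produced by automatic harmony analysis of the MIDI file.
```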


IEEE Transactions on Audio, Speech, and Language Processing | 2017

Deep Convolutional Neural Networks for Predominant Instrument Recognition in Polyphonic Music

Yoonchang Han; Jaehun Kim; Kyogu Lee

Identifying musical instruments in polyphonic music recordings is a challenging but important problem in the field of music information retrieval. It enables music search by instrument, helps recognize musical genres, and can make music transcription easier and more accurate. In this paper, we present a convolutional neural network framework for predominant instrument recognition in real-world polyphonic music. We train our network on fixed-length music excerpts with a single labeled predominant instrument and estimate an arbitrary number of predominant instruments from an audio signal of variable length. To obtain the excerpt-wise result, we aggregate multiple outputs from sliding windows over the test audio. In doing so, we investigate two different aggregation methods: one takes the class-wise average followed by normalization, and the other performs temporally local class-wise max-pooling on the output probabilities prior to the averaging and normalization steps, so that the averaging process does not suppress the activations of sporadically appearing instruments. In addition, we conduct extensive experiments on several important factors that affect performance, including the analysis window size, the identification threshold, and the activation functions of the neural networks, to find the optimal set of parameters. Our analysis of instrument-wise performance found that the onset type is a critical factor for the recall and precision of each instrument. Using a dataset of 10k audio excerpts from 11 instruments for evaluation, we found that convolutional neural networks are more robust than conventional methods that exploit spectral features and source separation with support vector machines. Experimental results showed that the proposed convolutional network architecture obtained micro and macro F1 measures of 0.619 and 0.513, respectively, improving on the state-of-the-art algorithm by 23.1% and 18.8%.
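The two aggregation strategies described above, class-wise averaging and temporally local max-pooling before averaging, can be sketched directly on the matrix of sliding-window outputs. This is an illustrative reconstruction rather than the authors' code; the window probabilities and the pooling length are hypothetical.

```python
import numpy as np

def aggregate_average(window_probs):
    """Class-wise average of sliding-window outputs, then max-normalization."""
    s = window_probs.mean(axis=0)
    return s / s.max()

def aggregate_max_pool(window_probs, pool_len=6):
    """Class-wise max over local temporal segments before averaging, so
    that instruments appearing only sporadically are not suppressed by
    the global average."""
    T = window_probs.shape[0]
    pooled = [
        window_probs[t:t + pool_len].max(axis=0)
        for t in range(0, T, pool_len)
    ]
    s = np.mean(pooled, axis=0)
    return s / s.max()

# Hypothetical output: 20 sliding windows x 11 instrument classes.
rng = np.random.default_rng(0)
probs = rng.random((20, 11))
print(aggregate_average(probs))
print(aggregate_max_pool(probs))
# Instruments whose normalized score exceeds an identification
# threshold would then be reported as predominant.
```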


IEEE Transactions on Multimedia | 2014

Using Dynamically Promoted Experts for Music Recommendation

Kibeom Lee; Kyogu Lee

Recommender systems have become an invaluable asset to online services with the ever-growing number of items and users. Most systems focus on recommendation accuracy, predicting likable items for each user. Such methods tend to generate popular and safe recommendations, but fail to introduce users to potentially risky yet novel items that could increase the variety of items they consume. This is known as popularity bias, which is predominant in methods that adopt collaborative filtering. Recently, however, recommenders have started to improve their methods to generate lists of diverse items that are both accurate and novel, through specific novelty-driven algorithms or hybrid recommender systems. In this paper, we propose a recommender system that uses the concept of Experts to find both novel and relevant recommendations. By analyzing the ratings of the users, the algorithm promotes special Experts from the user population to create novel recommendations for a target user. Thus, different users are promoted to Experts dynamically, depending on whom the recommendations are for. The system used data collected from Last.fm and was evaluated with several metrics. Results show that the proposed system outperforms matrix factorization methods in finding novel items and performs on par in finding items that are simultaneously novel and relevant. The system can also mitigate popularity bias while preserving the advantages of collaborative filtering.
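To make the per-target promotion of Experts concrete, here is a minimal sketch. The paper's exact promotion criterion is not reproduced here; the overlap-weighted-by-novelty heuristic below is an assumed stand-in, and the rating matrix is invented.

```python
import numpy as np

def promote_experts(ratings, target, n_experts=5):
    """Promote Experts for one target user from a user-item rating matrix.

    A stand-in heuristic (not the paper's exact criterion): candidates
    are scored by taste overlap with the target, weighted toward users
    who rate less-popular items, so the promoted Experts are positioned
    to surface novel recommendations.
    """
    item_popularity = (ratings > 0).sum(axis=0)      # raters per item
    novelty = 1.0 / np.log2(2 + item_popularity)     # rare items weigh more
    overlap = (ratings > 0) & (ratings[target] > 0)  # co-rated items
    agreement = (overlap * novelty).sum(axis=1)
    agreement[target] = -np.inf                      # can't be own Expert
    return np.argsort(agreement)[::-1][:n_experts]

# Hypothetical 6-user x 8-item rating matrix (0 = unrated).
rng = np.random.default_rng(1)
R = rng.integers(0, 6, size=(6, 8))
print("Experts for user 0:", promote_experts(R, target=0, n_experts=2))
# A different target user would generally promote different Experts,
# which is what makes the promotion dynamic.
```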


IEEE Signal Processing Letters | 2014

Vocal Separation from Monaural Music Using Temporal/Spectral Continuity and Sparsity Constraints

Il-Young Jeong; Kyogu Lee

In this letter, we describe a novel approach for separating a vocal signal from monaural music. We assume that the accompaniment in a music signal can be represented as the sum of sustained harmonic and percussive sounds. Based on the observation that singing voices usually contain rapidly changing harmonic signals such as fast vibratos, slides, and/or glissandos, we propose a statistical model for the separation of harmonic/percussive and vocal sounds. To this end, we define an objective function that exploits the temporal/spectral continuity of harmonic/percussive sounds and the sparsity of vocal sounds in the spectrogram domain. Experimental results show that the proposed algorithm successfully separates the vocals from the accompaniment, performing significantly better than conventional algorithms and comparably to state-of-the-art algorithms.
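A rough sketch of the kind of objective the abstract describes: harmonic components are encouraged to be smooth over time, percussive components smooth over frequency, and the vocal component sparse. The weights and the exact functional form below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def separation_cost(H, P, V, S, w_h=1.0, w_p=1.0, w_v=0.1):
    """Illustrative objective for an H + P + V spectrogram decomposition.

    H, P, V, S: nonnegative magnitude spectrograms, shape (freq, time).
    - harmonic term:   squared differences along time (temporal continuity)
    - percussive term: squared differences along frequency (spectral continuity)
    - vocal term:      L1 norm (sparsity)
    - fit term:        the three parts should add up to the mixture S
    """
    harmonic_smoothness = np.sum(np.diff(H, axis=1) ** 2)
    percussive_smoothness = np.sum(np.diff(P, axis=0) ** 2)
    vocal_sparsity = np.sum(np.abs(V))
    reconstruction = np.sum((S - (H + P + V)) ** 2)
    return (reconstruction
            + w_h * harmonic_smoothness
            + w_p * percussive_smoothness
            + w_v * vocal_sparsity)
```

Minimizing such a cost over H, P, and V (e.g., by multiplicative updates or gradient descent, not shown) yields the decomposition; the separated vocal is then reconstructed from V.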


IEEE Workshop on Applications of Signal Processing to Audio and Acoustics | 2013

Acoustic scene classification using sparse feature learning and event-based pooling

Kyogu Lee; Ziwon Hyung; Juhan Nam

Recently, unsupervised learning algorithms have been used successfully to represent data in many machine recognition tasks. In particular, sparse feature learning algorithms have shown that they can not only discover meaningful structures in raw data but also outperform many hand-engineered features. In this paper, we apply the sparse feature learning approach to acoustic scene classification. We use a sparse restricted Boltzmann machine to capture a variety of local acoustic structures in audio data and represent the data in a high-dimensional sparse feature space given the learned structures. For scene classification, we summarize the local features by pooling over the audio scene data. While feature pooling is typically performed over uniformly divided segments, we suggest a new pooling method that first detects audio events and then pools only over the detected events, accounting for the irregular occurrence of audio events in acoustic scene data. We evaluate the learned features on the IEEE AASP Challenge development set, comparing them with a baseline model using mel-frequency cepstral coefficients (MFCCs). The results show that the learned features outperform MFCCs, that event-based pooling achieves higher accuracy than uniform pooling, and that a combination of the two methods performs even better than either one alone.
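Event-based pooling, as opposed to pooling over uniform segments, can be sketched as follows. The energy-threshold event detector here is an assumed simplification of whatever detector the paper uses, and the feature matrix is synthetic.

```python
import numpy as np

def event_based_pooling(features, energy, threshold=None):
    """Pool frame-level features only over detected audio events.

    features: (T, D) frame-level sparse feature activations
    energy:   (T,)   per-frame energy used for simple event detection
    A stand-in for the paper's event detector: frames whose energy
    exceeds a threshold are treated as event frames.
    """
    if threshold is None:
        threshold = energy.mean() + energy.std()
    event_frames = energy > threshold
    if not event_frames.any():           # fall back to pooling all frames
        return features.max(axis=0)
    return features[event_frames].max(axis=0)

# Hypothetical data: 500 frames of 64-dimensional sparse features.
rng = np.random.default_rng(2)
F = rng.random((500, 64))
E = rng.random(500)
print(event_based_pooling(F, E).shape)   # (64,)
```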


Journal of Neurosurgery | 2016

Transmastoid reshaping of the sigmoid sinus: preliminary study of a novel surgical method to quiet pulsatile tinnitus of an unrecognized vascular origin

Chong Sun Kim; So Young Kim; Hyunseok Choi; Ja-Won Koo; Shin-Young Yoo; Gwang Seok An; Kyogu Lee; Inyong Choi; Jae-Jin Song

OBJECTIVE A dominant sigmoid sinus with focal dehiscence or thinning (DSSD/T) of the overlying bony wall is a commonly encountered, but frequently overlooked, cause of vascular pulsatile tinnitus (VPT). Also, the pathophysiological mechanism of sound perception in patients with VPT remains poorly understood. In the present study, a novel surgical method, termed transmastoid SS-reshaping surgery, was introduced to ameliorate VPT in patients with DSSD/T. The authors reviewed a case series, analyzed the surgical outcomes, and suggested the pathophysiological mechanism of sound perception. The theoretical background underlying VPT improvement after transmastoid SS-reshaping surgery was also explored. METHODS Eight patients with VPT that was considered attributable to DSSD/T underwent transmastoid SS-reshaping surgery between February 2010 and February 2015. The mean postoperative follow-up period was 9.5 months (range 4-13 months). Transmastoid SS-reshaping surgery featured simple mastoidectomy, partial compression of the SS using harvested cortical bone chips, and reinforcement of the bony SS wall with bone cement. Perioperative medical records, imaging results, and audiological findings were comprehensively reviewed. RESULTS In 7 of the 8 patients (87.5%), the VPT abated immediately after surgery. Statistically significant improvements in tinnitus loudness and distress were evident on numeric rating scales. Three patients with preoperative ipsilesional low-frequency hearing loss exhibited postoperative improvements in their low-frequency hearing thresholds. No major postoperative complications were encountered except in the first subject, who experienced increased intracranial pressure postoperatively. This subsided after a revision operation for partial decompression of the SS. CONCLUSIONS Transmastoid SS-reshaping surgery may be a good surgical option in patients with DSSD/T, a previously unrecognized cause of VPT. Redistribution of severely asymmetrical blood flow, reinforcement of the bony SS wall with bone cement to reconstruct a soundproof barrier, and disconnection of a problematic sound conduction route via simple mastoidectomy silence VPT.


Otology & Neurotology | 2016

Objectification and Differential Diagnosis of Vascular Pulsatile Tinnitus by Transcanal Sound Recording and Spectrotemporal Analysis: A Preliminary Study

Jae-Jin Song; Gwang Seok An; Inyong Choi; Dirk De Ridder; So Young Kim; Hyun Seok Choi; Joo Hyun Park; Byung Yoon Choi; Ja-Won Koo; Kyogu Lee

Objective: Although frequently classified as “objective tinnitus,” in most cases vascular pulsatile tinnitus (VPT) is not equivalent to objective tinnitus because it is not easy to document VPT objectively. The present study was conducted to develop a novel transcanal sound recording and spectrotemporal analysis (STA) method for the objective and differential diagnosis of VPT. Study Design: A case series with a control group. Setting: Tertiary referral center. Patients: Six VPT subjects with radiological abnormalities and six normal controls. Interventions and Main Outcome Measure: The method was tested on recordings obtained from the ipsilateral external auditory canal (EAC) using an insert microphone, with the subject's head in four different positions. The recorded signals were first analyzed in the time domain, and a short-time Fourier transform was then performed to analyze the data in the time-frequency domain. Results: In the temporal analysis, the ear canal signals recorded from the VPT subjects exhibited large peak amplitudes and periodic structures, whereas the signals recorded from the control subjects had smaller peak amplitudes and weaker periodicity. In the STA, represented by two-dimensional spectrograms and three-dimensional waterfall diagrams, all of the VPT subjects demonstrated pulse-synchronous acoustic characteristics representative of their respective presumptive vascular pathologies, whereas the control subjects did not. Conclusion: The present diagnostic approach may provide additional information regarding the origins of VPT cases as well as an efficient and objective diagnostic method. Furthermore, this approach may aid in the determination of appropriate imaging modalities, treatment planning, and evaluation of treatment outcomes.
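The spectrotemporal analysis step, a short-time Fourier transform of the ear-canal recording, can be illustrated as below. The recording is replaced by a synthetic pulse-modulated noise signal; the sampling rate, window length, and modulation rate are hypothetical choices, not the study's recording parameters.

```python
import numpy as np
from scipy.signal import stft

# Synthetic stand-in for a transcanal microphone recording: broadband
# noise whose amplitude is modulated at a heart-rate-like frequency.
fs = 8000
t = np.arange(0, 10, 1 / fs)
pulse_rate = 1.2                                   # ~72 beats per minute
envelope = 0.5 + 0.5 * np.cos(2 * np.pi * pulse_rate * t)
signal = 0.01 * envelope * np.random.default_rng(3).normal(size=t.size)

f, seg_t, Z = stft(signal, fs=fs, nperseg=1024)
spectrogram = np.abs(Z)                            # (freq bins, time frames)
# Pulse-synchronous tinnitus would show up as low-frequency energy
# modulated at the heart rate across the columns of this spectrogram,
# the periodic structure the study reports for VPT subjects.
```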


Journal of the Acoustical Society of America | 2016

Sparse feature learning for instrument identification: Effects of sampling and pooling methods

Yoonchang Han; Subin Lee; Juhan Nam; Kyogu Lee

Feature learning for music applications has recently received considerable attention from many researchers. This paper reports on a sparse feature learning algorithm for musical instrument identification and, in particular, focuses on the effects of the frame sampling techniques used for dictionary learning and the pooling methods used for feature aggregation. To this end, two frame sampling techniques are examined: fixed and proportional random sampling. Furthermore, the effect of using onset frames is analyzed for both of the proposed sampling methods. For summarizing the feature activations, a standard deviation pooling method is used and compared with the commonly used max- and average-pooling techniques. Using more than 47,000 recordings of 24 instruments spanning various performers, playing styles, and dynamics, a number of tuning parameters are examined, including the analysis frame size, the dictionary size, and the type of frequency scaling, as well as the different sampling and pooling methods. The results show that the combination of proportional sampling and standard deviation pooling achieves the best overall performance of 95.62%, while the optimal parameter set varies among the instrument classes.
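The three pooling methods compared in the paper reduce a matrix of frame-level activations to a single clip-level vector; the sketch below shows all three side by side. The activation matrix is synthetic, and the dimensions are hypothetical.

```python
import numpy as np

def pool(activations, method="std"):
    """Summarize frame-level feature activations (T, D) into one vector.

    "std" is the standard deviation pooling studied in the paper;
    "max" and "avg" are the common baselines it is compared against.
    """
    if method == "max":
        return activations.max(axis=0)
    if method == "avg":
        return activations.mean(axis=0)
    if method == "std":
        return activations.std(axis=0)
    raise ValueError(f"unknown pooling method: {method}")

# Hypothetical activations: 300 frames, 128 dictionary atoms.
rng = np.random.default_rng(4)
A = rng.random((300, 128))
for m in ("max", "avg", "std"):
    print(m, pool(A, m)[:3])
```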

Collaboration


Dive into Kyogu Lee's collaborations.

Top Co-Authors

Kibeom Lee (Seoul National University)
Yoonchang Han (Seoul National University)
Gwang Seok An (Seoul National University)
Hoon Heo (Seoul National University)
Dooyong Sung (Seoul National University)
Il-Young Jeong (Seoul National University)
Tae Hoon Kim (Seoul National University)
Ziwon Hyung (Seoul National University)
Ja-Won Koo (Seoul National University Bundang Hospital)