Publications


Featured research published by Sungrack Yun.


IEEE Transactions on Audio, Speech, and Language Processing | 2012

Loss-Scaled Large-Margin Gaussian Mixture Models for Speech Emotion Classification

Sungrack Yun; Chang D. Yoo

This paper considers a learning framework for speech emotion classification using a discriminant function based on Gaussian mixture models (GMMs). The GMM parameter set is estimated by margin scaling with a loss function to reduce the risk of predicting emotions with high loss. Here, the loss function is defined as a function of a distance metric derived from Watson and Tellegen's emotion model. Margin scaling is known to have good generalization ability and can be considered appropriate for emotion modeling, where the parameter set is likely to be over-fitted to a training data set whose characteristics may differ from those of the testing data set. Our learning framework is formulated as a constrained optimization problem which is solved using semi-definite programming. Three tasks were evaluated: acted emotion classification, natural emotion classification, and cross-database emotion classification. In each task, four loss functions were evaluated. In all experiments, the results consistently show that margin scaling improves classification accuracy over learning frameworks based on maximum likelihood, maximum mutual information, and the max-margin framework without margin scaling. The results also show that margin scaling substantially reduces the overall loss compared to the max-margin framework without margin scaling.
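
As a minimal sketch of the margin-scaling idea, the snippet below places emotion categories at assumed angles on the Watson and Tellegen circumplex and derives the loss from angular distance; the angle assignments, toy scores, and all names are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Hypothetical placement of emotion categories on the Watson-Tellegen
# circumplex: each emotion sits at an angle (radians) on the affect circle.
EMOTION_ANGLES = {
    "happy": 0.0,
    "surprised": np.pi / 4,
    "angry": np.pi / 2,
    "afraid": 3 * np.pi / 4,
    "sad": np.pi,
    "bored": 5 * np.pi / 4,
    "calm": 3 * np.pi / 2,
}

def circumplex_loss(y_true, y_pred):
    """Loss of confusing two emotions, as normalized angular distance
    on the circumplex: 0 for the same emotion, 1 for opposite ones."""
    d = abs(EMOTION_ANGLES[y_true] - EMOTION_ANGLES[y_pred]) % (2 * np.pi)
    return min(d, 2 * np.pi - d) / np.pi

def margin_violations(scores, y_true, loss_fn):
    """Margin-scaled hinge terms: the correct emotion's score must beat
    each competitor's by at least the loss of confusing the two."""
    return [
        max(0.0, loss_fn(y_true, y) - (scores[y_true] - scores[y]))
        for y in scores if y != y_true
    ]

# Toy discriminant scores (e.g., GMM log-likelihoods) for one utterance.
scores = {"happy": -10.2, "surprised": -11.8, "angry": -10.5,
          "afraid": -12.0, "sad": -11.0, "bored": -12.5, "calm": -12.3}
print(margin_violations(scores, "happy", circumplex_loss))
```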


Pervasive and Mobile Computing | 2010

Wearable sensor activity analysis using semi-Markov models with a grammar

Owen Thomas; Peter Sunehag; Gideon Dror; Sungrack Yun; Sungwoong Kim; Matthew W. Robards; Alexander J. Smola; Daniel J. Green; Philo U. Saunders

Detailed monitoring of training sessions is an important component of elite athletes' training. In this paper, we describe an application that performs precise segmentation and labeling of swimming sessions. This allows a comprehensive breakdown of the training session, including lap times, detailed statistics of strokes, and turns. To this end, we use semi-Markov models (SMMs), a formalism for labeling and segmenting sequential data, trained in a max-margin setting. To reduce the computational complexity of the task and at the same time enforce sensible output, we introduce a grammar into the SMM framework. Applying the trained model to test swimming sessions of different swimmers yields highly accurate segmentation as well as perfect labeling of individual segments. The results are significantly better than those achieved by discriminative hidden Markov models.
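
The sketch below illustrates the kind of grammar-constrained semi-Markov decoding described above: a dynamic program over variable-length segments in which only label transitions permitted by a small grammar are considered. The per-frame scoring, the toy swim/turn grammar, and all names are assumptions for illustration, not the paper's model.

```python
import numpy as np

def smm_decode(frame_scores, transitions, max_dur):
    """Best segmentation/labeling under a toy semi-Markov model.

    frame_scores: (T, L) per-frame score of each label; a segment's score
      is the sum of its frames' scores (a stand-in for segment features).
    transitions: grammar as a set of allowed (prev_label, label) pairs;
      prev_label is None at the start of the sequence.
    max_dur: maximum segment duration, bounding the search.
    """
    T, L = frame_scores.shape
    cum = np.vstack([np.zeros(L), np.cumsum(frame_scores, axis=0)])
    best = np.full((T + 1, L), -np.inf)  # best[t, l]: best score covering
    back = {}                            # frames [0, t) ending in label l
    for t in range(1, T + 1):
        for l in range(L):
            for d in range(1, min(max_dur, t) + 1):
                s = t - d
                seg = cum[t, l] - cum[s, l]
                if s == 0:
                    if (None, l) in transitions and seg > best[t, l]:
                        best[t, l], back[t, l] = seg, (s, None)
                    continue
                for p in range(L):
                    if (p, l) in transitions and best[s, p] + seg > best[t, l]:
                        best[t, l], back[t, l] = best[s, p] + seg, (s, p)
    # Trace back the best final label into (start, end, label) segments.
    l = int(np.argmax(best[T]))
    segs, t = [], T
    while t > 0:
        s, p = back[t, l]
        segs.append((s, t, l))
        t, l = s, (p if p is not None else l)
    return segs[::-1]

# Toy grammar for a lap: swim (0) -> turn (1) -> swim (0), starting with swim.
grammar = {(None, 0), (0, 1), (1, 0)}
scores = np.array([[1, -1]] * 5 + [[-1, 1]] * 2 + [[1, -1]] * 5, float)
print(smm_decode(scores, grammar, max_dur=6))
```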


International Conference on Acoustics, Speech, and Signal Processing | 2009

Speech emotion recognition via a max-margin framework incorporating a loss function based on the Watson and Tellegen's emotion model

Sungrack Yun; Chang D. Yoo

This paper considers a method for speech emotion recognition using a max-margin framework that incorporates a loss function based on the well-known Watson and Tellegen's emotion model. Each emotion is modeled by a single-state hidden Markov model (HMM) that is trained by maximizing the minimum separation margin between emotions, and the margin is scaled by the loss function. The framework is optimized by semi-definite programming. Experiments were performed to evaluate the framework on the Berlin database of emotional speech. The framework performed better than conventional HMM training criteria such as maximum likelihood estimation and maximum mutual information estimation.
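
In symbols (notation assumed here, not taken verbatim from the paper), the loss-scaled margin constraint described above can be written as:

```latex
% Margin scaling: the correct emotion's score must exceed each
% competitor's by a margin proportional to the loss of confusing them.
\[
  \log p_{\lambda}(x \mid y) - \log p_{\lambda}(x \mid \hat{y})
  \;\ge\; \ell(y, \hat{y}) - \xi_{x} ,
  \qquad \forall\, \hat{y} \neq y , \quad \xi_{x} \ge 0 ,
\]
```

where $\ell(y,\hat{y})$ is the distance-based loss on the circumplex and $\xi_x$ is a slack variable for utterance $x$; training enforces these constraints via semi-definite programming.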


IEEE Transactions on Audio, Speech, and Language Processing | 2011

Large Margin Discriminative Semi-Markov Model for Phonetic Recognition

Sungwoong Kim; Sungrack Yun; Chang D. Yoo

This paper considers a large margin discriminative semi-Markov model (LMSMM) for phonetic recognition. The hidden Markov model (HMM) framework that is often used for phonetic recognition assumes only local statistical dependencies between adjacent observations, and it is used to predict a label for each observation without explicit phone segmentation. In contrast, the semi-Markov model (SMM) framework allows simultaneous segmentation and labeling of sequential data based on a segment-based Markovian structure that assumes statistical dependencies among all the observations within a phone segment. For phonetic recognition, which is inherently a joint segmentation and labeling problem, the SMM framework has the potential to perform better than the HMM framework at the expense of a slight increase in computational complexity. The SMM framework considered in this paper is based on a non-probabilistic discriminant function that is linear in a joint feature map which attempts to capture long-range statistical dependencies among observations. The parameters of the discriminant function are estimated by a large margin learning framework for structured prediction. The parameter estimation problem at hand leads to an optimization problem with many margin constraints, and this constrained optimization problem is solved using a stochastic gradient descent algorithm. The proposed LMSMM outperformed the large margin discriminative HMM on the TIMIT phonetic recognition task.
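
With assumed notation, the large-margin criterion described above amounts to a regularized structured hinge objective, minimized by stochastic (sub)gradient steps:

```latex
% Structured hinge objective over segmentations (notation assumed).
\[
  \min_{w}\; \frac{\lambda}{2}\|w\|^{2}
  + \sum_{n}\Big[\max_{\hat{y}}\big(\ell(y_{n},\hat{y})
  + w^{\top}\Phi(x_{n},\hat{y})\big) - w^{\top}\Phi(x_{n},y_{n})\Big]_{+}
\]
% Per-example update, with \hat{y}^{*} the loss-augmented maximizer:
\[
  w \;\leftarrow\; w - \eta\big(\lambda w
  + \Phi(x_{n},\hat{y}^{*}) - \Phi(x_{n},y_{n})\big)
\]
```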


International Conference on Acoustics, Speech, and Signal Processing | 2010

Parametric emotional singing voice synthesis

Younsung Park; Sungrack Yun; Chang D. Yoo

This paper describes an algorithm to control the expressed emotion of a synthesized song. Based on a database of various melodies sung neutrally with a restricted set of words, hidden semi-Markov models (HSMMs) of notes ranging from E3 to G5 are constructed for synthesizing the singing voice. Three steps are taken in the synthesis: (1) pitch and duration are determined according to the notes indicated by the musical score; (2) features are sampled from the appropriate HSMMs with the duration set to maximize the probability; (3) the singing voice is synthesized by a mel-log spectrum approximation (MLSA) filter using the sampled features as filter parameters. The emotion of a synthesized song is controlled by varying the duration and vibrato parameters according to Thayer's mood model. A perception test is performed to evaluate the synthesized songs. The results show that the algorithm can control the expressed emotion of a singing voice given a neutral singing voice database.
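
As a rough illustration of step (1) and the emotion control described above, the sketch below varies a note's duration and vibrato according to a hypothetical per-emotion parameter table; the parameter values, MIDI-based pitch mapping, and sampling rate are assumptions, not the paper's settings.

```python
import numpy as np

# Hypothetical emotion -> (duration scale, vibrato depth in cents, rate in Hz).
EMOTION_PARAMS = {
    "neutral": (1.0, 30.0, 5.5),
    "happy":   (0.9, 60.0, 6.5),
    "sad":     (1.2, 15.0, 4.5),
}

def note_f0_contour(midi_note, dur_s, emotion, sr=200):
    """F0 contour of one note with emotion-dependent duration and vibrato."""
    scale, depth_cents, rate_hz = EMOTION_PARAMS[emotion]
    t = np.arange(int(dur_s * scale * sr)) / sr
    f0 = 440.0 * 2 ** ((midi_note - 69) / 12)   # note pitch from the score
    vibrato = 2 ** (depth_cents / 1200 * np.sin(2 * np.pi * rate_hz * t))
    return f0 * vibrato

# An A4 quarter note (0.5 s) sung "sad": longer, with shallow slow vibrato.
contour = note_f0_contour(69, 0.5, "sad")
print(len(contour), contour.min().round(1), contour.max().round(1))
```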


Asian Conference on Computer Vision | 2012

Joint kernel learning for supervised image segmentation

Jong Min Kim; Youngjoo Seo; Sanghyuk Park; Sungrack Yun; Chang D. Yoo

This paper considers a supervised image segmentation algorithm based on joint-kernelized structured prediction. In the proposed algorithm, correlation clustering over a superpixel graph is conducted using a non-linear discriminant function, whose parameters are learned by a kernelized structured support vector machine (SSVM). For an input superpixel image, correlation clustering is used to predict the superpixel-graph edge labels that determine whether adjacent superpixel pairs should be merged or not. In previous work, the discriminant functions for structured prediction were generally chosen to be linear in the model parameter and the joint feature map. However, the linear model has two limitations: complex correlations between two input-output pairs are ignored, and the joint feature map must be explicitly designed. To cope with these limitations, a nonlinear discriminant function based on a joint kernel, which eliminates the need for explicit design of the joint feature map, is considered. The proposed joint kernel is defined as a combination of an image similarity kernel and an edge-label similarity kernel, which measure the resemblance of two input images and the similarity between two edge labelings, respectively. Each kernel function is designed for fast computation and efficient inference. The proposed algorithm is evaluated on two segmentation benchmark datasets: the Berkeley segmentation dataset (BSDS) and the Microsoft Research Cambridge dataset (MSRC). It is observed that the joint feature map implicitly embedded in the proposed joint kernel performs comparably to, or even better than, the explicitly designed joint feature map used in a linear model.
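
A minimal sketch of a joint kernel built, as described above, from an image-similarity kernel and an edge-label-similarity kernel. The RBF forms, the product combination, and the toy global descriptors are assumptions; the paper's kernels are designed differently for fast computation.

```python
import numpy as np

def rbf(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    return np.exp(-gamma * np.sum((np.asarray(a) - np.asarray(b)) ** 2))

def joint_kernel(img_feat1, edge_labels1, img_feat2, edge_labels2):
    """K((x1,y1),(x2,y2)) = K_image(x1,x2) * K_label(y1,y2): two
    input-output pairs are similar only if both the images and the
    superpixel-edge labelings resemble each other."""
    k_image = rbf(img_feat1, img_feat2, gamma=0.5)
    k_label = rbf(edge_labels1, edge_labels2, gamma=0.1)
    return k_image * k_label

# Toy example: global image descriptors and binary merge/split labels
# on three superpixel-graph edges.
x1, y1 = [0.2, 0.7, 0.1], [1, 0, 1]
x2, y2 = [0.3, 0.6, 0.1], [1, 0, 0]
print(joint_kernel(x1, y1, x2, y2))
```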


International Conference on Acoustics, Speech, and Signal Processing | 2010

Large margin training of semi-Markov model for phonetic recognition

Sungwoong Kim; Sungrack Yun; Chang D. Yoo

This paper considers large margin training of a semi-Markov model (SMM) for phonetic recognition. The SMM framework is better suited to phonetic recognition than the hidden Markov model (HMM) framework in that it is capable of simultaneously segmenting the uttered speech into phones and labeling the segment-based features. In this paper, the SMM framework is used to define a discriminant function that is linear in a joint feature map which attempts to capture long-range statistical dependencies within a segment and between adjacent segments of variable length. The parameters of the discriminant function are estimated by a large margin learning criterion for structured prediction. The parameter estimation problem, which is an optimization problem with many margin constraints, is solved using a stochastic subgradient descent algorithm. The proposed large margin SMM outperforms the large margin HMM on the TIMIT corpus.
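
A minimal end-to-end sketch of the training loop described above, under heavy simplifying assumptions: a toy joint feature map that sums frame features per label, brute-force loss-augmented decoding (feasible only at this toy scale), and a stochastic subgradient update on the margin constraints. All names, features, and constants are illustrative, not the paper's.

```python
import numpy as np

def segmentations(T, n_labels, max_dur=3):
    """Enumerate all (start, end, label) segmentations of T frames."""
    def rec(t):
        if t == T:
            yield []
            return
        for d in range(1, min(max_dur, T - t) + 1):
            for l in range(n_labels):
                for rest in rec(t + d):
                    yield [(t, t + d, l)] + rest
    return list(rec(0))

def phi(x, segs, n_labels):
    """Toy joint feature map: per-label sum of frame features."""
    f = np.zeros((n_labels, x.shape[1]))
    for s, e, l in segs:
        f[l] += x[s:e].sum(axis=0)
    return f.ravel()

def frame_labels(segs, T):
    y = np.empty(T, int)
    for s, e, l in segs:
        y[s:e] = l
    return y

def hamming(segs_a, segs_b, T):
    """Frame-level Hamming loss between two segmentations."""
    return float(np.sum(frame_labels(segs_a, T) != frame_labels(segs_b, T)))

# Toy data: 6 frames, 2 labels, 2-dim frame features.
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(1, 0.3, (3, 2)), rng.normal(-1, 0.3, (3, 2))])
y = [(0, 3, 0), (3, 6, 1)]
T, n_labels = 6, 2
cands = segmentations(T, n_labels)

w = np.zeros(n_labels * 2)
eta, lam = 0.1, 0.01
for step in range(50):
    # Loss-augmented decoding by brute force (exact on this toy problem).
    y_hat = max(cands, key=lambda s: w @ phi(x, s, n_labels) + hamming(s, y, T))
    if hamming(y_hat, y, T) + w @ phi(x, y_hat, n_labels) > w @ phi(x, y, n_labels):
        w -= eta * (lam * w + phi(x, y_hat, n_labels) - phi(x, y, n_labels))
    else:
        w -= eta * lam * w  # no margin violation: only regularize

pred = max(cands, key=lambda s: w @ phi(x, s, n_labels))
print("frame error:", hamming(pred, y, T) / T)
```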


International Conference on Acoustics, Speech, and Signal Processing | 2011

Learning a discriminative visual codebook using homonym scheme

SeungRyul Baek; Chang D. Yoo; Sungrack Yun

This paper studies a method for learning a discriminative visual codebook for computer vision tasks such as image categorization and object recognition. The performance of such tasks depends on the construction of the codebook, which is a table of visual words (i.e., codewords). This paper proposes a learning criterion for constructing a discriminative codebook and solves it with a homonym scheme that splits codeword regions by label. A codebook is learned based on the proposed homonym scheme such that its histogram can be used to discriminate objects of different labels. The traditional codebook based on k-means is compared against the learned codebook on two well-known datasets (Caltech 101 and ETH-80) and a dataset we constructed using Google Images. We show that the learned codebook consistently outperforms the traditional codebook.
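
A minimal sketch of the homonym idea described above: a k-means codeword whose assigned descriptors come from several object labels is split into per-label "homonym" codewords. The splitting rule, thresholds, and names below are assumptions, not the paper's exact criterion.

```python
import numpy as np

def split_homonyms(descriptors, labels, centers, min_count=2):
    """Return a refined codebook where mixed-label codewords are split."""
    # Assign each descriptor to its nearest codeword center.
    assign = np.argmin(
        ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
    new_centers = []
    for k in range(len(centers)):
        members = assign == k
        for lbl in np.unique(labels[members]):
            group = descriptors[members & (labels == lbl)]
            if len(group) >= min_count:      # one homonym per label
                new_centers.append(group.mean(axis=0))
    return np.array(new_centers)

rng = np.random.default_rng(1)
desc = rng.normal(size=(100, 8))                      # toy descriptors
lbls = rng.integers(0, 3, size=100)                   # 3 object classes
centers = desc[rng.choice(100, 5, replace=False)]     # stand-in for k-means
print("codebook size:", len(centers), "->", len(split_homonyms(desc, lbls, centers)))
```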


International Workshop on Machine Learning for Signal Processing | 2010

ν-structured support vector machines

Sungwoong Kim; Jong Min Kim; Sungrack Yun; Chang D. Yoo

This paper considers the ν-structured support vector machine (ν-SSVM), a structured support vector machine (SSVM) incorporating an intuitive balance parameter ν. Without the parameter ν, cumbersome validation would be required to choose the balance parameter. We theoretically prove that the parameter ν asymptotically converges to both the empirical fraction of margin errors and the empirical fraction of support vectors. Stochastic subgradient descent is used to solve the optimization problem of the ν-SSVM in the primal domain, since it is simple, memory efficient, and fast to converge. We verify the properties of the ν-SSVM experimentally on the task of sequentially labeling handwritten characters.
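
By analogy with the ν-SVM, the objective plausibly takes a form like the following (a sketch with assumed notation; the paper's exact formulation may differ), in which an adaptive margin ρ replaces the usual balance constant:

```latex
% \nu-SSVM sketch: \nu \in (0, 1] trades the adaptive margin \rho
% against the average slack, replacing the balance constant C.
\[
  \min_{w,\,\rho \ge 0,\,\xi \ge 0}\;
  \frac{1}{2}\|w\|^{2} - \nu\rho + \frac{1}{n}\sum_{i=1}^{n}\xi_{i}
\]
\[
  \text{s.t.}\quad
  w^{\top}\big[\Phi(x_{i},y_{i}) - \Phi(x_{i},\hat{y})\big]
  \;\ge\; \rho\,\ell(y_{i},\hat{y}) - \xi_{i}
  \qquad \forall\, i,\; \hat{y} \neq y_{i}
\]
```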


International Symposium on Industrial Electronics | 2009

Margin-enhanced maximum mutual information estimation for hidden Markov models

Sungwoong Kim; Sungrack Yun; Chang D. Yoo

A discriminative training algorithm for estimating continuous-density hidden Markov models (CDHMMs) for automatic speech recognition is considered. The algorithm is based on a criterion called margin-enhanced maximum mutual information (MEMMI), and it estimates the CDHMM parameters by maximizing the weighted sum of the maximum mutual information objective function and the large margin objective function. The MEMMI is motivated by the criterion used in classifiers such as the soft-margin support vector machine, which maximizes the weighted sum of an empirical risk function and a margin-related generalization function. The algorithm is an iterative procedure, and at each stage it updates the parameters by placing different weights on the utterances according to their log-likelihood margins: incorrectly classified (negative-margin) utterances are emphasized more than correctly classified ones. The MEMMI leads to a simple objective function that can be optimized easily by a gradient ascent algorithm while maintaining a probabilistic model. Experimental results show that the recognition accuracy of the MEMMI is better than that of other discriminative training criteria, such as approximated maximum mutual information (AMMI), minimum classification error (MCE), and soft large margin estimation (SLME), on the TIDIGITS database.
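
A minimal sketch of the utterance weighting described above: each utterance's weight comes from its log-likelihood margin, so that misclassified (negative-margin) utterances drive the update hardest. The sigmoid form and constants are illustrative assumptions, not the paper's exact weighting.

```python
import numpy as np

def utterance_weights(margins, steepness=1.0):
    """Weight utterances by a decreasing function of their margin."""
    margins = np.asarray(margins, float)
    return 1.0 / (1.0 + np.exp(steepness * margins))

# Log-likelihood margins: correct-model log-likelihood minus the best
# competing model's; negative means the utterance is misclassified.
margins = [2.3, 0.4, -1.1, -0.2, 3.0]
print(utterance_weights(margins).round(3))
```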
