Publications


Featured research published by Yung-Hwan Oh.


IEEE Signal Processing Letters | 2003

Single-channel signal separation using time-domain basis functions

Gil-Jin Jang; Te-Won Lee; Yung-Hwan Oh

We present a new technique for achieving blind source separation when given only a single-channel recording. The main idea is to exploit the inherent time structure of sound sources by learning a priori sets of time-domain basis functions that encode the sources in a statistically efficient manner. We derive a learning algorithm using a maximum likelihood approach given the observed single-channel data and the sets of basis functions. For each time point, we infer the source parameters and their contribution factors using a flexible but simple density model. We show separation results for two music signals as well as for two voice signals.
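
A minimal sketch of the decomposition step, assuming bases have already been learned from each source's training data. The paper infers coefficients by maximum likelihood under a flexible density model; here that inference is approximated with an off-the-shelf L1-regularized fit (scikit-learn's Lasso, i.e. a Laplacian-prior MAP estimate), so this is an illustrative simplification rather than the authors' algorithm.

```python
# Sketch: separate one mixture frame using pre-learned per-source bases.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
N = 64   # frame length (samples)
K = 64   # basis functions per source

# Stand-ins for bases learned beforehand from each source's training
# data (the paper learns these with an ICA-style ML algorithm).
A1 = rng.standard_normal((N, K))   # basis for source 1
A2 = rng.standard_normal((N, K))   # basis for source 2
A = np.hstack([A1, A2])            # stacked dictionary [A1 | A2]

def separate_frame(x, alpha=0.01):
    """Infer per-source coefficients for one mixture frame and
    reconstruct each source's contribution."""
    coef = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000).fit(A, x).coef_
    s1 = A1 @ coef[:K]   # reconstruction attributed to source 1
    s2 = A2 @ coef[K:]   # reconstruction attributed to source 2
    return s1, s2

# Toy mixture of a frame drawn from each "source" model
x = A1 @ (rng.standard_normal(K) * 0.1) + A2 @ (rng.standard_normal(K) * 0.1)
s1_hat, s2_hat = separate_frame(x)
print(np.linalg.norm(x - (s1_hat + s2_hat)))   # reconstruction residual
```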


International Conference on Acoustics, Speech, and Signal Processing | 2001

Learning statistically efficient features for speaker recognition

Gil-Jin Jang; Te-Won Lee; Yung-Hwan Oh

We apply independent component analysis (ICA) to the problem of finding efficient features for a speaker, extracting an optimal basis. The basis functions learned by the algorithm are oriented and localized in both space and frequency, bearing a resemblance to Gabor functions. Speech segments are assumed to be generated by a linear combination of the basis functions; thus the distribution of a speaker's speech segments is modeled by a basis calculated so that each component is statistically independent of the others on the given training data. To assess the efficiency of the basis functions, we performed speaker classification experiments and compared our results with conventional Fourier-based features. Our results show that the proposed features are more efficient than the conventional Fourier-based features, achieving a higher classification rate.
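
A minimal sketch of the idea, assuming scikit-learn's FastICA as the basis learner and a Laplacian source prior for scoring; the scoring rule and all names here are illustrative assumptions, not the paper's exact recipe.

```python
# Sketch: per-speaker ICA bases used as a likelihood-based classifier.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(1)
D = 32  # segment length

def train_basis(segments):
    """Learn an ICA unmixing matrix from a (num_segments, D) array."""
    ica = FastICA(n_components=D, whiten="unit-variance", random_state=0)
    ica.fit(segments)
    return ica.components_   # rows are learned filters W

def log_likelihood(W, x):
    """log p(x) = log|det W| + sum_i log p(w_i . x) with a unit-scale
    Laplacian prior on the sources (zero-mean data assumed)."""
    s = W @ x
    return np.linalg.slogdet(W)[1] - np.sum(np.abs(s))

# Toy "speakers": different random mixings of Laplacian sources
A1, A2 = rng.standard_normal((D, D)), rng.standard_normal((D, D))
spk1 = rng.laplace(size=(2000, D)) @ A1.T
spk2 = rng.laplace(size=(2000, D)) @ A2.T
W1, W2 = train_basis(spk1), train_basis(spk2)

test = rng.laplace(size=D) @ A1.T   # a segment from speaker 1
print("speaker 1" if log_likelihood(W1, test) > log_likelihood(W2, test)
      else "speaker 2")
```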


International Conference on Spoken Language Processing | 1996

Lombard effect compensation and noise suppression for noisy Lombard speech recognition

Sang-Mun Chi; Yung-Hwan Oh

The performance of a speech recognition system degrades rapidly in the presence of ambient noise. To reduce the degradation, a degradation model is proposed that represents the spectral changes in a speech signal uttered in a noisy environment. The model uses frequency warping and amplitude scaling of each frequency band to simulate the variations in formant location, formant bandwidth, pitch, spectral tilt, and per-band energy caused by the Lombard effect. Another Lombard effect, the variation of overall vocal intensity, is represented by a multiplicative constant term depending on the spectral magnitude of the input speech. The noise contamination is represented by an additive term in the frequency domain. According to this degradation model, the cepstral vector of clean speech is estimated from that of noisy Lombard speech using spectral subtraction, spectral magnitude normalization, band-pass filtering in the lin-log spectral domain, and multiple linear transformations. Noisy Lombard speech data was collected by simulating noisy environments using noises from automobiles, an exhibition hall, telephone booths on crowded downtown streets, and computer rooms. The proposed method significantly reduces error rates in the recognition of 50 Korean words. For example, the recognition rate is 95.91% with this method and 79.68% without it at a signal-to-noise ratio (SNR) of 10 dB.
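
A minimal sketch of two of the steps named above, spectral subtraction and spectral magnitude normalization; the flooring constant and the normalization rule are assumptions, and the frequency warping, band-wise scaling, and linear transformations are not reproduced here.

```python
# Sketch: clean up a noisy magnitude spectrum frame.
import numpy as np

def spectral_subtract(mag, noise_mag, floor=0.01):
    """Subtract a noise magnitude estimate, flooring the result to
    avoid negative magnitudes (a standard over-subtraction guard)."""
    cleaned = mag - noise_mag
    return np.maximum(cleaned, floor * mag)

def magnitude_normalize(mag):
    """Normalize overall spectral magnitude; a simple stand-in for the
    multiplicative intensity term described in the abstract."""
    return mag / (np.mean(mag) + 1e-8)

# Toy frame: |FFT| of noisy speech and a flat noise estimate
rng = np.random.default_rng(2)
noisy = np.abs(rng.standard_normal(257)) + 1.0
noise = np.full(257, 0.8)
clean_est = magnitude_normalize(spectral_subtract(noisy, noise))
print(clean_est.shape, clean_est.min() >= 0.0)
```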


IEEE Signal Processing Letters | 2004

On the use of channel-attentive MFCC for robust recognition of partially corrupted speech

Hoon-Young Cho; Yung-Hwan Oh

This letter proposes a channel-attentive mel-frequency cepstral coefficient (CAMFCC) method to improve the utilization of uncorrupted or more reliable frequency bands for robust speech recognition. The method obtains a channel attention matrix by estimating the reliability of the mel filter-bank channels; both the input mel-frequency cepstral coefficients and the mean vectors of the hidden Markov models are then corrected using this matrix during the output-probability calculation of Viterbi decoding. Experimental results on the TIDIGITS database corrupted by various band-selective noises indicate that the proposed CAMFCC method utilizes the uncorrupted partial frequency bands better than a multiband method, resolving the noise-localization limitation caused by the fixed band boundaries of the multiband approach.
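
A minimal sketch of the channel-attention idea on the feature side: weight each mel channel by an estimated reliability before the cepstral transform. The sigmoid-of-SNR reliability estimator and the noise-floor input are assumptions for illustration, not the paper's estimator, and the HMM mean correction inside Viterbi decoding is omitted.

```python
# Sketch: reliability-weighted mel channels before the DCT.
import numpy as np
from scipy.fft import dct

def channel_attention(log_mel, noise_floor):
    """Diagonal attention: channels near the noise floor get weight
    close to 0, clean channels weight close to 1."""
    snr = log_mel - noise_floor
    return 1.0 / (1.0 + np.exp(-snr))   # sigmoid reliability

def camfcc(log_mel, noise_floor, n_ceps=13):
    w = channel_attention(log_mel, noise_floor)
    return dct(w * log_mel, type=2, norm="ortho")[:n_ceps]

rng = np.random.default_rng(3)
log_mel = rng.uniform(1.0, 8.0, size=24)   # 24 mel channels (toy frame)
noise_floor = np.full(24, 3.0)
print(camfcc(log_mel, noise_floor))
```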


Cognitive Computation | 2012

Speaker-Characterized Emotion Recognition using Online and Iterative Speaker Adaptation

Jaebok Kim; Jeong-Sik Park; Yung-Hwan Oh

This paper proposes a novel speech emotion recognition (SER) framework for affective interaction between humans and personal devices. Most conventional SER techniques adopt a speaker-independent model framework because of the sparseness of individual speech data. However, a large amount of individual data can be accumulated on a personal device, making it possible to construct speaker-characterized emotion models through a speaker adaptation procedure. In this study, to address problems associated with conventional adaptation approaches in SER tasks, we modified a representative adaptation technique, maximum likelihood linear regression (MLLR), on the basis of selective label refinement. We then carried out the modified MLLR procedure in an online and iterative manner, using accumulated individual data, to further enhance the speaker-characterized emotion models. In SER experiments on an emotional corpus, our approach outperformed conventional adaptation techniques as well as the speaker-independent model framework.
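
A minimal sketch of an MLLR-style mean transform. With identity covariances, the maximum likelihood estimate of a global transform W (adapted mean = W times the extended mean [1, mu]) reduces to a least-squares regression of adaptation frames on the extended means of their aligned Gaussians; that simplification and all names below are assumptions for illustration.

```python
# Sketch: estimate and apply a global MLLR-style mean transform.
import numpy as np

def estimate_mllr_transform(frames, means):
    """frames[i] is an adaptation vector aligned to Gaussian mean
    means[i]; returns W of shape (D, D+1)."""
    xi = np.hstack([np.ones((len(means), 1)), means])   # extended means
    W, *_ = np.linalg.lstsq(xi, frames, rcond=None)
    return W.T

def adapt_mean(W, mu):
    """Adapted mean: W @ [1, mu]."""
    return W @ np.concatenate(([1.0], mu))

rng = np.random.default_rng(4)
D = 4
means = rng.standard_normal((200, D))   # means aligned to the frames
true_W = np.hstack([rng.standard_normal((D, 1)), np.eye(D) * 1.2])
frames = (true_W @ np.hstack([np.ones((200, 1)), means]).T).T
W = estimate_mllr_transform(frames, means)
print(np.allclose(adapt_mean(W, means[0]), frames[0], atol=1e-6))
```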


International Conference on Acoustics, Speech, and Signal Processing | 2011

On-line speaker adaptation based emotion recognition using incremental emotional information

Jae-Bok Kim; Jeong-Sik Park; Yung-Hwan Oh

This paper proposes a new speech emotion recognition (SER) framework. Compared to speaker-independent emotion models, speaker-adapted models constructed from a speaker's emotional speech data represent the speaker's emotional characteristics more precisely, thus improving SER accuracy. However, it is hard to collect a sufficient amount of personal emotional data at once. For this reason, we propose an MLLR-based online speaker adaptation technique using accumulated personal data. Compared to speech models, reliable emotion models applicable to MLLR are relatively difficult to construct because of their domain-oriented characteristics. We therefore modify the conventional MLLR procedure with selective label refinement, which categorizes newly accumulated adaptation data into discriminative and non-discriminative data and refines the labels of the discriminative data only. In SER experiments on an LDC emotion corpus, our approach exhibited superior performance compared to conventional adaptation techniques as well as the speaker-independent model framework.
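
A minimal sketch of the selective label refinement step: split newly accumulated utterances by classifier confidence margin and refine labels only for the confident ("discriminative") ones. The margin threshold and the scoring are illustrative assumptions.

```python
# Sketch: keep only high-margin utterances for adaptation and refine
# their labels from the current model's scores.
import numpy as np

def selective_refine(scores, labels, margin=2.0):
    """scores: (N, num_emotions) log-likelihoods per utterance.
    Returns refined labels and a mask of utterances retained for
    adaptation (discriminative data only)."""
    top2 = np.sort(scores, axis=1)[:, -2:]          # two best scores
    discriminative = (top2[:, 1] - top2[:, 0]) >= margin
    refined = labels.copy()
    refined[discriminative] = scores[discriminative].argmax(axis=1)
    return refined, discriminative

rng = np.random.default_rng(5)
scores = rng.normal(size=(10, 4)) * 3.0   # toy emotion scores
labels = rng.integers(0, 4, size=10)      # tentative labels
refined, use_mask = selective_refine(scores, labels)
print(refined, use_mask.sum(), "utterances kept for adaptation")
```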


IEEE Signal Processing Letters | 2002

Multicodebook split vector quantization of LSF parameters

Woo-Jin Han; Eun-Kyoung Kim; Yung-Hwan Oh

A multicodebook quantization method for improving the performance of split vector quantization (SVQ) is described. In the proposed method, multiple codebooks of different sizes are trained, and the minimal-size codebook, together with its codeword, satisfying a condition based on spectral distortion is determined. Several clustering techniques for reducing the total bit rate, along with an analysis of the computational complexity, are also presented. Experimental results show that the proposed method significantly reduces the number of outliers and achieves better rate-distortion performance than fixed-bit-rate SVQ, multistage VQ, and variable-bit-rate SVQ.
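
A minimal sketch of the selection rule, assuming plain Euclidean distance as a stand-in for the spectral-distortion condition: search the codebooks from smallest to largest and keep the first whose best codeword passes the threshold.

```python
# Sketch: pick the smallest codebook that meets a distortion threshold.
import numpy as np

def quantize(vec, codebooks, threshold):
    """codebooks: list of (size_i, D) arrays sorted by increasing
    size. Returns (codebook index, codeword index)."""
    best = None
    for cb_idx, cb in enumerate(codebooks):
        d = np.linalg.norm(cb - vec, axis=1)
        cw = int(d.argmin())
        if best is None or d[cw] < best[2]:
            best = (cb_idx, cw, d[cw])
        if d[cw] <= threshold:      # smallest codebook that passes
            return cb_idx, cw
    return best[0], best[1]         # fall back to the overall best

rng = np.random.default_rng(6)
codebooks = [rng.standard_normal((2 ** b, 3)) for b in (4, 6, 8)]
print(quantize(rng.standard_normal(3), codebooks, threshold=0.5))
```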


IEEE Signal Processing Letters | 2001

A new band-splitting method for two-band speech model

Eun-Kyoung Kim; Woo-Jin Han; Yung-Hwan Oh

A new band-splitting method for the two-band speech model is proposed, whereby a score function based on subband periodicity is calculated for each harmonic band and the band of maximum score is selected as the band-splitting frequency. Using the score function, a tracking technique is applied for continuity of the band-splitting frequencies between neighboring frames. The method has proven robust to inaccurate pitch estimation and to the frequency-resolution problem in low-pitched speech.
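
A minimal sketch of the band-splitting rule, assuming normalized autocorrelation at the pitch lag as a stand-in for the paper's periodicity score: score each harmonic band and select the highest-scoring one as the split. The tracking across frames is omitted.

```python
# Sketch: per-harmonic-band periodicity scoring for band splitting.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def periodicity(x, lag):
    """Normalized autocorrelation at a given lag."""
    a, b = x[:-lag], x[lag:]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def split_band(x, fs, f0, n_harmonics=8):
    lag = int(round(fs / f0))
    scores = []
    for k in range(1, n_harmonics + 1):
        lo, hi = (k - 0.5) * f0, (k + 0.5) * f0   # k-th harmonic band
        sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        scores.append(periodicity(sosfiltfilt(sos, x), lag))
    k_best = int(np.argmax(scores)) + 1
    return (k_best + 0.5) * f0                     # splitting frequency

# Toy frame: voiced low harmonics plus broadband noise
fs, f0 = 16000, 150.0
t = np.arange(int(0.05 * fs)) / fs
x = sum(np.sin(2 * np.pi * k * f0 * t) for k in range(1, 5))
x += 0.5 * np.random.default_rng(7).standard_normal(t.size)
print(split_band(x, fs, f0))
```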


International Conference on Spoken Language Processing | 1996

Prediction of prosodic phrase boundaries considering variable speaking rate

Yeon-Jun Kim; Yung-Hwan Oh

The paper proposes a model for predicting the prosodic phrase boundaries of speech with variable speaking rates. Speakers can produce a sentence in several ways without altering its meaning or naturalness; that is, a sequence of words can admit a number of prosodic phrase boundary placements. Many factors influence the variability of prosodic phrasing, such as syntactic structure, focus, speaker differences, speaking rate, and the need to breathe. We adopt a dependency grammar, similar to link grammar, to efficiently incorporate speaking rate. The proposed model reduced the prosodic phrase boundary prediction error by 20% compared to a model using only syntactic information. We also show a potential way to use a read-speech corpus in training prosodic phrasing for spontaneous speech. The proposed model is expected to make synthesized speech more natural and to improve the robustness of spontaneous speech recognition.
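
A heavily simplified sketch of juncture-level boundary prediction: a logistic regression over toy syntactic and speaking-rate features. The features, labels, and classifier are illustrative assumptions; the paper's model builds on dependency-grammar analyses that are not reproduced here.

```python
# Sketch: break/no-break decision at each word juncture.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(8)
# Toy features per juncture: [dependency-link distance, words since
# last boundary, speaking rate (syllables/sec)]
X = rng.uniform([1, 0, 3], [6, 12, 8], size=(500, 3))
# Toy labeling rule: long links plus slow speech favor a break
y = ((X[:, 0] > 3) & (X[:, 2] < 5.5)).astype(int)

clf = LogisticRegression().fit(X, y)
print(clf.predict([[5.0, 8.0, 4.0]]))   # likely boundary for slow speech
```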


International Conference on Spoken Language Processing | 1996

A text analyzer for Korean text-to-speech systems

Sangho Lee; Yung-Hwan Oh

In developing a text-to-speech (TTS) system, it is well known that the accuracy of the information extracted from a text is crucial to producing high-quality synthesized speech. By transferring probabilistic natural language processing techniques to the TTS field, we develop a more robust, highly accurate text analyzer for Korean TTS systems. The proposed system is composed of five modules: a preprocessor, a morphological analyzer, a part-of-speech tagger, a grapheme-to-phoneme module, and a parser. The part-of-speech tagger and the parser are designed under a probabilistic framework and trained automatically. Given a text, the system produces the structures of word phrases, word pronunciations, and the governor-dependent relationships that represent the structure of the sentence. Experimental results show that the tagger achieved 90.33% accuracy in finding the structure of word phrases at the word level, and the parser achieved 80.87% accuracy in finding the governor-dependent relationships of sentences.
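
A minimal structural sketch of the five-module pipeline; every function body below is a placeholder, since the real morphological analysis, probabilistic tagging and parsing, and Korean grapheme-to-phoneme rules are far more involved.

```python
# Sketch: the five-module text-analysis pipeline, with stub modules.
def preprocess(text):            # normalize numbers, symbols, etc. (stub)
    return text.strip()

def analyze_morphology(text):    # split word phrases into morphemes (stub)
    return [w.split("-") for w in text.split()]

def tag_pos(morphemes):          # probabilistic POS tagging (stub)
    return [[(m, "NOUN") for m in w] for w in morphemes]

def to_phonemes(tagged):         # grapheme-to-phoneme conversion (stub)
    return [" ".join(m for m, _ in w) for w in tagged]

def parse(tagged):               # governor-dependent relations (stub)
    return [(i, i + 1) for i in range(len(tagged) - 1)]

def analyze(text):
    tagged = tag_pos(analyze_morphology(preprocess(text)))
    return {"pronunciation": to_phonemes(tagged),
            "dependencies": parse(tagged)}

print(analyze("an-nyeong ha-se-yo"))
```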
