Publication


Featured research published by Yoo Rhee Oh.


Speech Communication | 2007

Acoustic model adaptation based on pronunciation variability analysis for non-native speech recognition

Yoo Rhee Oh; Jae Sam Yoon; Hong Kook Kim

In this paper, we investigate the pronunciation variability between native and non-native speakers and propose an acoustic model adaptation method based on the variability analysis in order to improve the performance of a non-native speech recognition system. The proposed acoustic model adaptation is performed in two steps. First, we construct baseline acoustic models from native speech and perform phone recognition using the baseline acoustic models to identify the most informative variant phonetic units from native to non-native speech. Next, the acoustic model corresponding to each informative variant phonetic unit is adapted so that the state tying of the acoustic model for non-native speech reflects such phonetic variability. For further improvement, traditional acoustic model adaptation such as MLLR or MAP can be applied to the system adapted with the proposed method. In this work, we select English as the target language, and the non-native speakers are all Korean. It is shown from continuous Korean-English speech recognition experiments that the proposed method achieves an average word error rate reduction of 12.75% when compared with a speech recognition system using baseline acoustic models trained on native speech. Moreover, a reduction of 57.12% in the average word error rate is obtained by applying MLLR or MAP adaptation to the acoustic models adapted by the proposed method.
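
The first step of the variability analysis amounts to aligning reference phone sequences with phone-recognition output and ranking the substitution patterns. Below is a minimal Python sketch of that counting step; the pair-list input format and the use of difflib alignment are illustrative assumptions, not the paper's HMM-based setup.

# Sketch: ranking native-to-non-native phone variants by frequency.
# The alignment here uses difflib as a stand-in for a proper
# dynamic-programming phone alignment.
from collections import Counter
from difflib import SequenceMatcher

def variant_counts(pairs):
    """pairs: list of (reference_phones, recognized_phones) lists."""
    counts = Counter()
    for ref, hyp in pairs:
        matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "replace" and (i2 - i1) == (j2 - j1):
                for r, h in zip(ref[i1:i2], hyp[j1:j2]):
                    counts[(r, h)] += 1   # phone r realized as h
    return counts

# Hypothetical example: /r/ realized as /l/ by Korean speakers.
pairs = [(["r", "ay", "s"], ["l", "ay", "s"]),
         (["r", "ay", "t"], ["l", "ay", "t"])]
for (ref, hyp), n in variant_counts(pairs).most_common():
    print(ref, "->", hyp, ":", n)

The top-ranked variant units would then drive the state-tying adaptation described in the abstract.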


IEEE Transactions on Consumer Electronics | 2008

A name recognition based call-and-come service for home robots

Yoo Rhee Oh; Jae Sam Yoon; Ji Hun Park; Mina Kim; Hong Kook Kim

In this paper, we propose and implement an efficient robot name registration and recognition system in order to provide a call-and-come service for home robots. The service is designed to enable a robot to come to a user when its name is called correctly by the user. Therefore, techniques such as voice detection, robust speech recognition, and detection of the location of the user are required. For efficient robot name registration by voice, the proposed method first restricts the search space for name registration by using monophone-based acoustic models. Then, the registration of robot names is completed using triphone-based acoustic models in the restricted search space. In order to provide a reliable service, the parameters for robot name verification are calculated to reduce the acceptance rate of false calls. In addition, acoustic models are adapted by using a distance speech database to improve the performance of distance speech recognition. Moreover, the location of the user is estimated by using a microphone array. The experimental results for the registration and recognition of robot names show that the average word error rate (WER) of speech recognition is 1.7% in home environments, which is an acceptable WER for a real call-and-come service.
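
The two-step registration scheme is essentially a coarse-to-fine search. The Python sketch below assumes hypothetical monophone_score and triphone_score likelihood functions and a fixed candidate list; the real system restricts a phone-level search space rather than rescoring a list.

# Sketch: a fast monophone pass prunes the name space, then a
# triphone pass picks the final entry. The scoring functions are
# hypothetical stand-ins for acoustic-model likelihood computation.
def register_name(utterance, candidates, monophone_score,
                  triphone_score, beam=10):
    shortlist = sorted(candidates,
                       key=lambda name: monophone_score(utterance, name),
                       reverse=True)[:beam]           # pass 1: cheap pruning
    return max(shortlist,
               key=lambda name: triphone_score(utterance, name))  # pass 2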


IEEE Automatic Speech Recognition and Understanding Workshop | 2007

Non-native pronunciation variation modeling using an indirect data driven method

Mina Kim; Yoo Rhee Oh; Hong Kook Kim

In this paper, we propose a pronunciation variation modeling method for improving the performance of a non-native automatic speech recognition (ASR) system that does not degrade the performance of a native ASR system. The proposed method is based on an indirect data-driven approach, where pronunciation variability is investigated from the training speech data, and variant rules are subsequently derived and applied to compensate for variability in the ASR pronunciation dictionary. To this end, native utterances are first recognized by using a phoneme recognizer, and the variant phoneme patterns of native speech are then obtained by aligning the recognized and reference phonetic sequences. The reference sequences are transcribed using each of the canonical, knowledge-based, and hand-labeled methods. Similar to native speech, the variant phoneme patterns of non-native speech can be obtained by recognizing non-native utterances and comparing the recognized phoneme sequences with the reference phonetic transcriptions. Finally, variant rules are derived from the native and non-native variant phoneme patterns using decision trees and applied to the adaptation of a dictionary for non-native and native ASR systems. In this paper, Korean spoken by native Chinese speakers is considered as the non-native speech. It is shown from non-native ASR experiments that an ASR system using a dictionary constructed by the proposed pronunciation variation modeling method achieves a relative reduction in the average word error rate (WER) of 18.5% when compared to the baseline ASR system using a canonically transcribed dictionary. In addition, the WER of a native ASR system using the proposed dictionary is also relatively reduced by 1.1%, compared to the baseline native ASR system with a canonically constructed dictionary.
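
Once the variant rules have been derived, applying them amounts to expanding the pronunciation dictionary with rule-matched alternatives. A minimal Python sketch follows; the (phone, left-context, right-context) rule format is an assumption standing in for the paper's decision-tree rules.

# Sketch: expanding an ASR pronunciation dictionary with variant
# rules. "#" marks a word boundary.
def expand_dictionary(lexicon, rules, max_variants=3):
    """lexicon: word -> canonical phone list.
    rules: (phone, left, right) -> alternative phone."""
    expanded = {}
    for word, phones in lexicon.items():
        variants = [phones]
        padded = ["#"] + phones + ["#"]
        for i, p in enumerate(phones):
            key = (p, padded[i], padded[i + 2])
            if key in rules:
                variants.append(phones[:i] + [rules[key]] + phones[i + 1:])
        expanded[word] = variants[:max_variants]
    return expanded

# Hypothetical rule: word-initial /r/ is realized as /l/.
lexicon = {"rice": ["r", "ay", "s"]}
rules = {("r", "#", "ay"): "l"}
print(expand_dictionary(lexicon, rules))
# {'rice': [['r', 'ay', 's'], ['l', 'ay', 's']]}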


IEEE Automatic Speech Recognition and Understanding Workshop | 2009

MLLR/MAP adaptation using pronunciation variation for non-native speech recognition

Yoo Rhee Oh; Hong Kook Kim

In this paper, we propose an acoustic model adaptation method based on maximum likelihood linear regression (MLLR) and maximum a posteriori (MAP) adaptation using pronunciation variations for non-native speech recognition. To this end, we first obtain pronunciation variations using an indirect data-driven approach. Next, we generate two sets of regression classes: one composed of regression classes for all pronunciations, and the other of classes for pronunciation variations. The former are referred to as overall regression classes and the latter as pronunciation variation regression classes. We then sequentially apply the two adaptations to non-native speech using the overall regression classes, while the acoustic models associated with the pronunciation variations are adapted using the pronunciation variation regression classes. In the final step, both sets of adapted acoustic models are merged. Thus, the resultant acoustic models can cover the characteristics of non-native speakers as well as the pronunciation variations of non-native speech. It is shown from non-native automatic speech recognition (ASR) experiments on Korean-spoken English continuous speech that an ASR system employing the proposed adaptation method achieves a relative reduction in the average word error rate of 9.43% when compared to a traditional MLLR/MAP adaptation method.
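
The MAP half of the adaptation relies on the standard prior-weighted mean update. A NumPy sketch of that update for a single Gaussian is shown below; it illustrates only the textbook formula, not the paper's two regression-class sets or the final model-merging step.

# Sketch of the MAP mean update: the adapted mean interpolates the
# prior mean and the data mean, weighted by the prior weight tau and
# the soft occupation count of the Gaussian.
import numpy as np

def map_adapt_mean(prior_mean, frames, gammas, tau=10.0):
    """prior_mean: (d,), frames: (T, d), gammas: (T,) frame posteriors."""
    occupation = gammas.sum()
    weighted_sum = gammas @ frames        # sum_t gamma_t * x_t
    return (tau * prior_mean + weighted_sum) / (tau + occupation)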


International Conference on Acoustics, Speech, and Signal Processing | 2008

Acoustic and pronunciation model adaptation for context-independent and context-dependent pronunciation variability of non-native speech

Yoo Rhee Oh; Mina Kim; Hong Kook Kim

In this paper, we propose an acoustic and pronunciation model adaptation method for the context-independent (CI) and context-dependent (CD) pronunciation variability of non-native speech, to improve the performance of a non-native automatic speech recognition (ASR) system. The proposed adaptation method is performed in three steps. First, we perform phone recognition to obtain an n-best list of phoneme sequences and derive pronunciation variant rules by using a decision tree. Second, the pronunciation variant rules are decomposed into CI and CD pronunciation variations on the basis of context dependency. That is, pronunciation variant rules that are dedicated to specific phoneme sequences are classified as CI pronunciation variations, while the others are classified as CD variations. It is assumed here that CI and CD pronunciation variability are caused, respectively, by the pronunciation space of a non-native speaker's mother tongue differing from that of the target language, and by coarticulation effects within a context. Third, acoustic model adaptation is performed in a state-tying step for the CI pronunciation variability, using an indirect data-driven method. In addition, pronunciation model adaptation is completed by constructing a multiple-pronunciation dictionary from the CD pronunciation variability. It is shown from continuous Korean-English ASR experiments that the proposed method reduces the average word error rate (WER) by 16.02% when compared with a baseline ASR system trained on native speech. Moreover, an ASR system using the proposed method provides average WER reductions of 8.95% and 3.67% when compared to acoustic model adaptation alone and pronunciation model adaptation alone, respectively.
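
The CI/CD decomposition can be pictured as checking how a substitution pattern relates to its surrounding contexts. The sketch below uses a simple spread-across-contexts heuristic as an illustrative assumption; the paper's actual criterion comes from the derived decision-tree rules.

# Sketch: splitting observed variants into CI and CD sets. A
# substitution that fires across many distinct contexts is treated
# as context-independent here (an assumed heuristic).
from collections import defaultdict

def split_rules(observations, min_contexts=3):
    """observations: list of (phone, left, right, realized_phone)."""
    contexts = defaultdict(set)
    for phone, left, right, realized in observations:
        contexts[(phone, realized)].add((left, right))
    ci, cd = set(), defaultdict(set)
    for (phone, realized), ctxs in contexts.items():
        if len(ctxs) >= min_contexts:
            ci.add((phone, realized))          # -> acoustic model state tying
        else:
            cd[(phone, realized)] |= ctxs      # -> multiple-pronunciation dict
    return ci, cd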


International Conference on Acoustics, Speech, and Signal Processing | 2006

Acoustic Model Adaptation Based on Pronunciation Variability Analysis for Non-Native Speech Recognition

Yoo Rhee Oh; Jae Sam Yoon; Hong Kook Kim

In this paper, we investigate the pronunciation variability between native and non-native speakers and propose an acoustic model adaptation method based on the variability analysis in order to improve the performance of a non-native speech recognition system. The proposed acoustic model adaptation is performed in two steps. First, we construct baseline acoustic models from native speech and perform phone recognition using the baseline acoustic models to identify the most informative variant phonetic units from native to non-native speech. Next, the acoustic model corresponding to each informative variant phonetic unit is adapted so that the state tying of the acoustic model for non-native speech reflects such phonetic variability. For further improvement, traditional acoustic model adaptation such as MLLR or MAP can be applied to the system adapted with the proposed method. In this work, we select English as the target language, and the non-native speakers are all Korean. It is shown from continuous Korean-English speech recognition experiments that the proposed method achieves an average word error rate reduction of 12.75% when compared with a speech recognition system using baseline acoustic models trained on native speech. Moreover, a reduction of 57.12% in the average word error rate is obtained by applying MLLR or MAP adaptation to the acoustic models adapted by the proposed method.


IEEE Transactions on Consumer Electronics | 2009

A voice-driven scene-mode recommendation service for portable digital imaging devices

Yoo Rhee Oh; Jae Sam Yoon; Hong Kook Kim; Myung Bo Kim; Sang Ryong Kim

In this paper, we propose a voice-driven scene-mode recommendation service in order to more easily select scene modes on portable digital imaging devices such as digital cameras and camcorders. In other words, the proposed service is designed to recommend or automatically change the scene mode by recognizing a user's voice command regarding a scene or scene-related words. To realize such a service, we implement a system that is mainly composed of voice activity detection, automatic speech recognition (ASR), utterance verification, and word-to-scene-mode mapping. However, several optimization methods should be applied, since portable digital imaging devices operate on embedded systems with limited resources. In addition, a speech adaptation database for acoustic models is developed so that the ASR system can adjust to the characteristics of the microphones and operating environments. Finally, the performance of the voice-driven scene-mode recommendation system is measured in terms of processing time and scene-mode recognition accuracy (SMRA). It is shown from the experiments that the average processing time and the average SMRA are around 500 ms and 98.0% for 50 scene-related words, respectively, and 1200 ms and 96.8% for 200 scene-related words.
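
The last stage of the pipeline, word-to-scene-mode mapping with utterance verification, is essentially a confidence-gated table lookup, as the short Python sketch below illustrates. The word list, mode names, and threshold are illustrative assumptions.

# Sketch: map a verified voice command to a scene mode; reject
# low-confidence recognitions so the current mode is kept.
WORD_TO_MODE = {"beach": "landscape", "sunset": "sunset",
                "party": "night", "baby": "portrait"}

def recommend_mode(word, confidence, threshold=0.7):
    if confidence < threshold:      # utterance verification gate
        return None                 # keep the current scene mode
    return WORD_TO_MODE.get(word)

print(recommend_mode("sunset", 0.92))   # 'sunset'
print(recommend_mode("sunset", 0.40))   # None (rejected)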


International Conference on Acoustics, Speech, and Signal Processing | 2010

On the use of feature-space MLLR adaptation for non-native speech recognition

Yoo Rhee Oh; Hong Kook Kim

In this paper, we address issues associated with a feature-space maximum likelihood linear regression (fMLLR) adaptation method applied to non-native speech recognition. In particular, fMLLR smoothing is proposed here to compensate for mismatches between adaptation and test data caused by the various disfluencies of non-native speakers. The proposed fMLLR smoothing is performed with a Viterbi decoding procedure and implemented at two levels: a Gaussian mixture probability density function (mpdf) level and an observation probability density function (opdf) level. The mpdf-level smoothing is performed by comparing the pdf of each Gaussian mixture component of an original speech feature vector with that of the vector transformed by the fMLLR. On the other hand, the opdf-level smoothing compares the Gaussian mixture probabilities between the original feature vector and its fMLLR-transformed counterpart. It is shown from non-native automatic speech recognition experiments on a Korean-spoken English continuous speech corpus that an ASR system employing the proposed mpdf-level and opdf-level fMLLR smoothing methods achieves relative reductions in the average word error rate of 30.65% and 29.82%, respectively, when compared to a traditional fMLLR adaptation method.
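
The mpdf-level idea can be illustrated by scoring a frame against a Gaussian both before and after the fMLLR transform and blending the two log-likelihoods. The sketch below is a simplified reading of that smoothing, with an assumed interpolation weight alpha; a proper fMLLR likelihood would also include the log|det A| Jacobian term, omitted here for brevity.

# Sketch: blend the Gaussian log-likelihoods of the original and the
# fMLLR-transformed feature; alpha = 1 recovers plain fMLLR.
import numpy as np
from scipy.stats import multivariate_normal

def smoothed_loglik(x, A, b, mean, cov, alpha=0.8):
    x_adapted = A @ x + b                 # fMLLR: x' = A x + b
    ll_orig = multivariate_normal.logpdf(x, mean, cov)
    ll_adapted = multivariate_normal.logpdf(x_adapted, mean, cov)
    return alpha * ll_adapted + (1.0 - alpha) * ll_orig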


Archive | 2010

Non-native Pronunciation Variation Modeling for Automatic Speech Recognition

Mina Kim; Yoo Rhee Oh; Hong Kook Kim

Communication using speech is inherently natural, with this ability acquired unconsciously, in a step-by-step manner, throughout life. In order to bring the benefits of speech communication to devices, a great deal of research has been performed over the past several decades. As a result, automatic speech recognition (ASR) systems have been deployed in a range of applications, including automatic reservation systems, dictation systems, navigation systems, etc. Due to increasing globalization, the need for effective interlingual communication has also been growing. However, because most people tend to speak foreign languages with variant or non-fluent pronunciations, there is an increasing demand for the development of non-native ASR systems (Goronzy et al., 2001). In other words, a conventional ASR system is optimized with native speech; however, non-native speech has different characteristics from native speech. That is, non-native speech tends to reflect the pronunciations or syntactic characteristics of the mother tongue of the non-native speakers, as well as the wide range of fluencies among non-native speakers. Therefore, the performance of an ASR system evaluated on non-native speech tends to degrade severely when compared to that on native speech, due to the mismatch between the native training data and the non-native test data (Compernolle, 2001). A simple way to improve the performance of an ASR system for non-native speech would be to train the ASR system using a non-native speech database, though in reality the number of non-native speech samples available for this task is not currently sufficient to train an ASR system. Thus, techniques for improving non-native ASR performance using only a small amount of non-native speech are required.

There have been three major approaches to handling non-native speech for ASR: acoustic modeling, language modeling, and pronunciation modeling. First, acoustic modeling approaches find pronunciation differences and transform and/or adapt acoustic models to include the effects of non-native speech (Gruhn et al., 2004; Morgan, 2004; Steidl et al., 2004). Second, language modeling approaches deal with the grammatical effects or speaking style of non-native speech (Bellegarda, 2001). Third, pronunciation modeling approaches derive pronunciation variant rules from non-native speech and apply the derived rules to pronunciation models for non-native speech (Amdal et al., 2000; Fosler-Lussier, 1999; Goronzy et al., 2004; Gruhn et al., 2004; Raux, 2004; Strik et al., 1999).

Source: Advances in Speech Recognition, edited by Noam R. Shabtai, ISBN 978-953-307-097-1, pp. 164, September 2010, Sciyo, Croatia.


International Symposium on Intelligent Signal Processing and Communication Systems | 2004

Design of a speech coder utilizing speech recognition parameters for server-based wireless speech recognition

Jae Sam Yoon; Yoo Rhee Oh; Hong Kook Kim

Existing standard speech coders provide good speech communication quality; however, they are known to degrade the performance of automatic speech recognition (ASR) systems that are deployed for wireless communications in a server-based approach. This paper proposes a speech coder that utilizes speech recognition parameters for wireless speech recognition. To maintain ASR performance comparable to that of conventional ASR systems implemented in a client-based approach, the proposed speech coder first extracts Mel-frequency cepstral coefficients (MFCCs), which are typical recognition parameters, and then converts them into linear prediction coefficients for CELP-type speech coding. By transmitting the MFCCs directly to the decoder, an ASR system employing the proposed speech coder can provide even better performance than one using standard CELP-type speech coders.
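
The core of the coder is the MFCC-to-LPC conversion. A rough NumPy/SciPy sketch of one well-known route (inverse cepstral DCT, exponentiation, autocorrelation via inverse FFT, Levinson-Durbin recursion) is given below; it skips the mel-to-linear frequency unwarping, so it only approximates the conversion the paper performs.

# Sketch: convert one MFCC vector to LPC coefficients.
import numpy as np
from scipy.fft import idct

def mfcc_to_lpc(mfcc, n_bands=26, order=10):
    log_spec = idct(mfcc, n=n_bands, norm="ortho")   # undo cepstral DCT
    power = np.exp(log_spec)                         # undo the log
    # Inverse FFT of the power spectrum estimates the autocorrelation.
    autocorr = np.fft.irfft(power)[:order + 1]
    # Levinson-Durbin recursion for the prediction polynomial A(z).
    a = np.zeros(order + 1)
    a[0], err = 1.0, autocorr[0]
    for i in range(1, order + 1):
        k = -(autocorr[i] + a[1:i] @ autocorr[i - 1:0:-1]) / err
        prev = a[1:i].copy()
        a[1:i] = prev + k * prev[::-1]
        a[i] = k
        err *= 1.0 - k * k
    return a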

Collaboration


Dive into Yoo Rhee Oh's collaboration.

Top Co-Authors

Hong Kook Kim (Gwangju Institute of Science and Technology)
Jae Sam Yoon (Gwangju Institute of Science and Technology)
Mina Kim (Gwangju Institute of Science and Technology)
Choong Sang Cho (Gwangju Institute of Science and Technology)
Hyun Joo Bae (Electronics and Telecommunications Research Institute)
Ji Hun Park (Gwangju Institute of Science and Technology)
Mi Suk Lee (Electronics and Telecommunications Research Institute)
Min Ji Lee (Gwangju Institute of Science and Technology)