Yeon-Jun Kim
KAIST
Publication
Featured research published by Yeon-Jun Kim.
international conference on spoken language processing | 1996
Yeon-Jun Kim; Yung-Hwan Oh
The paper proposes a model for predicting the prosodic phrase boundaries of speech with variable speaking rates. Speakers can produce a sentence in several ways without altering its meaning or naturalness, i.e., a sequence of words can have a number of prosodic phrase boundaries. There are many factors which influence the variability of prosodic phrasing, such as syntactic structure, focus, speaker differences, speaking rate and the need to breathe. We adopt dependency grammar, similar to link grammar, to efficiently combine speaking rates. The proposed model reduced prosodic phrase boundary prediction error by 20% compared to the model using only syntactic information. We show a potential way to make use of a read speech corpus in the training of prosodic phrasing for spontaneous speech. The proposed model is expected to make synthesized speech more natural and improve the robustness of spontaneous speech recognition.
international conference on acoustics, speech, and signal processing | 2015
Taniya Mishra; Yeon-Jun Kim; Srinivas Bangalore
Intonational phrase (IP) break prediction is an important aspect of front-end analysis in a text-to-speech system. Standard approaches for intonational phrase break prediction rely on linguistic rules or, more recently, lexicalized data-driven models. Linguistic rules are not robust, while data-driven models based on lexical identity do not generalize across domains. To overcome these challenges, we explore the use of syntactic features to predict intonational phrase breaks. On a test set of over 40 thousand words, a lexically driven IP break prediction model yields an F-score of 0.82, while a non-lexicalized model that uses part-of-speech tags and dependency relations achieves an F-score of 0.81 with the added benefit of being more portable across domains. We also examine the effect of contextual information on prediction performance. Our evaluation shows that using a three-token left context in a POS-tag-based model results in only a 2% drop in recall compared to a model that uses both left and right context, which suggests the viability of such a model for an incremental text-to-speech system.
international conference on acoustics, speech, and signal processing | 1997
Heo-Jin Byeon; Yeon-Jun Kim; Yung-Hwan Oh
This paper introduces an F0 contour generation method for text-to-speech synthesis using stochastic mapping and vector quantization control parameters. The model uses a new F0 contour labelling scheme based on the RFC (rise/fall/connection) model, which describes F0 contour patterns with seven F0 labels and three pause labels. The paper also suggests an efficient selection method for control parameters instead of using their mean values. We achieved 78.06% accuracy in F0 label prediction and 95.87% accuracy in pause label prediction using this model. The experimental results show that speech synthesized using vector quantization control parameters is more natural than speech synthesized using the mean values of the feature parameters.
conference of the international speech communication association | 1999
Yeon-Jun Kim; Heo-Jin Byeon; Yung-Hwan Oh
SSW | 2010
Ann K. Syrdal; Alistair Conkie; Yeon-Jun Kim; Mark C. Beutnagel
conference of the international speech communication association | 2010
Yeon-Jun Kim; Mark C. Beutnagel
conference of the international speech communication association | 2011
Yeon-Jun Kim; Thomas Okken; Alistair Conkie; Giuseppe Di Fabbrizio
conference of the international speech communication association | 2018
Andreas Søeborg Kirkedal; Yeon-Jun Kim
Archive | 2003
Alistair Conkie; Yeon-Jun Kim
Archive | 2003
Alistair Conkie; Yeon-Jun Kim