Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Yongwon Jeong is active.

Publication


Featured research published by Yongwon Jeong.


IEEE Signal Processing Letters | 2010

New Speaker Adaptation Method Using 2-D PCA

Yongwon Jeong; Hyung Soon Kim

This letter describes a speaker adaptation method based on the two-dimensional PCA (2DPCA) of training models. In this method, the state and the dimension of the mean vectors are treated separately, and the covariance matrix is computed dimension-wise. As a result, the speaker weight can contain a different weighting for each dimension of the mean vectors. In isolated-word recognition experiments, the proposed method performed better than both eigenvoice and MLLR for adaptation data longer than about 15 seconds, owing to its more elaborate modeling. The method can also be applied to other PCA-based modeling methods in which each training model can be represented as a matrix.
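
The following sketch (not the letter's implementation) illustrates the core idea under simplifying assumptions: each training speaker's HMM mean vectors are stacked into a states-by-dimensions matrix, a dimension-wise covariance is eigen-decomposed to obtain bases, and the speaker weight becomes a matrix with one row of weights per state. In practice the weight would be estimated from adaptation data in an ML framework; here it is a simple projection of a fully observed target model, and all shapes and names are illustrative.

```python
import numpy as np

def train_2dpca_bases(models, k):
    """models: (n_speakers, n_states, n_dims); each slice is one training
    speaker's HMM mean vectors stacked row-wise. Returns the mean model and
    the top-k eigenvectors of the dimension-wise covariance."""
    mean_model = models.mean(axis=0)                   # (n_states, n_dims)
    centered = models - mean_model
    # Dimension-wise sample covariance: sum over speakers and states of outer
    # products along the feature dimension, size (n_dims, n_dims).
    cov = np.einsum('nsd,nse->de', centered, centered) / len(models)
    _, eigvecs = np.linalg.eigh(cov)                   # ascending eigenvalues
    bases = eigvecs[:, ::-1][:, :k]                    # leading (n_dims, k) bases
    return mean_model, bases

def adapt_2dpca(mean_model, bases, target_model):
    """Project a (here fully observed) target model onto the bases. The weight
    is a matrix: one row of k weights per state, so each feature dimension is
    weighted differently through the basis directions."""
    weight = (target_model - mean_model) @ bases       # (n_states, k)
    adapted = mean_model + weight @ bases.T            # low-rank reconstruction
    return weight, adapted

# Toy usage with random numbers standing in for trained HMM means.
rng = np.random.default_rng(0)
models = rng.normal(size=(20, 50, 39))   # 20 speakers, 50 states, 39-dim features
mean_model, bases = train_2dpca_bases(models, k=5)
weight, adapted = adapt_2dpca(mean_model, bases, rng.normal(size=(50, 39)))
```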


International Conference on Acoustics, Speech, and Signal Processing | 2010

Speaker adaptation based on the multilinear decomposition of training speaker models

Yongwon Jeong

This paper presents a novel speaker adaptation method based on the multilinear analysis of training speakers using Tucker decomposition. A Tucker decomposition of the training models decouples the dataset into subspaces of state, mean-vector dimension, and speaker. Using the bases of the state subspace, we derive a speaker adaptation formula in which the matrix of basis vectors is weighted in both the row and column spaces; the proposed method includes the eigenvoice technique as a special case. The results from the isolated-word recognition task showed that the Tucker decomposition-based method outperformed both eigenvoice and MLLR for adaptation data of 15 seconds or longer. Furthermore, the method can easily be extended to multi-factor problems, enabling the adaptation of multiple factors such as speaker and noise environment.
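
A minimal sketch of a Tucker decomposition computed via truncated higher-order SVD with NumPy, assuming the training models are arranged as a state x dimension x speaker tensor; the ranks, shapes, and variable names are illustrative, and the estimation of adaptation weights from data is omitted.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: move `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def hosvd(tensor, ranks):
    """Truncated higher-order SVD (one way to compute a Tucker decomposition).
    Returns the core tensor and one factor matrix per mode."""
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, u in enumerate(factors):
        # Mode-m product with u.T: project the tensor onto each mode subspace.
        core = np.moveaxis(np.tensordot(core, u, axes=(mode, 0)), -1, mode)
    return core, factors

# Training speaker models arranged as a 3-mode tensor: state x feature-dim x speaker.
rng = np.random.default_rng(0)
models = rng.normal(size=(50, 39, 20))
core, (U_state, U_dim, U_speaker) = hosvd(models, ranks=(30, 20, 10))
# U_state spans the state subspace used to build the adaptation bases; an
# adapted model weights these bases in both the row and column spaces.
```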


Speech Communication | 2014

Joint speaker and environment adaptation using TensorVoice for robust speech recognition

Yongwon Jeong

We present an adaptation of a hidden Markov model (HMM)-based automatic speech recognition system to the target speaker and noise environment. Given HMMs built from various speakers and noise conditions, we build tensorvoices that capture the interaction between the speaker and noise by using a tensor decomposition. We express the updated model for the target speaker and noise environment as a product of the tensorvoices and two weight vectors, one each for the speaker and noise. An iterative algorithm is presented to determine the weight vectors in the maximum likelihood (ML) framework. With the use of separate weight vectors, the tensorvoice approach can adapt to the target speaker and noise environment differentially, whereas the eigenvoice approach, which is based on a matrix decomposition technique, cannot differentially adapt to those two factors. In supervised adaptation tests using the AURORA4 corpus, the relative improvement of performance obtained by the tensorvoice method over the eigenvoice method is approximately 10% on average for adaptation data of 6-24s in length, and the relative improvement of performance obtained by the tensorvoice method over the maximum likelihood linear regression (MLLR) method is approximately 5.4% on average for adaptation data of 6-18s in length. Therefore, the tensorvoice approach is an efficient method for speaker and noise adaptation.
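
As a rough illustration of the weight estimation described above (not the paper's ML equations), the sketch below alternately solves least-squares problems for the speaker and noise weight vectors against an observed target supervector, assuming the tensorvoices have already been built; `T`, the shapes, and the plain least-squares criterion are all simplifying assumptions.

```python
import numpy as np

def adapt_tensorvoice(T, target, n_iters=20):
    """T: tensorvoices of shape (model_dim, k_speaker, k_noise);
    target: observed supervector of length model_dim.
    Alternately solve for the speaker weight a and the noise weight b so that
    the mode products of T with a and b approximate the target. The paper
    estimates the weights by maximum likelihood from adaptation statistics;
    plain least squares is used here only for illustration."""
    d, ks, kn = T.shape
    a = np.ones(ks) / ks
    b = np.ones(kn) / kn
    for _ in range(n_iters):
        # Fix b: target ~ (T contracted with b) @ a, a linear model in a.
        A = np.tensordot(T, b, axes=(2, 0))      # (d, ks)
        a, *_ = np.linalg.lstsq(A, target, rcond=None)
        # Fix a: target ~ (T contracted with a) @ b, a linear model in b.
        B = np.tensordot(T, a, axes=(1, 0))      # (d, kn)
        b, *_ = np.linalg.lstsq(B, target, rcond=None)
    adapted = np.tensordot(np.tensordot(T, a, axes=(1, 0)), b, axes=(1, 0))
    return a, b, adapted

rng = np.random.default_rng(0)
T = rng.normal(size=(1950, 8, 4))     # e.g. 50 states x 39 dims, flattened
a, b, adapted = adapt_tensorvoice(T, rng.normal(size=1950))
```

Because the speaker and noise factors get separate weight vectors, the same tensorvoices can be reused to adapt to any speaker/noise combination, which is the property the abstract contrasts with the single-weight eigenvoice approach.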


International Conference on Acoustics, Speech, and Signal Processing | 2009

A new method for speaker adaptation using bilinear model

Hwa Jeon Song; Yongwon Jeong; Hyung Soon Kim

In this paper, a novel method for speaker adaptation using a bilinear model is proposed. The bilinear model can express the characteristics of speakers (style) and of phonemes across speakers (content) independently in a training database. The mapping from each speaker and phoneme space to the observation space is carried out using a bilinear mapping matrix that is independent of the speaker and phoneme spaces. We apply the bilinear model to speaker adaptation. Using adaptation data from a new speaker, a speaker-adapted model is built by estimating the style (speaker)-specific matrix. Experimental results showed that the proposed method outperformed eigenvoice and MLLR. In vocabulary-independent isolated-word recognition for speaker adaptation, the bilinear model reduced the word error rate by about 38% and about 10% compared with eigenvoice and MLLR, respectively, using 50 words for adaptation.
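
The sketch below shows one common way to realize such a style/content bilinear model (an asymmetric, Tenenbaum-Freeman-style factorization via SVD), with the new speaker's style-specific matrix estimated by least squares; it is an illustration under assumed shapes and names, not the paper's exact training or adaptation procedure.

```python
import numpy as np

def train_bilinear(Y, style_dim):
    """Y: training observations arranged as (n_speakers * obs_dim, n_phonemes),
    rows stacked speaker by speaker. Asymmetric bilinear model: Y ~ A_stacked @ C,
    where each speaker s has a style-specific matrix A_s (obs_dim x style_dim)
    and C holds phoneme (content) vectors (style_dim x n_phonemes)."""
    u, s, vt = np.linalg.svd(Y, full_matrices=False)
    A_stacked = u[:, :style_dim] * s[:style_dim]   # stacked style matrices
    C = vt[:style_dim, :]                          # content vectors shared by all speakers
    return A_stacked, C

def estimate_style(C, Y_new):
    """Estimate a new speaker's style-specific matrix from adaptation data.
    Y_new: (obs_dim, n_phonemes) observations of the new speaker (full
    observations assumed here; missing phonemes would need masking).
    Solves A_new @ C ~ Y_new in the least-squares sense."""
    A_new_T, *_ = np.linalg.lstsq(C.T, Y_new.T, rcond=None)
    return A_new_T.T                               # (obs_dim, style_dim)

rng = np.random.default_rng(0)
n_speakers, obs_dim, n_phonemes = 20, 39, 45
Y = rng.normal(size=(n_speakers * obs_dim, n_phonemes))
A_stacked, C = train_bilinear(Y, style_dim=10)
A_new = estimate_style(C, rng.normal(size=(obs_dim, n_phonemes)))
```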


IEEE Transactions on Audio, Speech, and Language Processing | 2012

Adaptation of Hidden Markov Models Using Model-as-Matrix Representation

Yongwon Jeong

In this paper, we describe basis-based speaker adaptation techniques using the matrix representation of training models. Bases are obtained from the training models by decomposition techniques for matrix-variate objects: two-dimensional principal component analysis (2DPCA) and generalized low rank approximations of matrices (GLRAM). The motivation for using the matrix representation is that the sample covariance matrix of the training models can be computed more accurately and the speaker weight becomes a matrix. Speaker adaptation equations are derived in the maximum-likelihood (ML) framework and can be solved using the maximum-likelihood linear regression technique. Additionally, novel applications of probabilistic 2DPCA and GLRAM to speaker adaptation are presented. From the probabilistic 2DPCA/GLRAM of the training models, speaker adaptation equations are formulated in the maximum a posteriori (MAP) framework and can be solved using the MAP linear regression technique. In the isolated-word experiments, the matrix representation-based methods in the ML and MAP frameworks outperformed maximum-likelihood linear regression adaptation, MAP adaptation, eigenvoice, and the probabilistic PCA-based model for adaptation data longer than 20 s. Furthermore, the adaptation methods using probabilistic 2DPCA/GLRAM showed additional performance improvement over those using 2DPCA/GLRAM for small amounts of adaptation data.
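
As a hedged illustration of the GLRAM component (the ML/MAP adaptation equations themselves are not reproduced here), the sketch below alternates eigen-decompositions to find left and right projections of the matrix-variate training models; the shapes, ranks, and omission of mean-centering are simplifying assumptions.

```python
import numpy as np

def glram(models, l_rank, r_rank, n_iters=10):
    """Generalized low rank approximations of matrices (GLRAM).
    models: (n_speakers, n_states, n_dims), each slice one training model
    (mean-centering is omitted here for brevity). Finds left/right projections
    L (n_states x l_rank) and R (n_dims x r_rank) so that each model M is
    approximated by L @ (L.T @ M @ R) @ R.T. A minimal alternating sketch."""
    n, rows, cols = models.shape
    R = np.eye(cols, r_rank)
    for _ in range(n_iters):
        # Fix R, update L from the top eigenvectors of sum_i M_i R R^T M_i^T.
        S_L = sum(M @ R @ R.T @ M.T for M in models)
        _, vecs = np.linalg.eigh(S_L)
        L = vecs[:, ::-1][:, :l_rank]
        # Fix L, update R from the top eigenvectors of sum_i M_i^T L L^T M_i.
        S_R = sum(M.T @ L @ L.T @ M for M in models)
        _, vecs = np.linalg.eigh(S_R)
        R = vecs[:, ::-1][:, :r_rank]
    return L, R

rng = np.random.default_rng(0)
models = rng.normal(size=(20, 50, 39))
L, R = glram(models, l_rank=20, r_rank=15)
cores = np.einsum('sl,nsd,dr->nlr', L, models, R)   # one reduced core per speaker
```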


IEEE Signal Processing Letters | 2011

Acoustic Model Adaptation Based on Tensor Analysis of Training Models

Yongwon Jeong

We present a tensor analysis of acoustic models built from various speakers in multiple noise conditions, and its application to new-speaker and environment adaptation for speech recognition. The bases used in adaptation are constructed by decomposing the training models in the state, feature-dimension, speaker, and noise spaces using multilinear singular value decomposition. An isolated-word recognition experiment demonstrated the effectiveness of the proposed method, which performed better than eigenvoice in babble and factory-floor noise for adaptation data longer than approximately 20 s.
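
A brief sketch of the kind of multilinear SVD step described above, assuming the training models are arranged as a 4-mode state x dimension x speaker x noise tensor; the layout, ranks, and names are assumptions, and the weight estimation for a target speaker/environment is omitted.

```python
import numpy as np

# Training models from S speakers recorded in N noise conditions, arranged as a
# 4-mode tensor: state x feature-dim x speaker x noise (an assumed layout).
rng = np.random.default_rng(0)
models = rng.normal(size=(50, 39, 20, 4))

def mode_bases(tensor, mode, rank):
    """Leading left singular vectors of the mode-n unfolding; these span the
    subspace of that mode (state, dimension, speaker, or noise)."""
    unfolded = np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)
    u, _, _ = np.linalg.svd(unfolded, full_matrices=False)
    return u[:, :rank]

U_state = mode_bases(models, 0, 30)
U_dim = mode_bases(models, 1, 20)
U_speaker = mode_bases(models, 2, 10)
U_noise = mode_bases(models, 3, 3)
# Adaptation then estimates weights over the speaker and noise subspaces for
# the target speaker and environment, analogous to the Tucker sketch above.
```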


Speech Communication | 2013

Unified framework for basis-based speaker adaptation based on sample covariance matrix of variable dimension

Yongwon Jeong

We present a unified framework for basis-based speaker adaptation techniques, which subsumes eigenvoice speaker adaptation using principal component analysis (PCA) and speaker adaptation using two-dimensional PCA (2DPCA). The basic idea is to partition the Gaussian mean vector of a hidden Markov model (HMM) for each state and mixture component into a group of subvectors and to stack all the subvectors of a training speaker model into a matrix. The dimension of the matrix varies according to the dimension of the subvector. As a result, the basis vectors derived from the PCA of the training model matrices have variable dimension, and so does the speaker weight in the adaptation equation. When the amount of adaptation data is small, adaptation using a speaker weight of small dimension with basis vectors of large dimension gives good performance, whereas when the amount of adaptation data is large, adaptation using a speaker weight of large dimension with basis vectors of small dimension gives good performance. In the experiments, when the dimension of the basis vectors was chosen between those of the eigenvoice method and the 2DPCA-based method, the model showed balanced performance between the two.
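
A minimal sketch of the variable-dimension idea, assuming the training models are stored as mean supervectors: the subvector dimension q interpolates between eigenvoice-style PCA (q equal to the full supervector length) and the 2DPCA-based method (q equal to the feature dimension). The projection-based adaptation shown here stands in for the ML estimation used in the paper, and all shapes are illustrative.

```python
import numpy as np

def subvector_bases(supervectors, q, k):
    """supervectors: (n_speakers, D) stacked HMM mean supervectors, with D a
    multiple of the subvector dimension q. Each supervector is partitioned
    into D/q subvectors of length q, and the q x q sample covariance over all
    subvectors is eigen-decomposed to give k bases of dimension q."""
    n, D = supervectors.shape
    assert D % q == 0
    mean = supervectors.mean(axis=0)
    sub = (supervectors - mean).reshape(n * (D // q), q)   # all subvectors
    cov = sub.T @ sub / len(sub)                           # q x q
    _, vecs = np.linalg.eigh(cov)
    return mean, vecs[:, ::-1][:, :k]                      # q-dimensional bases

def adapt(mean, bases, target):
    """The speaker weight has shape (D/q, k): a small q gives a large weight
    matrix (more adaptation parameters), a large q gives a small one."""
    q = bases.shape[0]
    diff = (target - mean).reshape(-1, q)
    weight = diff @ bases
    return mean + (weight @ bases.T).reshape(-1), weight

rng = np.random.default_rng(0)
supervectors = rng.normal(size=(20, 50 * 39))
mean, bases = subvector_bases(supervectors, q=195, k=8)    # 195 = 5 x 39, a middle choice
adapted, weight = adapt(mean, bases, rng.normal(size=50 * 39))
```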


Signal Processing Systems | 2016

Basis-Based Speaker Adaptation Using Partitioned HMM Mean Parameters of Training Speaker Models

Yongwon Jeong

This paper presents a basis-based speaker adaptation method that includes approaches using principal component analysis (PCA) and two-dimensional PCA (2DPCA). The proposed method partitions the hidden Markov model (HMM) mean vectors of the training models into subvectors of smaller dimension. Consequently, the sample covariance matrix computed using the partitioned HMM mean vectors has various dimensions according to the dimension of the subvectors. Basis vectors are constructed from the eigen-decomposition of the sample covariance matrix. Thus, the dimension of the basis vectors varies according to the dimension of the sample covariance matrix, and the proposed method includes the PCA- and 2DPCA-based approaches. We present the adaptation equation in both the maximum likelihood (ML) and maximum a posteriori (MAP) frameworks. We perform continuous speech recognition experiments using the Wall Street Journal (WSJ) corpus. The results show that the model with basis vectors whose dimensions lie between those of the PCA- and 2DPCA-based approaches gives good overall performance. The proposed approach in the MAP framework shows additional performance improvement over its ML counterpart when the number of adaptation parameters is large but the amount of available adaptation data is small. Furthermore, the performance of the MAP-framework approach is less sensitive to the choice of model order than that of the ML counterpart.
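
The sketch below contrasts ML-style and MAP-style weight estimation given fixed bases, using a plain least-squares surrogate and a zero-mean Gaussian prior that yields a ridge-like solution; this is an assumption-laden stand-in for the paper's adaptation equations, which are derived from HMM occupancy statistics rather than a raw supervector.

```python
import numpy as np

def ml_weight(bases, mean, obs):
    """ML-style weight: least-squares fit of the observed adaptation data
    (simplified here to a raw supervector `obs`) to the basis expansion."""
    w, *_ = np.linalg.lstsq(bases, obs - mean, rcond=None)
    return w

def map_weight(bases, mean, obs, tau):
    """MAP-style weight with a zero-mean Gaussian prior on the weight, which
    adds a ridge term tau*I; as the amount of adaptation data shrinks, the
    estimate is pulled toward the prior mean (zero), making it more robust."""
    A = bases.T @ bases + tau * np.eye(bases.shape[1])
    return np.linalg.solve(A, bases.T @ (obs - mean))

rng = np.random.default_rng(0)
D, k = 1950, 16
bases = rng.normal(size=(D, k))
mean = rng.normal(size=D)
obs = mean + bases @ rng.normal(size=k) + 0.1 * rng.normal(size=D)
w_ml = ml_weight(bases, mean, obs)
w_map = map_weight(bases, mean, obs, tau=10.0)
```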


EURASIP Journal on Audio, Speech, and Music Processing | 2013

Speaker adaptation in the maximum a posteriori framework based on the probabilistic 2-mode analysis of training models

Yongwon Jeong

In this article, we describe a speaker adaptation method based on the probabilistic 2-mode analysis of training models. Probabilistic 2-mode analysis is a probabilistic extension of multilinear analysis. We apply probabilistic 2-mode analysis to speaker adaptation by representing the hidden Markov model mean vectors of each training speaker as a matrix, and we derive the speaker adaptation equation in the maximum a posteriori (MAP) framework. The adaptation equation becomes similar to that of MAP linear regression adaptation. In the experiments, the adapted models based on probabilistic 2-mode analysis showed performance improvement over the adapted models based on Tucker decomposition, a representative multilinear decomposition technique, for small amounts of adaptation data, while maintaining good performance for large amounts of adaptation data.
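
In rough terms (not the article's exact derivation), the MAP estimate of the speaker weight trades off the adaptation-data likelihood against the prior supplied by the probabilistic 2-mode model:

$$\hat{W} \;=\; \arg\max_{W}\;\Big[\log p\big(\mathbf{O}\mid \lambda(W)\big) \;+\; \log p(W)\Big],$$

where O denotes the adaptation data, λ(W) the model adapted with weight W, and p(W) the prior obtained from the probabilistic 2-mode analysis; as the amount of adaptation data grows the estimate approaches the ML solution, and with little data it falls back on the prior, which matches the behavior reported above for small adaptation sets.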


Electronics Letters | 2011

Robust speaker adaptation based on parallel factor analysis of training models

Yongwon Jeong

Collaboration


Dive into Yongwon Jeong's collaborations.

Top Co-Authors

Hyung Soon Kim | Pusan National University
Sunchan Park | Pusan National University
Hwa Jeon Song | Pusan National University
Min Sik Kim | Pusan National University
Young Kuk Kim | Pusan National University
S.P. Yi | Pusan National University
Sangjun Lim | Pusan National University
Sung Joo Lee | Electronics and Telecommunications Research Institute