Doo Hwa Hong
Seoul National University
Publication
Featured research published by Doo Hwa Hong.
IEEE Transactions on Audio, Speech, and Language Processing | 2012
Nam Soo Kim; Tae Gyoon Kang; Shin Jae Kang; Chang Woo Han; Doo Hwa Hong
Signals originating from the same speech source usually appear different depending on a variety of acoustic effects such as background noise, linear or nonlinear distortions incurred by the recording devices, or reverberation. These acoustic effects result in mismatches between the trained speech recognition models and the input speech. One of the well-known approaches to reducing this mismatch is to map the distorted speech feature to its clean counterpart. The mapping function is usually trained on a set of stereo data consisting of simultaneous recordings obtained in both the reference and target conditions. In this paper, we propose the switching linear dynamic system (SLDS) as a useful model for speech feature sequence mapping. In contrast to conventional vector-to-vector mapping algorithms, an SLDS can describe sequence-to-sequence mapping in a systematic way. The proposed approach is applied to robust speech recognition in various environmental conditions and shows a dramatic improvement in recognition performance.
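The stereo-data feature-mapping idea in this abstract can be illustrated with a deliberately simplified sketch: instead of full SLDS inference, a small k-means step plays the role of the switching state, and a per-state affine transform, trained on the stereo pairs by least squares, maps distorted frames to clean ones. All names and data below are illustrative; the paper's actual model additionally places linear dynamics on the continuous state and smooths over the state sequence.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stereo data: simultaneous clean / distorted feature sequences.
T, D, K = 400, 2, 3                       # frames, feature dim, states
clean = rng.standard_normal((T, D))
channel = np.array([[0.8, 0.1], [0.0, 0.9]])
distorted = clean @ channel + 0.1 * rng.standard_normal((T, D))

# Crude stand-in for the switching state: k-means on the distorted frames.
centers = distorted[rng.choice(T, K, replace=False)]
for _ in range(10):
    labels = np.argmin(((distorted[:, None] - centers) ** 2).sum(-1), axis=1)
    centers = np.array([distorted[labels == k].mean(0) if np.any(labels == k)
                        else centers[k] for k in range(K)])

# Per-state affine mapping distorted -> clean, fit by least squares
# on the stereo pairs assigned to that state.
maps = []
for k in range(K):
    sel = labels == k
    Y = np.hstack([distorted[sel], np.ones((sel.sum(), 1))])
    W, *_ = np.linalg.lstsq(Y, clean[sel], rcond=None)
    maps.append(W)

def map_sequence(seq):
    """Map a distorted sequence frame by frame with its state's transform."""
    lab = np.argmin(((seq[:, None] - centers) ** 2).sum(-1), axis=1)
    ext = np.hstack([seq, np.ones((len(seq), 1))])
    return np.vstack([ext[t] @ maps[lab[t]] for t in range(len(seq))])

mapped = map_sequence(distorted)
err_before = np.mean((distorted - clean) ** 2)
err_after = np.mean((mapped - clean) ** 2)
print(err_after < err_before)  # True: mapping reduces the train-set mismatch
```

Because the identity map lies inside the affine family, the per-state least-squares fit can never do worse than leaving the features untouched; the dynamic model in the paper further exploits correlations across frames.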
IEEE Signal Processing Letters | 2011
Nam Soo Kim; June Sig Sung; Doo Hwa Hong
One of the most popular approaches to parameter adaptation in hidden Markov model (HMM)-based systems is the maximum likelihood linear regression (MLLR) technique. In this letter, we extend MLLR to factored MLLR (FMLLR), in which the MLLR parameters depend on a continuous-valued control vector. Since it is practically impossible to estimate the MLLR parameters for each control vector separately, we propose a compact parametric form of the MLLR parameters. In the proposed approach, each MLLR parameter is represented as an inner product between a regression vector and a transformed control vector. We present an algorithm to train the FMLLR parameters based on the general framework of the expectation-maximization (EM) algorithm. The proposed approach is applied to adapt HMM parameters obtained from a database of reading-style speech to singing-style voices, treating the pitches and durations extracted from the musical notes as the control vectors. This enables us to efficiently construct a singing voice synthesizer with only a small amount of singing data.
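The compact parametric form described in this abstract can be sketched in a few lines of numpy: each entry of the control-dependent MLLR transform W(c) = [A(c), b(c)] is an inner product between a regression vector and a transformed control vector. Here the control-vector transform is assumed (not taken from the paper) to be the simple affine augmentation [c, 1], and the regression vectors are random stand-ins for trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 3   # dimension of the HMM mean vectors
C = 2   # dimension of the control vector (e.g. pitch, duration)

# One regression vector per entry of the (D x D+1) MLLR transform.
R = rng.standard_normal((D, D + 1, C + 1))

def fmllr_transform(control):
    """Control-dependent MLLR transform W(c) = [A(c), b(c)].

    Each entry of W is the inner product of a regression vector
    with the transformed control vector phi(c) = [c, 1].
    """
    phi = np.append(control, 1.0)
    return R @ phi                          # shape (D, D+1)

def adapt_mean(mu, control):
    """Adapted mean: mu' = A(c) mu + b(c), via the extended mean [mu, 1]."""
    W = fmllr_transform(control)
    return W @ np.append(mu, 1.0)

mu = rng.standard_normal(D)
mu_adapted = adapt_mean(mu, control=np.array([0.5, -1.0]))
print(mu_adapted.shape)  # (3,)
```

In this form the number of trainable parameters is fixed regardless of how many distinct control vectors (pitch/duration combinations) occur in the adaptation data, which is the point of the compact parameterization.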
International Conference on Acoustics, Speech, and Signal Processing | 2011
Chang Woo Han; Tae Gyoon Kang; Doo Hwa Hong; Nam Soo Kim; Kiwan Eom; Jae-won Lee
The performance of a speech recognition system may be degraded even without any background noise because of the linear or nonlinear distortions incurred by recording devices or reverberation. One of the well-known approaches to reducing this channel distortion is feature mapping, which maps the distorted speech feature to its clean counterpart. The feature mapping rule is usually trained on a set of stereo data consisting of simultaneous recordings obtained in both the reference and target conditions. In this paper, we propose a novel approach to speech feature sequence mapping based on the switching linear dynamic transducer (SLDT). The proposed algorithm enables sequence-to-sequence mapping in a systematic way, instead of the traditional vector-to-vector mapping. The proposed approach is applied to compensate for channel distortion in speech recognition and shows improvement in recognition performance.
IEEE Journal of Selected Topics in Signal Processing | 2014
June Sig Sung; Doo Hwa Hong; Nam Soo Kim
Speech synthesized from the same text should sound different depending on the speaking style. Current speech synthesis techniques based on the hidden Markov model (HMM) usually focus on a fixed speaking style, and changing the speaking style requires multiple sets of parameters trained in different speaking styles. A promising alternative is to adapt the base model to the intended speaking style. In our previous work, we proposed factored maximum likelihood linear regression (FMLLR) adaptation, in which each MLLR parameter is defined as a function of a control vector, and presented a method to train the FMLLR parameters based on the general framework of the expectation-maximization (EM) algorithm. In this paper, we introduce a novel technique called factored maximum penalized likelihood kernel regression (FMLKR) for HMM-based style-adaptive speech synthesis. In FMLKR, nonlinear regression between the mean vectors of the base model and the corresponding mean vectors of the adaptation data is performed with a kernel method built on the FMLLR framework. In a series of experiments on the artificial generation of singing voice and expressive speech, we evaluate the performance of the FMLLR and FMLKR techniques with various matrix structures and also compare them with other approaches to parameter adaptation in HMM-based speech synthesis.
European Signal Processing Conference | 2015
Doo Hwa Hong; Joun Yeop Lee; Nam Soo Kim
In this paper, we propose a relevance vector machine (RVM) for modeling and generating a speech feature sequence. In the conventional method, the mean parameter of a hidden Markov model (HMM) state cannot capture the temporal correlation among the corresponding data frames. Since the RVM can be used to solve nonlinear regression problems, we apply it to replace the model parameters of the state output distributions. In the proposed system, RVMs model the statistically representative trajectory of each state or phone segment, which is obtained from normalized training feature sequences by semi-parametric nonlinear regression. We conducted comparative experiments between the proposed RVMs and a conventional HMM, and the proposed state-level RVM-based method performed better than the conventional technique.
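A minimal sketch of RVM-style regression on a toy feature trajectory, assuming the basic sparse Bayesian re-estimation updates of Tipping's RVM rather than the paper's exact training setup; the data, basis, and kernel width are all illustrative. The point is that most per-frame basis functions are pruned away, leaving a sparse set of "relevance vectors" that still represents the trajectory.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy 1-D regression problem standing in for one normalized feature
# trajectory: frame position -> feature value.
x = np.linspace(0.0, 1.0, 60)
t = np.sin(2 * np.pi * x) + 0.05 * rng.standard_normal(60)

def rbf_kernel(a, b, width=0.1):
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * width ** 2))

# Design matrix: one RBF basis function per training frame, plus a bias.
Phi = np.hstack([np.ones((len(x), 1)), rbf_kernel(x, x)])

# Sparse Bayesian updates: per-weight precisions alpha and noise
# precision beta, re-estimated in closed form.
alpha = np.ones(Phi.shape[1])
beta = 1.0
for _ in range(100):
    Sigma = np.linalg.inv(np.diag(alpha) + beta * Phi.T @ Phi)
    m = beta * Sigma @ Phi.T @ t
    gamma = 1.0 - alpha * np.diag(Sigma)          # well-determined weights
    alpha = np.clip(gamma / (m ** 2 + 1e-12), 1e-6, 1e12)
    beta = (len(t) - gamma.sum()) / np.sum((t - Phi @ m) ** 2)

pred = Phi @ m
relevant = int(np.sum(alpha < 1e6))               # surviving basis functions
print(relevant, np.mean((pred - t) ** 2))
```

Weights whose precision alpha diverges are effectively pruned; only a handful of relevance vectors remain, which is what makes the per-state model compact.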
Intelligent Information Hiding and Multimedia Signal Processing | 2014
Doo Hwa Hong; Shin Jae Kang; Joun Yeop Lee; Nam Soo Kim
The maximum likelihood linear regression (MLLR) technique is a well-known approach to parameter adaptation in hidden Markov model (HMM)-based systems. In this paper, we propose the maximum penalized likelihood kernel regression (MPLKR) approach as a novel adaptation technique for HMM-based speech synthesis. The proposed algorithm performs nonlinear regression between the mean vectors of the base model and the corresponding mean vectors of the adaptation data by means of a kernel method. In the experiments, we used various types of parametric kernels for the proposed algorithm and compared their performance with the conventional method. The experimental results show that the proposed algorithm outperforms the conventional method in terms of both the objective measure and subjective listening quality.
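The kernel regression at the heart of this abstract can be sketched under a simplifying assumption: if the likelihood penalty is reduced to a plain ridge term, the regression from base-model mean vectors to adaptation-data mean vectors becomes kernel ridge regression in dual form. All data and hyperparameters below are illustrative stand-ins, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy stand-ins: mean vectors of the base-model states and the
# corresponding target-style means estimated from adaptation data.
N, D = 50, 4
base_means = rng.standard_normal((N, D))
target_means = np.tanh(base_means) + 0.05 * rng.standard_normal((N, D))

def rbf(a, b, width=1.0):
    d2 = ((a[:, None] - b[None, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * width ** 2))

# Penalized kernel regression in dual form: the ridge term lam stands
# in for the likelihood penalty of MPLKR.
K = rbf(base_means, base_means)
lam = 1e-2
A = np.linalg.solve(K + lam * np.eye(N), target_means)   # dual weights

def adapt(mu):
    """Nonlinearly map one base-model mean to its adapted counterpart."""
    return rbf(mu[None], base_means) @ A                  # shape (1, D)

fit_err = np.mean((K @ A - target_means) ** 2)
print(fit_err)
```

Swapping the RBF for other parametric kernels, as the experiments in the paper do, only changes the `rbf` function; the closed-form solve is unchanged.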
Conference of the International Speech Communication Association | 2010
June Sig Sung; Doo Hwa Hong; Kyung Hwan Oh; Nam Soo Kim
IEICE Transactions on Information and Systems | 2013
June Sig Sung; Doo Hwa Hong; Hyun Woo Koo; Nam Soo Kim
Conference of the International Speech Communication Association | 2012
June Sig Sung; Doo Hwa Hong; Hyun Woo Koo; Nam Soo Kim
Conference of the International Speech Communication Association | 2011
June Sig Sung; Doo Hwa Hong; Shin Jae Kang; Nam Soo Kim