Tae Gyoon Kang
Seoul National University
Publication
Featured research published by Tae Gyoon Kang.
IEEE Signal Processing Letters | 2015
Tae Gyoon Kang; Kisoo Kwon; Jong Won Shin; Nam Soo Kim
Non-negative matrix factorization (NMF) is one of the best-known techniques for separating a desired source from mixture data. In the NMF framework, a collection of data is factorized into a basis matrix and an encoding matrix. The basis matrix for mixture data is usually constructed by augmenting the basis matrices for the independent sources. However, target source separation with the concatenated basis matrix turns out to be problematic if there is some overlap between the subspaces spanned by the bases of the individual sources. In this letter, we propose a novel approach to improve encoding vector estimation for target signal extraction. Estimating encoding vectors from the mixture data is viewed as a regression problem, and a deep neural network (DNN) is used to learn the mapping between the mixture data and the corresponding encoding vectors. To demonstrate the performance of the proposed algorithm, experiments were conducted on a speech enhancement task. The experimental results show that the proposed algorithm outperforms the conventional encoding vector estimation scheme.
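As a rough sketch of the idea (the shapes, iteration counts, and the scikit-learn regressor below are illustrative assumptions, not the paper's implementation), the following snippet learns per-source NMF bases, derives encoding-vector targets on the mixture with the concatenated basis held fixed, and trains a DNN to regress those encodings directly from mixture frames:

```python
# Illustrative sketch of DNN-based encoding-vector estimation for NMF
# source separation; all shapes and parameters are toy assumptions.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def nmf(V, r, n_iter=200, eps=1e-9):
    """Plain multiplicative-update NMF: V (F x T) ~= W (F x r) @ H (r x T)."""
    F, T = V.shape
    W = rng.random((F, r)) + eps
    H = rng.random((r, T)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy magnitude spectrograms standing in for speech and noise training data.
F, T = 64, 400
V_speech = rng.random((F, T))
V_noise = rng.random((F, T))

# 1) Learn per-source bases, then concatenate them for the mixture model.
W_s, _ = nmf(V_speech, r=20)
W_n, _ = nmf(V_noise, r=20)
W = np.hstack([W_s, W_n])                       # F x 40

# 2) Build (mixture frame -> encoding vector) training pairs: targets are
#    encodings estimated on the mixture with W held fixed.
V_mix = V_speech + V_noise
H = rng.random((W.shape[1], T)) + 1e-9
for _ in range(200):                            # update H only, W fixed
    H *= (W.T @ V_mix) / (W.T @ W @ H + 1e-9)

# 3) Train a DNN to regress encoding vectors directly from mixture frames.
dnn = MLPRegressor(hidden_layer_sizes=(256, 256), max_iter=500)
dnn.fit(V_mix.T, H.T)

# 4) At test time, predict encodings and keep only the speech part.
H_hat = np.maximum(dnn.predict(V_mix.T).T, 0)   # clamp to non-negative
V_speech_hat = W_s @ H_hat[:20]                 # reconstructed target source
```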
IEEE Transactions on Audio, Speech, and Language Processing | 2012
Nam Soo Kim; Tae Gyoon Kang; Shin Jae Kang; Chang Woo Han; Doo Hwa Hong
Signals originating from the same speech source usually appear different depending on a variety of acoustic effects such as background noise, linear or nonlinear distortions incurred by the recording devices, or reverberation. These acoustic effects result in mismatches between the trained speech recognition models and the input speech. One of the well-known approaches to reducing this mismatch is to map the distorted speech feature to its clean counterpart. The mapping function is usually trained on a set of stereo data consisting of simultaneous recordings obtained in both the reference and target conditions. In this paper, we propose the switching linear dynamic system (SLDS) as a useful model for speech feature sequence mapping. In contrast to conventional vector-to-vector mapping algorithms, the SLDS can describe sequence-to-sequence mapping in a systematic way. The proposed approach is applied to robust speech recognition in various environmental conditions and shows a dramatic improvement in recognition performance.
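For reference, a generic SLDS combines a discrete Markov switching state with per-state linear-Gaussian dynamics; the notation below is the generic textbook form, not taken from the paper:

```latex
\begin{aligned}
  s_t &\sim p(s_t \mid s_{t-1}) && \text{(discrete switching state)} \\
  x_t &= A_{s_t} x_{t-1} + b_{s_t} + u_t, \quad u_t \sim \mathcal{N}(0, Q_{s_t}) && \text{(continuous state)} \\
  y_t &= C_{s_t} x_t + d_{s_t} + v_t, \quad v_t \sim \mathcal{N}(0, R_{s_t}) && \text{(observed feature)}
\end{aligned}
```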
International Conference on Acoustics, Speech, and Signal Processing | 2011
Chang Woo Han; Tae Gyoon Kang; Doo Hwa Hong; Nam Soo Kim; Kiwan Eom; Jae-won Lee
The performance of a speech recognition system may be degraded even without any background noise because of linear or nonlinear distortions incurred by recording devices or reverberation. One of the well-known approaches to reducing this channel distortion is feature mapping, which maps the distorted speech feature to its clean counterpart. The feature mapping rule is usually trained on a set of stereo data consisting of simultaneous recordings obtained in both the reference and target conditions. In this paper, we propose a novel approach to speech feature sequence mapping based on the switching linear dynamic transducer (SLDT). The proposed algorithm enables sequence-to-sequence mapping in a systematic way, instead of the traditional vector-to-vector mapping. The proposed approach is applied to compensating channel distortion in speech recognition and shows improvement in recognition performance.
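To make the contrast concrete, here is a minimal sketch of the conventional vector-to-vector baseline that sequence mapping generalizes: an affine map fit by least squares on stereo pairs (the shapes and the simulated channel below are toy assumptions):

```python
# Vector-to-vector feature mapping baseline learned from stereo data;
# toy sketch, not the paper's method.
import numpy as np

rng = np.random.default_rng(0)
D, T = 13, 1000                          # feature dimension, frames
clean = rng.standard_normal((T, D))      # reference-condition features
# Simulated channel: mild linear distortion plus additive noise.
channel = np.eye(D) + 0.1 * rng.standard_normal((D, D))
noisy = clean @ channel + 0.05 * rng.standard_normal((T, D))

# Affine map noisy -> clean, fit by least squares on the stereo pairs.
X = np.hstack([noisy, np.ones((T, 1))])  # append bias term
M, *_ = np.linalg.lstsq(X, clean, rcond=None)
mapped = X @ M

print("MSE before mapping:", float(np.mean((noisy - clean) ** 2)))
print("MSE after  mapping:", float(np.mean((mapped - clean) ** 2)))
```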
Conference of the International Speech Communication Association | 2016
Kang Hyun Lee; Tae Gyoon Kang; Woo Hyun Kang; Nam Soo Kim
Ever since the deep neural network (DNN) appeared in the speech signal processing community, the recognition performance of automatic speech recognition (ASR) has greatly improved. This achievement has also increased demand for applications in distant-talking environments. However, ASR performance in such environments still falls far short of that in close-talking environments due to various problems. In this paper, we propose a novel multichannel feature mapping technique combining a conventional beamformer, a DNN, and a joint training scheme. Through experiments using the multichannel Wall Street Journal audio-visual (MC-WSJ-AV) corpus, it is shown that the proposed technique effectively models the complicated relationship between the array inputs and clean speech features by employing an intermediate target. The proposed method outperformed the conventional DNN system.
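As a toy illustration of the front end only (the delays, array geometry, and the two-stage structure are assumptions for illustration), a delay-and-sum beamformer simply aligns and averages the microphone signals before feature extraction and DNN mapping:

```python
# Toy delay-and-sum beamformer front end; illustrative sketch only.
import numpy as np

def delay_and_sum(mics, delays):
    """Align and average multichannel signals.

    mics: (n_mics, n_samples) array of time-domain signals.
    delays: per-mic integer sample delays steering toward the source.
    """
    n_mics, _ = mics.shape
    aligned = [np.roll(mics[m], -delays[m]) for m in range(n_mics)]
    return np.mean(aligned, axis=0)

# Usage: the beamformed signal would then be converted to log-Mel
# features and passed to the feature-mapping DNN described above.
rng = np.random.default_rng(0)
source = rng.standard_normal(16000)
mics = np.stack([np.roll(source, d) for d in (0, 3, 7, 12)])
enhanced = delay_and_sum(mics, delays=[0, 3, 7, 12])
assert np.allclose(enhanced, source)     # perfect alignment in this toy case
```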
International Conference on Acoustics, Speech, and Signal Processing | 2017
Kang Hyun Lee; Woo Hyun Kang; Tae Gyoon Kang; Nam Soo Kim
Since the introduction of the deep neural network (DNN)-based acoustic model, robust automatic speech recognition using DNNs has been actively researched. In model adaptation especially, techniques utilizing auxiliary context features are known to be promising. Recently, we proposed a technique called two-stage noise-aware training (TS-NAT). The key idea of TS-NAT is to let the DNN clarify the relationship among the noise estimate, noisy features, and phonetic target through a clean feature representation. However, although TS-NAT enhances the robustness of the DNN, we cannot be certain whether TS-NAT describes the clean feature representation sufficiently. In this paper, we extend TS-NAT using true noise features and various DNN training techniques. It is shown that the proposed technique outperforms conventional DNN-based techniques on the Aurora-5 task and in mismatched noise conditions.
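A minimal sketch of the noise-aware input construction this family of methods shares (the first-frames noise estimate below is a common heuristic, not necessarily the paper's estimator):

```python
# Noise-aware training (NAT) input construction; illustrative sketch.
import numpy as np

def nat_inputs(log_mel, n_init_frames=10):
    """Append a crude noise estimate to every frame's feature vector.

    log_mel: (T, D) noisy log-Mel features. The first few frames are
    assumed speech-free, a common noise-estimation heuristic.
    Returns a (T, 2*D) array used as DNN input.
    """
    noise_est = log_mel[:n_init_frames].mean(axis=0)
    return np.hstack([log_mel, np.tile(noise_est, (len(log_mel), 1))])
```

In a two-stage setup, a first network maps such inputs to clean features (the intermediate representation) and a second maps the clean-feature estimates to phonetic targets.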
Digital Signal Processing | 2018
Tae Gyoon Kang; Jong Won Shin; Nam Soo Kim
Recently, deep neural networks (DNNs) were successfully introduced to the speech enhancement area. Conventional DNN-based algorithms generally produce over-smoothed output features, which deteriorate the quality of the enhanced speech. In addition, their performance measures are calculated on a linear frequency scale, which does not match human auditory perception, whose sensitivity follows the Mel-frequency scale. In this paper, we propose a novel objective function for DNN-based speech enhancement. In the proposed technique, a new objective function consisting of the Mel-scale weighted mean square error together with similarities of the temporal and spectral variations between the enhanced and clean speech is employed in the DNN training stage. The proposed objective function helps compute the gradients on a perceptually motivated nonlinear frequency scale and alleviates the over-smoothness of the estimated speech. In the experiments, the performance of the proposed algorithm was compared to a conventional DNN-based speech enhancement algorithm in matched and mismatched noise conditions. The experimental results show that the proposed algorithm performs better than the conventional algorithm in terms of both objective and subjective measures.
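A minimal sketch of the Mel-weighting component alone (the density-based weights below are an assumption, and the paper's additional temporal/spectral-variation similarity terms are omitted):

```python
# Mel-scale weighted MSE sketch; weighting scheme is an assumption.
import numpy as np

def mel_weights(n_bins, fs=16000):
    """Weight linear-frequency bins by the Mel scale's local density.

    Uses d(mel)/df for mel(f) = 2595*log10(1 + f/700), which is larger
    at low frequencies, mimicking auditory sensitivity.
    """
    f = np.linspace(0.0, fs / 2.0, n_bins)
    w = (2595.0 / np.log(10.0)) / (700.0 + f)
    return w / w.sum()

def mel_weighted_mse(est, ref, w):
    """est, ref: (T, F) magnitude spectra; w: (F,) Mel-scale weights."""
    return float(np.mean(((est - ref) ** 2) @ w))
```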
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference | 2016
Tae Gyoon Kang; Kang Hyun Lee; Woo Hyun Kang; Soo Hyun Bae; Nam Soo Kim
Recently, deep neural networks (DNNs) have been successfully adopted in the voice activity detection (VAD) area. However, the performance of DNN-based VAD is still unsatisfactory in noisy environments where the feature subspaces of the training database and the test environment do not match. In this paper, we propose a local feature shift technique which normalizes the feature subspaces over various noise environments. The proposed technique treats the local minimum vectors of the log-Mel filterbank features as noise power estimates and produces feature shift vectors from them. The experimental results in stationary and non-stationary noise environments show that the DNN with the proposed technique outperforms conventional DNN-based VAD algorithms.
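A minimal sketch of the local-minimum-based shift (the window length and the simple subtract-shift rule are assumptions):

```python
# Local feature shift sketch: a running local minimum of each log-Mel
# band serves as a noise-floor estimate; illustrative only.
import numpy as np

def local_feature_shift(log_mel, win=50):
    """Shift each frame by a local per-band minimum (noise-power proxy).

    log_mel: (T, D) log-Mel filterbank features.
    win: number of past frames searched for the local minimum.
    """
    T, _ = log_mel.shape
    shifted = np.empty_like(log_mel)
    for t in range(T):
        local_min = log_mel[max(0, t - win):t + 1].min(axis=0)
        shifted[t] = log_mel[t] - local_min   # normalize the noise floor
    return shifted
```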
International Conference on Acoustics, Speech, and Signal Processing | 2014
Shin Jae Kang; Tae Gyoon Kang; Kang Hyun Lee; Kiho Cho; Nam Soo Kim
We propose a novel approach to feature enhancement in the multichannel scenario. Our approach is based on the interacting multiple model (IMM), which was originally developed for the single-channel scenario. We extend the single-channel IMM algorithm so that it can handle multichannel inputs within a Bayesian framework. The multichannel IMM algorithm is capable of tracking time-varying room impulse responses and background noises by updating the relevant parameters in an online manner. The performance gain of the proposed method has been confirmed in various environmental conditions.
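For context, the standard single-channel IMM recursion mixes per-model state estimates through the model probabilities; the notation below is the textbook form, not the paper's:

```latex
\begin{aligned}
  \bar{c}_j &= \sum_i p_{ij}\,\mu_{i,t-1}, \qquad
  \mu_{i|j} = \frac{p_{ij}\,\mu_{i,t-1}}{\bar{c}_j} && \text{(mixing probabilities)} \\
  \hat{x}^{0j}_{t-1} &= \sum_i \mu_{i|j}\,\hat{x}^{i}_{t-1} && \text{(mixed initial estimate)} \\
  \mu_{j,t} &= \frac{\Lambda_{j,t}\,\bar{c}_j}{\sum_k \Lambda_{k,t}\,\bar{c}_k} && \text{(model-probability update)}
\end{aligned}
```

Each filter $j$ then runs its own Kalman update from the mixed estimate $\hat{x}^{0j}_{t-1}$, with $\Lambda_{j,t}$ its likelihood for the observation at time $t$.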
International Conference on Acoustics, Speech, and Signal Processing | 2012
Chang Woo Han; Tae Gyoon Kang; Shin Jae Kang; June Sig Sung; Nam Soo Kim
The feature mapping technique is widely used to eliminate the mismatch between the training and test conditions of speech recognition. In feature mapping, a target (mismatched) feature vector sequence is mapped closer to the corresponding reference (matched) feature vector stream. The training of the mapping system is usually carried out on a set of stereo data consisting of simultaneous recordings obtained in both the reference and target conditions. In this paper, we propose a novel approach to blind parameter estimation which does not require the reference feature vectors. The proposed approach is motivated by the hidden Markov model (HMM)-based speech synthesis algorithm.
Conference of the International Speech Communication Association | 2014
Tae Gyoon Kang; Kisoo Kwon; Jong Won Shin; Nam Soo Kim