Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Hong-Goo Kang is active.

Publication


Featured research published by Hong-Goo Kang.


IEEE Transactions on Multimedia | 2018

SVD-Based Adaptive QIM Watermarking on Stereo Audio Signals

Min Jae Hwang; Jeesok Lee; Mi-Suk Lee; Hong-Goo Kang

This paper proposes a blind digital audio watermarking algorithm that utilizes quantization index modulation (QIM) and the singular value decomposition (SVD) of stereo audio signals. Conventional SVD-based blind audio watermarking algorithms lack physical interpretation because the construction of the input matrix for the SVD is heuristically defined. In the proposed approach, however, the SVD is applied directly to the stereo input signals, so the decomposed elements convey a conceptually meaningful interpretation of the original audio signal. Because the proposed approach effectively utilizes the ratio of singular values, the embedded watermark is highly imperceptible and robust against volumetric scaling attacks, to which most QIM-based watermarking schemes are vulnerable. Experimental results under well-known practical attacks, such as compression, resampling, and various types of signal processing, confirm that the proposed algorithm performs well compared to conventional audio watermarking algorithms.
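
As a rough illustration of the embedding mechanism described above, the following numpy sketch stacks a stereo frame into a 2xN matrix, applies the SVD, and quantizes the ratio of singular values with QIM to carry one watermark bit. The frame handling, quantization step, and lattice construction are simplifying assumptions, not the paper's exact algorithm.

```python
import numpy as np

def embed_bit(frame_l, frame_r, bit, step=0.02):
    """Embed one watermark bit into a stereo frame by QIM on the
    singular-value ratio (illustrative sketch, not the paper's exact method)."""
    X = np.vstack([frame_l, frame_r])            # 2 x N stereo frame matrix
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    ratio = s[1] / s[0]                          # scale-invariant feature in [0, 1]
    # Quantization index modulation: snap the ratio onto an even/odd lattice cell.
    q = np.floor(ratio / step)
    if int(q) % 2 != bit:
        q += 1
    ratio_marked = (q + 0.5) * step
    s_marked = np.array([s[0], ratio_marked * s[0]])
    X_marked = (U * s_marked) @ Vt               # reconstruct watermarked stereo frame
    return X_marked[0], X_marked[1]

def detect_bit(frame_l, frame_r, step=0.02):
    """Blind detection: recompute the singular-value ratio and read its lattice parity."""
    s = np.linalg.svd(np.vstack([frame_l, frame_r]), compute_uv=False)
    return int(np.floor((s[1] / s[0]) / step)) % 2
```

Because a volumetric scaling attack multiplies both singular values by the same factor, the embedded ratio is unchanged, which illustrates the robustness property claimed above.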


international conference on acoustics, speech, and signal processing | 2015

Improved time-frequency trajectory excitation modeling for a statistical parametric speech synthesis system

Eunwoo Song; Young Sun Joo; Hong-Goo Kang

This paper proposes an improved time-frequency trajectory excitation (TFTE) modeling method for a statistical parametric speech synthesis system. The proposed approach overcomes the dimensional variation problem in the training process caused by the inherent nature of the pitch-dependent analysis paradigm. By reducing the redundancies of the parameters using predicted average block coefficients (PABC), the proposed algorithm efficiently models the excitation even when its dimension varies. Objective and subjective test results verify that the proposed algorithm improves not only the robustness of the training process but also the naturalness of the synthesized speech.
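
The dimension-variation issue can be illustrated with a minimal sketch: a pitch-dependent coefficient vector of arbitrary length is collapsed into a fixed number of block averages, so frames with different pitch periods map to vectors of the same size. This is a simplified stand-in for the PABC idea; the block count and averaging scheme are assumptions.

```python
import numpy as np

def average_block_coefficients(tfte_coeffs, n_blocks=8):
    """Collapse a pitch-dependent, variable-length coefficient vector into a
    fixed number of block averages (simplified stand-in for PABC)."""
    coeffs = np.asarray(tfte_coeffs, dtype=float)
    blocks = np.array_split(coeffs, n_blocks)    # near-equal blocks regardless of length
    return np.array([b.mean() for b in blocks])  # fixed-dimensional representation

# Frames with different pitch periods yield different coefficient counts,
# but every frame maps to the same n_blocks-dimensional vector for training.
short_frame = np.random.randn(37)
long_frame = np.random.randn(61)
print(average_block_coefficients(short_frame).shape,   # (8,)
      average_block_coefficients(long_frame).shape)    # (8,)
```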


IEEE Transactions on Audio, Speech, and Language Processing | 2014

Online Speech Dereverberation Algorithm Based on Adaptive Multichannel Linear Prediction

Jae Mo Yang; Hong-Goo Kang

This paper proposes a real-time acoustic channel equalization method that uses an adaptive multichannel linear prediction technique. In general, multichannel equalization algorithms can eliminate reverberation if specific conditions are met: co-primeness between channels, a sufficient filter length, and accurate channel information. In practice, however, it is difficult to estimate acoustic channels accurately. The proposed method builds on a theoretically exact channel equalization algorithm while considering the problems that arise in an actual system. Linear-predictive multi-input equalization (LIME) is a theoretically well-founded approach to blind dereverberation, but it incurs a huge computational cost from constructing and inverting a large covariance matrix. The proposed equalizer is developed as a multichannel linear prediction (MLP)-oriented structure with a new formulation that is optimized for time-varying acoustic room environments. Moreover, experimental results show that the proposed method works well even when the channel characteristics of each microphone are similar. Experiments using various room impulse response (RIR) models, covering both synthesized and real room environments, show that the proposed method is superior to conventional methods.
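
A minimal sketch of the underlying idea, using a plain NLMS update rather than the paper's optimized adaptive formulation: late reverberation in a reference channel is predicted from delayed past samples of all channels, and the prediction residual is taken as the dereverberated output. The prediction delay, filter order, and step size below are illustrative assumptions.

```python
import numpy as np

def mclp_dereverb(x, order=20, delay=3, mu=0.5, eps=1e-6):
    """Online dereverberation by adaptive multichannel linear prediction
    (NLMS sketch for illustration; the paper derives its own adaptive update).

    x : (n_channels, n_samples) multichannel recording
    Returns the prediction residual of channel 0 as the dereverberated output.
    """
    n_ch, n_samp = x.shape
    w = np.zeros(n_ch * order)                   # prediction filter across all channels
    out = np.zeros(n_samp)
    for n in range(delay + order, n_samp):
        # Stack delayed past samples from every channel into one regressor.
        past = np.concatenate([x[c, n - delay - order + 1 : n - delay + 1][::-1]
                               for c in range(n_ch)])
        pred = w @ past
        e = x[0, n] - pred                       # late reverberation removed from channel 0
        w += mu * e * past / (past @ past + eps) # NLMS coefficient update
        out[n] = e
    return out
```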


IEEE Transactions on Audio, Speech, and Language Processing | 2017

Effective Spectral and Excitation Modeling Techniques for LSTM-RNN-Based Speech Synthesis Systems

Eunwoo Song; Frank K. Soong; Hong-Goo Kang

In this paper, we report research results on modeling the parameters of an improved time-frequency trajectory excitation (ITFTE) and the spectral envelopes of an LPC vocoder with a long short-term memory (LSTM)-based recurrent neural network (RNN) for high-quality text-to-speech (TTS) systems. The ITFTE vocoder has been shown to significantly improve the perceptual quality of statistical parametric TTS systems in our prior work. However, a simple feed-forward deep neural network (DNN) with a finite window length is inadequate to capture the time evolution of the ITFTE parameters. We propose to use the LSTM to exploit the time-varying nature of both the excitation and filter parameter trajectories, where the LSTM takes the linguistic text input and predicts both ITFTE and LPC parameters holistically. For the LPC parameters, we further enhance the generated spectrum by applying LP bandwidth expansion and line spectral frequency-sharpening filters. These filters not only reduce unstable synthesis filter conditions but also mitigate the muffling problem in the generated spectrum. Experimental results show that the proposed LSTM-RNN system with the ITFTE vocoder significantly outperforms both similarly configured band aperiodicity-based systems and our best prior DNN-trained counterpart, both objectively and subjectively.
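
A hedged sketch of the acoustic model architecture described above, written in PyTorch: an LSTM maps frame-level linguistic features to spectral (LSF) and excitation (ITFTE) parameters predicted jointly from one network. All layer sizes and feature dimensions are assumptions.

```python
import torch
import torch.nn as nn

class AcousticLSTM(nn.Module):
    """Sketch of an LSTM acoustic model that predicts spectral (LSF) and
    excitation (ITFTE) parameter streams jointly from linguistic features.
    Layer sizes and feature dimensions are assumptions, not the paper's values."""
    def __init__(self, ling_dim=350, lsf_dim=40, itfte_dim=32, hidden=256):
        super().__init__()
        self.lsf_dim, self.itfte_dim = lsf_dim, itfte_dim
        self.rnn = nn.LSTM(ling_dim, hidden, num_layers=2, batch_first=True)
        self.proj = nn.Linear(hidden, lsf_dim + itfte_dim)   # one head for both streams

    def forward(self, ling_feats):                 # (batch, frames, ling_dim)
        h, _ = self.rnn(ling_feats)
        out = self.proj(h)                         # (batch, frames, lsf_dim + itfte_dim)
        return out.split([self.lsf_dim, self.itfte_dim], dim=-1)

model = AcousticLSTM()
dummy = torch.randn(2, 100, 350)                   # 2 utterances, 100 frames each
lsf, itfte = model(dummy)
```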


international conference on acoustics, speech, and signal processing | 2014

Mean normalization of power function based cepstral coefficients for robust speech recognition in noisy environment

Soonho Baek; Hong-Goo Kang

This paper examines the effect of mean normalization on various types of cepstral coefficients for robust speech recognition in noisy environments. Although the cepstral mean normalization (CMN) technique was originally designed to compensate for channel distortion, it has also been shown to improve recognition accuracy in additive noise environments. However, the interaction of CMN with the spectral mapping functions used to extract cepstral features has not yet been considered. This paper investigates the impact of CMN on the speech recognition system depending on the type of spectral mapping function by mathematically analyzing the amount of spectral distortion between clean and noisy conditions. The analysis is confirmed by comparing recognition error patterns in automatic speech recognition experiments on the Aurora 2 database. Experimental results show that the performance improvement from CMN becomes significant when the logarithmic function is replaced with an appropriately set fractional power mapping function. In particular, deletion errors are dramatically reduced.
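
The feature pipeline under analysis can be sketched in a few lines: mel filterbank energies are compressed with a fractional power function instead of a logarithm, converted to cepstra with a DCT, and the per-utterance cepstral mean is subtracted. The 1/15 exponent and the dimensions are assumptions borrowed from the common PNCC configuration, not values taken from this paper.

```python
import numpy as np
from scipy.fftpack import dct

def power_cepstra_cmn(power_spectra, mel_fbank, exponent=1.0 / 15, n_ceps=13):
    """Cepstral coefficients with a fractional power mapping instead of a log,
    followed by per-utterance cepstral mean normalization (CMN).

    power_spectra : (n_frames, n_fft_bins) frame power spectra
    mel_fbank     : (n_mels, n_fft_bins) mel filterbank matrix
    """
    mel_energies = power_spectra @ mel_fbank.T              # (n_frames, n_mels)
    compressed = np.power(mel_energies + 1e-10, exponent)   # power mapping, not log
    ceps = dct(compressed, type=2, norm='ortho', axis=1)[:, :n_ceps]
    return ceps - ceps.mean(axis=0, keepdims=True)          # CMN: remove utterance mean
```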


IEEE Signal Processing Letters | 2014

An Efficient Multichannel Linear Prediction-Based Blind Equalization Algorithm in Near Common Zeros Condition

Jae-Mo Yang; Hong-Goo Kang

This letter proposes an efficient multichannel acoustic channel equalization method under insufficient channel diversity conditions. To overcome the ill-posed problem caused by near-common-zero (NCZ) conditions between channels, a regularization method that restricts the filter norm has been investigated. However, directly applying this method to linear-predictive multi-input equalization (LIME) is not effective. To address this, the letter puts forth a novel method that disregards the erroneous term of the LIME solution matrix and increases the forced channel diversity (FCD). The accuracy of the proposed equalization filter is compared with that of the conventional regularization method, and experimental results confirm that the NCZ problem can be solved by adopting the proposed methods.
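
For context, the baseline filter-norm regularization mentioned above can be written as a standard Tikhonov (ridge) solve; this sketch shows only that generic baseline, not the letter's proposed FCD method. The data-matrix layout and the regularization weight are assumptions.

```python
import numpy as np

def regularized_equalizer(X, d, delta=1e-2):
    """Baseline filter-norm regularization: the equalizer h minimizes
    ||X h - d||^2 + delta * ||h||^2, which keeps the filter norm bounded
    when near common zeros make X ill-conditioned.

    X : (n_obs, n_taps) multichannel convolution (data) matrix
    d : (n_obs,) desired equalized response
    """
    A = X.T @ X + delta * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ d)
```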


international conference of the ieee engineering in medicine and biology society | 2017

Continuous bladder volume monitoring system for wearable applications

Seung Chul Shin; Junhyung Moon; Saewon Kye; Kyoungwoo Lee; Yong Seung Lee; Hong-Goo Kang

In this research, we propose a bladder volume monitoring system that can be effectively applied to various voiding dysfunctions. Whereas conventional systems lack consecutive measurements, the proposed system can continuously monitor a user's status even during unconscious sleep. For convenience, we design a simple and comfortable waist-belt-type device based on the body impedance analysis (BIA) technique. To support various measurement scenarios, we develop applications that connect the device to a smartphone. To minimize motion noise, which is inevitable when monitoring over an extended period, we propose a motion artifact reduction algorithm that exploits multiple frequency sources. The experimental results show a strong relationship between the impedance variation and the bladder volume, confirming the feasibility of our system.


european signal processing conference | 2016

Multi-class learning algorithm for deep neural network-based statistical parametric speech synthesis

Eunwoo Song; Hong-Goo Kang

This paper proposes a multi-class learning (MCL) algorithm for a deep neural network (DNN)-based statistical parametric speech synthesis (SPSS) system. Although the DNN-based SPSS system improves the modeling accuracy of statistical parameters, its synthesized speech is often muffled because the training process only considers the global characteristics of the entire set of training data, but does not explicitly consider any local variations. We introduce a DNN-based context clustering algorithm that implicitly divides the training data into several classes, and train them via a shared hidden layer-based MCL algorithm. Since the proposed MCL method efficiently models both the universal and class-dependent characteristics of various phonetic information, it not only avoids the model over-fitting problem but also reduces the over-smoothing effect. Objective and subjective test results also verify that the proposed algorithm performs much better than the conventional method.
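
A small PyTorch sketch of a shared-hidden-layer multi-class network consistent with the description above: every class shares the hidden stack, while each implicitly clustered class has its own output layer. The number of classes and all layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class SharedHiddenMCL(nn.Module):
    """Shared-hidden-layer multi-class learning sketch: the hidden stack models
    universal characteristics, while per-class output heads model class-dependent
    characteristics. Class count and layer sizes are assumptions."""
    def __init__(self, in_dim=350, hidden=512, out_dim=127, n_classes=4):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        # One acoustic-parameter regression head per implicitly clustered class.
        self.heads = nn.ModuleList([nn.Linear(hidden, out_dim) for _ in range(n_classes)])

    def forward(self, x, class_id):
        h = self.shared(x)                        # universal characteristics
        return self.heads[class_id](h)            # class-dependent characteristics
```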


conference of the international speech communication association | 2016

Improved time-frequency trajectory excitation vocoder for DNN-based speech synthesis

Eunwoo Song; Frank K. Soong; Hong-Goo Kang

We investigate an improved time-frequency trajectory excitation (ITFTE) vocoder for deep neural network (DNN)-based statistical parametric speech synthesis (SPSS) systems. The ITFTE is a linear predictive coding-based vocoder in which a pitch-dependent excitation signal is represented by a periodicity distribution in the time-frequency domain. The proposed method significantly improves the parameterization efficiency of the ITFTE vocoder for the DNN-based SPSS system, even when its dimension changes due to the inherent nature of pitch variation. By utilizing the orthogonality property of the discrete cosine transform, we not only accurately reconstruct the ITFTE parameters but also improve the perceptual quality of the synthesized speech. Objective and subjective test results confirm that the proposed method provides superior synthesized speech compared to the previous system.
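
The DCT-based parameterization idea can be sketched as follows: a variable-length excitation trajectory is transformed with an orthonormal DCT, a fixed number of leading coefficients is kept for training, and the trajectory is reconstructed by zero-padding and inverting the transform. The number of retained coefficients is an assumption; the paper's exact parameterization may differ.

```python
import numpy as np
from scipy.fftpack import dct, idct

def compress_trajectory(traj, n_keep=20):
    """Fixed-dimensional representation of a pitch-dependent (variable-length)
    excitation trajectory: keep the first n_keep orthonormal DCT coefficients."""
    c = dct(np.asarray(traj, dtype=float), type=2, norm='ortho')
    return c[:n_keep]

def reconstruct_trajectory(coeffs, length):
    """Zero-pad the retained coefficients and invert the orthonormal DCT."""
    padded = np.zeros(length)
    padded[:len(coeffs)] = coeffs
    return idct(padded, type=2, norm='ortho')
```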


ieee automatic speech recognition and understanding workshop | 2013

Vector Taylor series based HMM adaptation for generalized cepstrum in noisy environment

Soonho Baek; Hong-Goo Kang

This paper proposes a novel HMM adaptation algorithm for robust automatic speech recognition (ASR) in noisy environments. HMM adaptation using vector Taylor series (VTS) significantly improves ASR performance in noisy environments. Recently, the power-normalized cepstral coefficient (PNCC), which replaces the logarithmic mapping function with a power mapping function, was proposed, and the replacement has been shown to be robust to additive noise. In this paper, we extend the VTS-based approach to cepstral coefficients obtained with a power mapping function instead of a logarithmic one. Experimental results indicate that HMM adaptation in the cepstrum obtained with a power mapping function improves ASR performance compared to the conventional VTS-based approach for mel-frequency cepstral coefficients (MFCCs).
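
A heavily simplified sketch of the environment model behind VTS adaptation for power-mapped features: with additive noise in the linear power domain and features defined as (mel power)**p, first-order VTS mean adaptation amounts to evaluating the nonlinear mapping at the clean and noise means. Variance adaptation via the Jacobian, the cepstral rotation, and the exact exponent are omitted or assumed here.

```python
import numpy as np

def vts_adapt_means(clean_means, noise_mean, p=1.0 / 15):
    """Mean-only VTS adaptation sketch in the mel domain for power-mapped features.
    Features are assumed to be (mel power)**p; with additive noise in the linear
    power domain, the noisy feature is y = (x**(1/p) + n**(1/p))**p.

    clean_means : (n_states, n_mels) clean-model mel-domain means
    noise_mean  : (n_mels,) mel-domain noise mean
    """
    x_lin = np.power(clean_means, 1.0 / p)       # back to the linear power domain
    n_lin = np.power(noise_mean, 1.0 / p)
    return np.power(x_lin + n_lin, p)            # noisy-domain means for the adapted HMM
```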

Collaboration


Dive into Hong-Goo Kang's collaborations.
