Publication


Featured research published by Eunwoo Song.


International Conference on Acoustics, Speech, and Signal Processing | 2015

Improved time-frequency trajectory excitation modeling for a statistical parametric speech synthesis system

Eunwoo Song; Young Sun Joo; Hong-Goo Kang

This paper proposes an improved time-frequency trajectory excitation (TFTE) modeling method for a statistical parametric speech synthesis system. The proposed approach overcomes the dimensional variation problem of the training process caused by the inherent nature of the pitch-dependent analysis paradigm. By reducing the redundancies of the parameters using predicted average block coefficients (PABC), the proposed algorithm efficiently models excitation, even if its dimension is varied. Objective and subjective test results verify that the proposed algorithm provides not only robustness to the training process but also naturalness to the synthesized speech.


IEEE Transactions on Audio, Speech, and Language Processing | 2017

Effective Spectral and Excitation Modeling Techniques for LSTM-RNN-Based Speech Synthesis Systems

Eunwoo Song; Frank K. Soong; Hong-Goo Kang

In this paper, we report research results on modeling the parameters of an improved time-frequency trajectory excitation (ITFTE) and spectral envelopes of an LPC vocoder with a long short-term memory (LSTM)-based recurrent neural network (RNN) for high-quality text-to-speech (TTS) systems. The ITFTE vocoder has been shown to significantly improve the perceptual quality of statistical parameter-based TTS systems in our prior work. However, a simple feed-forward deep neural network (DNN) with a finite window length is inadequate to capture the time evolution of the ITFTE parameters. We propose to use the LSTM to exploit the time-varying nature of both trajectories of the excitation and filter parameters, where the LSTM is implemented to use the linguistic text input and to predict both ITFTE and LPC parameters holistically. In the case of LPC parameters, we further enhance the generated spectrum by applying LP bandwidth expansion and line spectral frequency-sharpening filters. These filters are not only beneficial for reducing unstable synthesis filter conditions but also advantageous toward minimizing the muffling problem in the generated spectrum. Experimental results have shown that the proposed LSTM-RNN system with the ITFTE vocoder significantly outperforms both similarly configured band aperiodicity-based systems and our best prior DNN-trained counterpart, both objectively and subjectively.
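
The LP bandwidth expansion mentioned in the abstract can be illustrated with the textbook form of the technique, in which the k-th LPC coefficient is scaled by gamma**k so that every pole of the synthesis filter moves toward the origin by the factor gamma. The sketch below is a minimal illustration under assumptions: the value gamma = 0.98, the second-order example filter, and the helper names `bandwidth_expand` and `max_pole_radius` are hypothetical and are not taken from the paper.

```python
import numpy as np

def bandwidth_expand(lpc, gamma=0.98):
    """Scale the k-th LPC coefficient by gamma**k (lpc[0] is the leading 1).

    gamma = 0.98 is an assumed value chosen for illustration only.
    """
    lpc = np.asarray(lpc, dtype=float)
    return lpc * gamma ** np.arange(len(lpc))

def max_pole_radius(lpc):
    """Largest pole magnitude of the all-pole synthesis filter 1/A(z)."""
    return float(np.max(np.abs(np.roots(lpc))))

# Hypothetical 2nd-order predictor with a pole pair at radius 0.99, angle pi/4.
r, theta = 0.99, np.pi / 4
a = np.array([1.0, -2.0 * r * np.cos(theta), r * r])

# After expansion the pole pair sits at radius gamma * r, further inside
# the unit circle, which widens the formant bandwidth.
a_expanded = bandwidth_expand(a, gamma=0.98)
```

Because the poles move strictly inward for gamma < 1, a filter that was close to instability gains margin, which matches the stability benefit the abstract describes.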


European Signal Processing Conference | 2016

Multi-class learning algorithm for deep neural network-based statistical parametric speech synthesis

Eunwoo Song; Hong-Goo Kang

This paper proposes a multi-class learning (MCL) algorithm for a deep neural network (DNN)-based statistical parametric speech synthesis (SPSS) system. Although the DNN-based SPSS system improves the modeling accuracy of statistical parameters, its synthesized speech is often muffled because the training process only considers the global characteristics of the entire set of training data, but does not explicitly consider any local variations. We introduce a DNN-based context clustering algorithm that implicitly divides the training data into several classes, and train them via a shared hidden layer-based MCL algorithm. Since the proposed MCL method efficiently models both the universal and class-dependent characteristics of various phonetic information, it not only avoids the model over-fitting problem but also reduces the over-smoothing effect. Objective and subjective test results also verify that the proposed algorithm performs much better than the conventional method.


Conference of the International Speech Communication Association | 2016

Improved time-frequency trajectory excitation vocoder for DNN-based speech synthesis

Eunwoo Song; Frank K. Soong; Hong-Goo Kang

We investigate an improved time-frequency trajectory excitation (ITFTE) vocoder for deep neural network (DNN)-based statistical parametric speech synthesis (SPSS) systems. The ITFTE is a linear predictive coding-based vocoder, where a pitch-dependent excitation signal is represented by a periodicity distribution in a time-frequency domain. The proposed method significantly improves the parameterization efficiency of the ITFTE vocoder for the DNN-based SPSS system, even if its dimension changes due to the inherent nature of pitch variation. By utilizing the orthogonality property of the discrete cosine transform, we not only accurately reconstruct the ITFTE parameters but also improve the perceptual quality of synthesized speech. Objective and subjective test results confirm that the proposed method provides superior synthesized speech compared to the previous system.
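
The role of DCT orthogonality in handling a pitch-dependent dimension can be sketched generically: an orthonormal DCT maps a vector of any length to coefficients that can be truncated or zero-padded to a fixed size, and the transpose of the same matrix inverts the transform. This is only an illustration of that general idea, not the paper's parameterization. The helper names `dct_matrix`, `to_fixed_dim`, and `from_fixed_dim`, the fixed dimension of 16, and the test signal are all assumptions.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C of size n x n, so that C @ C.T == I."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

def to_fixed_dim(x, dim):
    """DCT a variable-length vector, then truncate or zero-pad to `dim`."""
    x = np.asarray(x, dtype=float)
    coeffs = dct_matrix(len(x)) @ x
    out = np.zeros(dim)
    m = min(dim, len(x))
    out[:m] = coeffs[:m]
    return out

def from_fixed_dim(coeffs, length):
    """Rebuild a vector of the requested length from fixed-size coefficients."""
    full = np.zeros(length)
    m = min(len(coeffs), length)
    full[:m] = coeffs[:m]
    # Orthogonality means the inverse transform is just the transpose.
    return dct_matrix(length).T @ full

# Hypothetical smooth 40-sample vector compressed to 16 coefficients.
x = np.sin(np.linspace(0.0, 3.0, 40))
coeffs = to_fixed_dim(x, 16)
x_hat = from_fixed_dim(coeffs, len(x))
```

Since the transform is orthonormal, the reconstruction error energy equals the energy of the discarded coefficients, so smooth inputs lose little when only the low-order coefficients are kept.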


International Conference of the IEEE Engineering in Medicine and Biology Society | 2015

A constrained two-layer compression technique for ECG waves

Kyungguen Byun; Eunwoo Song; Hwan Shim; Hyungjoon Lim; Hong-Goo Kang

This paper proposes a constrained two-layer compression technique for electrocardiogram (ECG) waves whose encoded parameters can be used directly for the diagnosis of arrhythmia. In the first layer, a single ECG beat is represented by one of the registered templates in the codebook. Since the only coding parameter required in this layer is the codebook index of the selected template, its compression ratio (CR) is very high. Note that the distribution of registered templates is also related to the characteristics of ECG waves, so it can be used as a metric to detect various types of arrhythmias. In the second layer, the residual error between the input and the selected template is encoded by wavelet-based transform coding. The number of wavelet coefficients is constrained by a predefined maximum allowable distortion. The MIT-BIH arrhythmia database is used to evaluate the performance of the proposed algorithm. The proposed algorithm achieves a CR of around 7.18 when the reference value of the percentage root mean square difference (PRD) is set to ten.
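
The two figures of merit quoted in the abstract, CR and PRD, can be computed as in the sketch below. It assumes the common PRD definition without mean removal (some ECG studies subtract the signal mean or a baseline offset first, and the abstract does not say which variant is used), and it treats CR as the ratio of original to compressed bit counts. The function names and the toy values are hypothetical.

```python
import numpy as np

def prd(original, reconstructed):
    """Percentage root mean square difference, without mean removal (assumed)."""
    original = np.asarray(original, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    err_energy = np.sum((original - reconstructed) ** 2)
    return 100.0 * np.sqrt(err_energy / np.sum(original ** 2))

def compression_ratio(original_bits, compressed_bits):
    """CR as original size divided by compressed size."""
    return original_bits / compressed_bits

# Toy example: a two-sample "signal" and an imperfect reconstruction.
beat = np.array([3.0, 4.0])
decoded = np.array([3.0, 3.0])
toy_prd = prd(beat, decoded)            # error energy 1, signal energy 25
toy_cr = compression_ratio(1100, 100)   # hypothetical bit counts
```

Under this definition, a PRD target acts as the distortion constraint: wavelet coefficients would be added in the second layer until the PRD of the reconstructed beat falls at or below the target.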


International Conference on Digital Signal Processing | 2014

Fixed-point implementation of MPEG-D unified speech and audio coding decoder

Eunwoo Song; Hong-Goo Kang; Joonil Lee

This paper describes a fixed-point implementation method for the unified speech and audio coding (USAC) decoder recently standardized by the Moving Picture Experts Group (MPEG). Because the structure of USAC is highly complex in order to support both speech and audio signals, quality and complexity issues must be carefully reviewed when performing the fixed-point implementation. By analyzing the structure of the USAC decoder, this paper describes key ideas for successfully realizing the fixed-point system. Subjective and objective test results verify that the implemented fixed-point decoder shows equivalent quality to the floating-point decoder. The average and worst cases of complexity depending on the type of encoding modes are also given in detail.
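
As general background on what a fixed-point port involves, the sketch below shows Q15 arithmetic, a widely used 16-bit format in which an integer q represents the value q / 32768. The abstract does not state which word lengths or Q formats the decoder uses, so the Q15 choice, the rounding rule, and all function names here are assumptions for illustration.

```python
Q15_ONE = 1 << 15          # scale factor: 1.0 maps to 32768
Q15_MAX = Q15_ONE - 1      # largest representable value, just under 1.0
Q15_MIN = -Q15_ONE         # smallest representable value, exactly -1.0

def saturate(v):
    """Clamp an integer to the signed 16-bit Q15 range."""
    return max(Q15_MIN, min(Q15_MAX, v))

def to_q15(x):
    """Quantize a float in [-1, 1) to Q15, saturating out-of-range input."""
    return saturate(int(round(x * Q15_ONE)))

def from_q15(q):
    """Convert a Q15 integer back to a float."""
    return q / Q15_ONE

def q15_mul(a, b):
    """Multiply two Q15 values with rounding, then saturate the result."""
    return saturate((a * b + (1 << 14)) >> 15)

half = to_q15(0.5)
quarter = q15_mul(half, half)
```

Saturation and rounding choices like these are where fixed-point and floating-point outputs diverge, which is why a port of this kind is validated with both objective and subjective quality tests.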


Asia-Pacific Signal and Information Processing Association Annual Summit and Conference | 2013

Speech enhancement for pathological voice using time-frequency trajectory excitation modeling

Eunwoo Song; Jong-youb Ryu; Hong-Goo Kang

This paper proposes a speech enhancement algorithm for pathological voices using time-frequency trajectory excitation (TFTE) modeling. The TFTE model can finely control the periodic and non-periodic excitation components through a single-pitch-based decomposition process. By investigating the differences in frequency characteristics between pathological and normal voices, this paper proposes an enhancement algorithm that efficiently reduces the breathiness of the pathological voice while maintaining the identity of the speaker. Subjective test results are presented to verify the effectiveness of the proposed algorithm.


Conference of the International Speech Communication Association | 2015

Deep neural network-based statistical parametric speech synthesis system using improved time-frequency trajectory excitation model

Eunwoo Song; Hong-Goo Kang


International Conference on Acoustics, Speech, and Signal Processing | 2018

Modeling-By-Generation-Structured Noise Compensation Algorithm for Glottal Vocoding Speech Synthesis System

Min-Jae Hwang; Eunwoo Song; Kyungguen Byun; Hong-Goo Kang


Conference of the International Speech Communication Association | 2018

A Unified Framework for the Generation of Glottal Signals in Deep Learning-based Parametric Speech Synthesis Systems

Min-Jae Hwang; Eunwoo Song; Jin-Seob Kim; Hong-Goo Kang

Collaboration


Dive into Eunwoo Song's collaborations.

Top Co-Authors

Joun Yeop Lee

Seoul National University

Sung Jun Cheon

Seoul National University
