Sung-Kyo Jung | Researchain

Publication

Featured research published by Sung-Kyo Jung.

International Conference on Acoustics, Speech, and Signal Processing | 2004

A bit-rate/bandwidth scalable speech coder based on ITU-T G.723.1 standard

Sung-Kyo Jung; Kyung-Tae Kim; Hong-Goo Kang

The paper presents a new scalable coder based on the ITU-T G.723.1 standard which is one of the most famous speech coders for VoIP applications. In order to support both bit-rate scalability and bandwidth scalability, the proposed coder adopts a split-band approach, where the input signal, sampled at 16 kHz, is decomposed into two equal frequency bands. The lower-band speech is coded with a standard coder, such as the G.723.1 standard. In addition, the low-band enhancement layer for lower-band speech improves the perceptual quality of decoded speech by employing additional coding units based on a cascaded codebook approach. The higher-band signal is encoded using an MDCT-based transform coding scheme. The proposed coder, at a bit-rate of 19.4 kbit/s, provides speech quality comparable to the ITU-T 24 kbit/s G.722.1 coder, while it also has interoperability with G.723.1.
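The split-band idea can be illustrated with a minimal sketch. The abstract does not say which analysis filters the coder uses, so the two-tap Haar pair below is an assumption chosen for brevity; a practical coder would use a longer filter bank. The function names `split_bands` and `merge_bands` are hypothetical.

```python
def split_bands(samples):
    """Split an even-length signal into half-rate low and high bands (Haar pair)."""
    if len(samples) % 2 != 0:
        raise ValueError("expected an even number of samples")
    # Pairwise average keeps the low band, pairwise difference keeps the high band.
    low = [(samples[i] + samples[i + 1]) / 2 for i in range(0, len(samples), 2)]
    high = [(samples[i] - samples[i + 1]) / 2 for i in range(0, len(samples), 2)]
    return low, high


def merge_bands(low, high):
    """Rebuild the full-rate signal from the two half-rate bands."""
    out = []
    for l, h in zip(low, high):
        out.extend([l + h, l - h])
    return out


# Toy 16 kHz input: each band ends up with half as many samples (8 kHz rate).
wideband = [1.0, 3.0, -2.0, 4.0, 0.5, 0.5, 8.0, -8.0]
low, high = split_bands(wideband)
restored = merge_bands(low, high)
```

In a coder of the kind described, the low band would go to the narrowband core coder and the high band to the transform coder; a decoder that receives only the core layer can still reconstruct narrowband speech.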

International Conference on Acoustics, Speech, and Signal Processing | 2003

A packet loss concealment algorithm based on time-scale modification for CELP-type speech coders

Moon-Keun Lee; Sung-Kyo Jung; Hong-Goo Kang; Young-Cheol Park; Dae Hee Youn

We propose a packet loss concealment algorithm for a code-excited linear prediction (CELP) speech coder. We perform time-scale modification (TSM) using a waveform similarity overlap-add (WSOLA) technique to reconstruct the excitation signal of lost or dropped frames. In addition, when a lost frame is classified as voiced, the adaptive codebook gain and the fixed codebook gain are estimated by a modified gain parameter re-estimation (GRE) technique. By applying these techniques, we can reduce both the quality degradation of the decoded speech and the error propagation through the adaptive codebook memory. We apply the proposed scheme to the ITU-T G.729 standard speech coder to evaluate its performance. Perceptual evaluation of speech quality (PESQ) scores and AB preference tests under various packet loss conditions verify that the proposed algorithm is superior to the concealment algorithm embedded in G.729.

IEEE Signal Processing Letters | 2005

A fast adaptive-codebook search algorithm for G.723.1 speech coder

Sung-Kyo Jung; Kyung-Tae Kim; Young-Cheol Park; Hong-Goo Kang

This letter presents a new fast search algorithm for the multitap adaptive codebook used in the G.723.1 standard speech coder. In contrast with the standard method, in which a closed-loop pitch lag and the gains of a fifth-order pitch predictor are searched simultaneously, the proposed algorithm adopts a sequential and restricted approach to determine the parameters. In other words, the proposed scheme first determines a couple of pitch lag candidates using a first-order pitch predictor and then computes the pitch gains of the fifth-order predictor within a restricted search area. Experimental results confirm that the proposed algorithm reduces the total complexity of the encoding process by 30.69% and provides speech quality equivalent to the standard method.
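The first step of the sequential approach, ranking lags with a first-order predictor, can be sketched as follows. This is a simplified illustration, not the letter's implementation: it omits the weighted synthesis filter, uses the common corr²/energy criterion as an assumption, and the names `predict` and `lag_candidates` are hypothetical. The fifth-order gain search around the surviving candidates is not shown.

```python
import math


def predict(past, lag, length):
    """First-order prediction: repeat the last `lag` samples of the past signal."""
    start = len(past) - lag
    return [past[start + (i % lag)] for i in range(length)]


def lag_candidates(past, target, lag_min, lag_max, keep=2):
    """Return the `keep` lags with the highest corr^2 / energy score."""
    scored = []
    for lag in range(lag_min, lag_max + 1):
        pred = predict(past, lag, len(target))
        corr = sum(t * p for t, p in zip(target, pred))
        energy = sum(p * p for p in pred)
        # Only positively correlated lags are useful for a pitch predictor.
        score = corr * corr / energy if energy > 0.0 and corr > 0.0 else 0.0
        scored.append((score, lag))
    scored.sort(key=lambda item: -item[0])
    return [lag for _, lag in scored[:keep]]


# Synthetic signal with a pitch period of exactly 40 samples.
period = [math.sin(2 * math.pi * k / 40) + 0.5 * math.sin(4 * math.pi * k / 40)
          for k in range(40)]
past = [period[n % 40] for n in range(200)]
target = [period[n % 40] for n in range(200, 260)]
candidates = lag_candidates(past, target, 20, 70)
```

Because the expensive multitap gain search would then run only around the few lags in `candidates`, the number of evaluated lag and gain combinations drops, which is the source of the complexity saving the letter reports.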

IEEE Transactions on Audio, Speech, and Language Processing | 2007

Applying a Speaker-Dependent Speech Compression Technique to Concatenative TTS Synthesizers

Chang-Heon Lee; Sung-Kyo Jung; Hong-Goo Kang

This paper proposes a new speaker-dependent coding algorithm to efficiently compress a large speech database for corpus-based concatenative text-to-speech (TTS) engines while maintaining high fidelity. To achieve a high compression ratio and meet the fundamental requirements of concatenative TTS synthesizers, such as partial segment decoding and random access capability, we adopt a nonpredictive analysis-by-synthesis scheme for speaker-dependent parameter estimation and quantization. The spectral coefficients are quantized using a memoryless split vector quantization (VQ) approach that does not use frame correlation. Considering that the excitation signals of a specific speaker show low intra-variation, especially in voiced regions, the conventional adaptive codebook for pitch prediction is replaced by a speaker-dependent pitch-pulse codebook trained on a corpus of single-speaker speech signals. To further improve the coding efficiency, the proposed coder flexibly combines nonpredictive and predictive methods according to the structure of the TTS system. By applying the proposed algorithm to a Korean TTS system, we obtain quality comparable to the G.729 speech coder and satisfy all the requirements of the TTS system. The results are verified by both objective and subjective quality measurements. In addition, the decoding complexity of the proposed coder is around 55% lower than that of G.729 Annex A.

International Conference on Acoustics, Speech, and Signal Processing | 2003

A cascaded algebraic codebook structure to improve the performance of speech coder

Sung-Kyo Jung; Kyoung-Tae Kim; Hong-Goo Kang; Dae Hee Youn

This paper presents a cascade structure of an algebraic codebook to improve the performance of low bit-rate speech coders. A codeword of an algebraic codebook consists of a set of pulse amplitudes and positions. In general, the amplitude of each pulse is constrained to be either +1 or -1 due to the limitations of bit-rate and complexity. Thus, the performance of the codebook varies depending on the characteristics of the input target vectors. In this paper, we extend the algebraic codebook structure to two stages in order to provide flexible pulse combinations. While all M pulses are selected simultaneously in a classical one-stage algebraic codebook, the cascade structure searches the pulses with a two-step procedure, i.e., L pulses at the first stage and (M-L) pulses at the second stage. Experiments confirm that our algorithm provides higher quality than the conventional scheme when the total number of pulses is the same. When 24 pulses are assigned per 8-ms subframe, the segmental SNR between the target and synthesized signals increases by 1.04 dB. In addition, under the same conditions, the complexity of the fixed codebook search is reduced by about 32%.
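A toy comparison shows why a second stage with its own gain can add flexibility. The sketch below is an assumption-heavy simplification: it matches the target directly (no synthesis filter), picks pulses greedily by magnitude, and uses hypothetical function names. It demonstrates the idea on one hand-picked vector and says nothing about the paper's search procedure or its reported gains.

```python
def pick_pulses(target, count):
    """Place `count` signed unit pulses at the largest |target| entries,
    scaled by the single least-squares gain for that pulse set."""
    positions = sorted(range(len(target)), key=lambda i: -abs(target[i]))[:count]
    gain = sum(abs(target[i]) for i in positions) / count
    pulses = [0.0] * len(target)
    for i in positions:
        pulses[i] = gain if target[i] >= 0 else -gain
    return pulses


def one_stage(target, total):
    """Classical structure: all pulses share one gain."""
    return pick_pulses(target, total)


def cascaded(target, total, first_count):
    """Two stages: the second stage codes the residual left by the first."""
    first = pick_pulses(target, first_count)
    residual = [t - p for t, p in zip(target, first)]
    second = pick_pulses(residual, total - first_count)
    return [a + b for a, b in zip(first, second)]


def squared_error(target, recon):
    return sum((t - r) ** 2 for t, r in zip(target, recon))


target = [4.0, -3.0, 1.0, 0.5, -1.0, 0.0, 2.0, 0.0]
err_one = squared_error(target, one_stage(target, 4))    # M = 4 pulses, one gain
err_two = squared_error(target, cascaded(target, 4, 2))  # L = 2, then M - L = 2
```

With one shared gain, the two large samples and the two small ones are forced to the same amplitude; with two stages, each group gets a gain closer to its own level, so the error on this vector is lower.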

International Conference on Acoustics, Speech, and Signal Processing | 2008

An embedded variable bit-rate coder based on GSM EFR: EFR-EV

Sung-Kyo Jung; Stéphane Ragot; Claude Lamblin; Stéphane Proust

This paper describes a 12.2-32 kbps scalable wideband speech and audio coder interoperable with GSM enhanced full-rate (EFR). This coder, referred to as EFR-EV, is designed using the ITU-T G.729.1 multi-stage coding structure. Specifically, EFR-EV consists of three stages: a code-excited linear prediction (CELP) stage derived from EFR, time-domain bandwidth extension (TDBWE), and time-domain aliasing cancellation (TDAC). In this paper, we show that the G.729.1 extension layers (i.e., TDBWE and TDAC) are quite generic for scalable codec design in the sense that they can be applied to EFR with limited adjustments. In addition, we propose a minor modification of the bit allocation procedure in the TDAC stage, exploiting spectral masking only for higher frequency bands. The performance of EFR-EV and G.729.1 is evaluated in terms of objective/subjective quality, algorithmic delay, and complexity.

Speaker Classification II | 2007

Bayes-Optimal Estimation of GMM Parameters for Speaker Recognition

Guillermo Garcia; Sung-Kyo Jung; Thomas Eriksson

In text-independent speaker recognition, Gaussian Mixture Models (GMMs) are widely employed as statistical models of the speakers. It is assumed that the Expectation Maximization (EM) algorithm can estimate the optimal model parameters, such as the weight, mean and variance of each Gaussian component for each speaker. However, this is not entirely true, since there are practical limitations such as the limited size of the training database and uncertainties in the model parameters. As is well known in the literature, limited database size is one of the largest challenges in speaker recognition research. In this paper, we investigate methods to overcome the database and parameter uncertainty problem. By reformulating the GMM estimation problem in a Bayesian-optimal way (as opposed to ML-optimal, as with the EM algorithm), we are able to change the GMM parameters to better cope with limited database size and other parameter uncertainties. Experimental results show the effectiveness of the proposed approach.
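The abstract does not spell out the estimator, so the following is only a generic illustration of how a Bayesian estimate differs from a maximum-likelihood one when data are scarce. It shows the textbook posterior mean of a Gaussian mean under a Gaussian prior, not the paper's method; `relevance` (the ratio of data variance to prior variance) and both function names are assumptions.

```python
def ml_mean(samples):
    """Maximum-likelihood estimate: the plain sample mean."""
    return sum(samples) / len(samples)


def bayes_mean(samples, prior_mean, relevance):
    """Posterior mean under a Gaussian prior on the mean.

    `relevance` acts like a count of pseudo-observations located at
    `prior_mean`; larger values trust the prior more.
    """
    n = len(samples)
    return (sum(samples) + relevance * prior_mean) / (n + relevance)


few_samples = [2.0, 4.0]
ml_estimate = ml_mean(few_samples)                      # 3.0
bayes_estimate = bayes_mean(few_samples, 0.0, 2.0)      # pulled toward the prior
```

With only two samples the Bayesian estimate sits halfway between the sample mean and the prior mean; as the number of samples grows, the prior's influence fades and the two estimates converge.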

International Conference on Acoustics, Speech, and Signal Processing | 2007

A Statistical Approach to Performance Evaluation of Speaker Recognition Systems

Guillermo Garcia; Thomas Eriksson; Sung-Kyo Jung

In speaker recognition applications, speaker identification is the process of automatically recognizing who is speaking based on statistical information obtained from speech signals. Considering the limited number of tests available in real situations during the classification phase, it is more useful to have an estimator of the probability of error for speaker recognition systems. In this work, we propose a method based on the log-likelihood of each speaker to estimate the probability of error of a speaker recognition system. We assess the performance of the estimator with experimental trials and compare it with the actual number of errors. The results show that the performance of our estimator is comparable to the conventional method. The proposed method offers better reliability and faster convergence than the counting approach. Indeed, we obtain an analytical expression for the probability of error that can be used as a gradient for other optimization methods in speaker recognition applications.

Archive | 2005