Publication


Featured research published by Balazs Kovesi.


international conference on acoustics, speech, and signal processing | 2007

ITU-T G.729.1: An 8-32 kbit/s Scalable Coder Interoperable with G.729 for Wideband Telephony and Voice over IP

Stéphane Ragot; Balazs Kovesi; Romain Trilling; David Virette; Nicolas Duc; Dominique Massaloux; Stéphane Proust; Bernd Geiser; Martin Gartner; Stefan Schandl; Hervé Taddei; Yang Gao; Eyal Shlomot; Hiroyuki Ehara; Koji Yoshida; Tommy Vaillancourt; Redwan Salami; Mi Suk Lee; Do Young Kim

This paper describes the scalable coder - G.729.1 - which has been recently standardized by ITU-T for wideband telephony and voice over IP (VoIP) applications. G.729.1 can operate at 12 different bit rates from 32 down to 8 kbit/s with wideband quality starting at 14 kbit/s. This coder is a bitstream interoperable extension of ITU-T G.729 based on three embedded stages: narrowband cascaded CELP coding at 8 and 12 kbit/s, time-domain bandwidth extension (TDBWE) at 14 kbit/s, and split-band MDCT coding with spherical vector quantization (VQ) and pre-echo reduction from 16 to 32 kbit/s. Side information - consisting of signal class, phase, and energy - is transmitted at 12, 14 and 16 kbit/s to improve the resilience and recovery of the decoder in case of frame erasures. The quality, delay, and complexity of G.729.1 are summarized based on ITU-T results.
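The embedded layering described in this abstract can be sketched as a small bitrate table. This is only an illustration: the abstract gives the 8 to 32 kbit/s range, the count of 12 rates, and the stage boundaries. The uniform 2 kbit/s step above 14 kbit/s is inferred from that count, the 20 ms frame length is an assumption used for the arithmetic, and all function names are hypothetical.

```python
FRAME_MS = 20  # assumed frame length; not stated in the abstract


def g7291_bitrates():
    """Return the 12 embedded bitrates in kbit/s, lowest first."""
    # 8 and 12 kbit/s, then 14..32 kbit/s in (inferred) 2 kbit/s steps.
    return [8, 12] + list(range(14, 33, 2))


def stage_for(rate_kbps):
    """Name the topmost coding stage active at a given bitrate."""
    if rate_kbps not in g7291_bitrates():
        raise ValueError(f"unsupported bitrate: {rate_kbps} kbit/s")
    if rate_kbps <= 12:
        return "CELP"
    if rate_kbps == 14:
        return "TDBWE"
    return "MDCT"


def truncate_frame(frame_bits, target_kbps):
    """Drop the upper layers of one embedded frame to reach target_kbps."""
    stage_for(target_kbps)            # validates the target bitrate
    keep = target_kbps * FRAME_MS     # kbit/s * ms = bits per frame
    return frame_bits[:keep]


if __name__ == "__main__":
    full = [0] * (32 * FRAME_MS)      # one full-rate frame, payload zeroed
    for rate in g7291_bitrates():
        print(rate, stage_for(rate), len(truncate_frame(full, rate)))
```

Because the bitstream is embedded, lowering the rate in this sketch is plain truncation: no re-encoding is needed, which is the property that makes such a coder attractive for adaptation inside the network.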


international conference on acoustics, speech, and signal processing | 2004

A scalable speech and audio coding scheme with continuous bitrate flexibility

Balazs Kovesi; Dominique Massaloux; Aurelien Sollaud

Networks are getting more and more heterogeneous. Scalable codecs are especially suited for such a context as they permit the bitrate to be lowered in a simple way, at any point of the transmission, for adaptation to network conditions and to terminal capacities. Classically, scalable codecs are organised in layers and scalability is obtained by sending more or fewer layers to the decoder. The obtained granularity depends on the layer sizes, and the available bitrates are fixed and limited in number. The paper presents a novel scalable audio coding scheme where the bitrates vary continuously between a minimal and a maximal value, allowing free modification of the bitrate. With this novel approach, all bitrates are valid: sending even one more bit results in a different output signal with statistically growing quality. Test results show that this method provides quality as good as or even better than that of a non-scalable version.


international conference on acoustics, speech, and signal processing | 2006

A 8–32 kbit/s Scalable Wideband Speech and Audio Coding Candidate for ITU-T G729EV Standardization

Stéphane Ragot; Balazs Kovesi; David Virette; Romain Trilling; Dominique Massaloux

This paper describes an 8-32 kbit/s scalable speech and audio coder submitted as a candidate for the ITU-T G729-based embedded variable bitrate (G729EV) standardization. The coder is built upon a 3-stage coding structure consisting of: narrowband cascade CELP coding at 8 and 12 kbit/s, bandwidth extension based on wideband linear-predictive coding (WB-LPC) at 14 kbit/s, and MDCT coding in a WB-LPC weighted signal domain from 14 to 32 kbit/s. ITU-T test results showed that this coder passed all the requirements of the G729EV qualification phase.


international conference on acoustics, speech, and signal processing | 2008

Adaptive time-frequency resolution in modulated transform at reduced delay

David Virette; Balazs Kovesi; Pierrick Philippe

Cosine-modulated transforms such as the Modified Discrete Cosine Transform (MDCT) are key elements in audio coding. They allow efficient energy compaction and perceptual irrelevancy reduction. The frequency localization can be adapted to the signal characteristics, and fast implementations exist. All of this has made the MDCT the most popular transform in audio coding. In this paper we solve a problem never addressed in the literature: in order to enable variable MDCT sizes in communication codecs, we demonstrate how frequency resolution can be adapted on the fly without using transition windows, hence decreasing coding delay. A low-complexity implementation of the method is also proposed.
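As background for this abstract, the following minimal sketch shows a plain fixed-size MDCT with a sine window and overlap-add synthesis. It is not the adaptive-resolution method of the paper, only the textbook transform it builds on; the function names, the block size, and the direct O(N²) evaluation are choices made here for illustration.

```python
import math


def sine_window(length):
    """Sine window; satisfies w[n]^2 + w[n + N]^2 = 1 for length 2N."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(block, window):
    """Windowed MDCT: 2N input samples -> N coefficients (N even)."""
    n_coef = len(block) // 2
    n0 = 0.5 + n_coef / 2
    return [
        sum(window[n] * block[n]
            * math.cos(math.pi / n_coef * (n + n0) * (k + 0.5))
            for n in range(2 * n_coef))
        for k in range(n_coef)
    ]


def imdct(coeffs, window):
    """Windowed inverse MDCT: N coefficients -> 2N time-aliased samples."""
    n_coef = len(coeffs)
    n0 = 0.5 + n_coef / 2
    return [
        window[n] * (2.0 / n_coef)
        * sum(coeffs[k] * math.cos(math.pi / n_coef * (n + n0) * (k + 0.5))
              for k in range(n_coef))
        for n in range(2 * n_coef)
    ]


def analysis_synthesis(signal, n_coef):
    """MDCT then IMDCT with 50% overlap-add; len(signal) % n_coef == 0."""
    window = sine_window(2 * n_coef)
    padded = [0.0] * n_coef + list(signal) + [0.0] * n_coef
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n_coef + 1, n_coef):
        block = padded[start:start + 2 * n_coef]
        for n, value in enumerate(imdct(mdct(block, window), window)):
            out[start + n] += value  # aliasing cancels between neighbours
    return out[n_coef:n_coef + len(signal)]


if __name__ == "__main__":
    x = [math.sin(0.3 * i) + 0.1 * i for i in range(32)]
    y = analysis_synthesis(x, 8)
    print(max(abs(a - b) for a, b in zip(x, y)))
```

In this fixed-size form, each output sample depends on two overlapping blocks. Changing the block size between frames conventionally requires specially shaped transition windows, which is the constraint the abstract says the paper removes.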


international conference on acoustics, speech, and signal processing | 2008

A low complexity packet loss concealment algorithm for ITU-T G.722

Balazs Kovesi; Stéphane Ragot

This article presents ITU-T G.722 Appendix IV, a packet loss concealment (PLC) algorithm recently standardized by ITU-T for G.722 decoding in the presence of frame erasures. This algorithm is suitable for applications that may encounter frame erasures or packet losses, with a special focus on complexity constraints. For example, G.722 Appendix IV is well suited to DECT next generation and VoIP using low-cost devices. We also describe some minor algorithmic modifications to G.722 Appendix IV that improve subjective quality. We discuss G.722 PLC performance based on formal ITU-T test results as well as additional informal experiments.


multimedia signal processing | 2010

Parametric stereo extension of ITU-T G.722 based on a new downmixing scheme

Thi Minh Nguyet Hoang; Stéphane Ragot; Balazs Kovesi; Pascal Scalart

In this paper, we present a novel, frequency-domain stereo to mono downmixing, which preserves the energy of spectral components and avoids setting the left or right channel as a phase reference. Based on this downmixing technique, a parametric stereo analysis-synthesis model is described in which subband stereo parameters consist of interchannel level differences and phase differences between the mono signal and one of the stereo channels (left or right). This model is applied to the stereo extension of ITU-T G.722 at 56+8 and 64+16 kbit/s with a frame length of 5 ms. AB test results are provided to assess the quality of the proposed downmixing technique. In addition, the quality of the proposed G.722-based stereo coder is compared against reference coders (G.722.1 at 24 and 32 kbit/s dual mono and G.722 at 64 kbit/s dual mono) for clean speech, noisy speech and music.


international conference on acoustics, speech, and signal processing | 2008

An 64-80-96 kbit/s scalable wideband speech coding candidate for ITU-T G.711-WB standardization

Balazs Kovesi; Stéphane Ragot; A. Le Guyader

This article presents a bitrate- and bandwidth-scalable coder submitted as a candidate to the qualification phase of ITU-T embedded G.711 wideband (G.711-WB) standardization. The encoder operates on 5 ms frames and the bitrate can be set on a frame basis at either 64, 80 or 96 kbit/s. The input signal, which is by default sampled at 16 kHz, is split into two bands. The low band (0-4000 Hz) is coded by a low-complexity embedded pulse code modulation (PCM) coder at 8-10 bits/sample, including noise feedback at the encoder and PCM-specific post-filtering at the decoder. The high band (4000-8000 Hz) is coded by a time-domain aliasing cancellation (TDAC) coder derived from ITU-T G.729.1. Frame erasures are concealed using a low-complexity split-band algorithm derived from ITU-T G.722 Appendix IV. The proposed coder passed all requirements of the G.711-WB qualification phase. We summarize and discuss formal ITU-T test results.
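The frame and bit budgets implied by this abstract can be checked with a little arithmetic. The sketch below assumes that each half band of the 16 kHz input is critically sampled at 8 kHz; that figure and the function names are assumptions made here, and the code does not claim to reproduce how the candidate actually assigns layers to the 80 kbit/s mode.

```python
FRAME_MS = 5          # frame length given in the abstract
BAND_RATE_HZ = 8000   # assumed sampling rate of each half band


def frame_bits(rate_kbps):
    """Bits available in one 5 ms frame (kbit/s * ms = bits)."""
    return rate_kbps * FRAME_MS


def band_samples_per_frame():
    """Samples per frame in one critically sampled half band."""
    return BAND_RATE_HZ * FRAME_MS // 1000


def lowband_pcm_kbps(bits_per_sample):
    """Bitrate of the low-band PCM coder at a given sample resolution."""
    return bits_per_sample * BAND_RATE_HZ // 1000


if __name__ == "__main__":
    for rate in (64, 80, 96):
        print(f"{rate} kbit/s -> {frame_bits(rate)} bits per frame")
    print("low-band samples per frame:", band_samples_per_frame())
    for bits in (8, 10):
        print(f"{bits} bits/sample -> {lowband_pcm_kbps(bits)} kbit/s")
```

Under these assumptions, 8 bits/sample in the low band accounts for the 64 kbit/s mode and 10 bits/sample for 80 kbit/s, leaving 16 kbit/s (80 bits per frame) for the high band at 96 kbit/s.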


international conference on acoustics, speech, and signal processing | 2011

Re-engineering ITU-T G.722: Low delay and complexity superwideband coding at 64 kbit/s with G.722 bitstream watermarking

Balazs Kovesi; Stéphane Ragot; Claude Lamblin; Lei Miao; Zexin Liu; Chen Hu

This paper presents the lowest bitrate mode (64 kbit/s) of the new superwideband (SWB, 50–14000 Hz) coder, recently standardized as ITU-T G.722 Annex B. This mode provides a superwideband extension of G.722 at 56 kbit/s with one 8 kbit/s enhancement layer divided in two sub-layers. The resulting bitstream is compatible with ITU-T G.722 at 64 kbit/s and can be viewed as watermarking G.722 least significant bits (LSBs). The novel technologies in this mode include G.722 enhancements (noise feedback coding, scalable quantization in G.722 higher band), as well as multimode bandwidth extension (BWE). Selected ITU-T characterization test results and additional informal test results show that the 64 kbit/s mode of G.722 Annex B gives high SWB quality with low delay and complexity.


international conference on acoustics, speech, and signal processing | 2013

G.722 Annex D and G.711.1 Annex F - New ITU-T stereo codecs

David Virette; Yue Lang; Lei Miao; Wenhai Wu; Balazs Kovesi; Claude Lamblin; Stéphane Ragot

This paper presents the two new ITU-T Recommendations G.722 Annex D and G.711.1 Annex F, which are stereo extensions of the wideband codecs ITU-T G.722 and G.711.1 and their superwideband extensions (G.722 Annex B and G.711.1 Annex D). An embedded scalable structure is used to add stereo extension layers on top of the wideband or superwideband core coding. Wideband stereo modes are supported at the bit rates of 64/80 and 96/128 kbit/s for G.722 and G.711.1 (respectively), while superwideband stereo modes are supported at 80/96/112/128 and 112/128/144/160 kbit/s. The parametric stereo coding model is based on a frequency domain downmix, wideband inter-channel differences estimation, quantization and synthesis, low complexity coherence analysis and synthesis, stereo transient detection and stereo post-processing. An overview of formal ITU-T characterization listening tests illustrates the performance of these codecs.


Archive | 2001

Transmission error concealment in an audio signal

Balazs Kovesi; Dominique Massaloux; David Deleam

Collaboration


Top Co-Authors

Mi Suk Lee

Electronics and Telecommunications Research Institute
