Milan Jelinek
Université de Sherbrooke
Publication
Featured research published by Milan Jelinek.
IEEE Transactions on Audio, Speech, and Language Processing | 2007
Milan Jelinek; Redwan Salami
This paper presents novel techniques for source-controlled variable-rate wideband speech coding. These techniques have been used in the variable-rate multimode wideband (VMR-WB) speech codec recently selected by the Third-Generation Partnership Project 2 (3GPP2) for wideband (WB) speech telephony, streaming, and multimedia messaging services in the cdma2000 third-generation wireless system. The codec utilizes efficient coding modes optimized for different classes of speech signal, including generic coding based on AMR-WB for transients and onsets, voiced coding optimized for stable voiced signals, unvoiced coding optimized for unvoiced segments, and comfort noise generation for inactive segments. Several innovations enable very good performance at average bit rates below 8 kb/s for active speech coding. The article presents an overview of the codec and describes in detail some of the codec's novel features: a robust pitch tracking algorithm, coding-mode dependent prediction of linear prediction (LP) filter quantization, and novel frame erasure concealment techniques, including supplementary information for reconstruction of lost onsets and improving decoder convergence. Selected results from the Selection and Characterization tests of the codec illustrate its performance.
IEEE Communications Magazine | 2006
Sassan Ahmadi; Milan Jelinek
This article is an overview of the architecture and operation of the VMR-WB, a source- and network-controlled variable-rate multimode codec designed for robust processing of wideband speech. To enable a smooth transition from legacy narrowband voice services, VMR-WB is also capable of processing conventional telephone-bandwidth speech. The VMR-WB codec is interoperable with AMR-WB at certain bit rates, thus eliminating quality degradation and additional delay due to transcoding.
International Conference on Acoustics, Speech, and Signal Processing | 2004
Milan Jelinek; Redwan Salami; S. Ahmadi; B. Bessette; Philippe Gournay; Claude Laflamme
The description and design of the source-controlled variable-rate multimode wideband (VMR-WB) codec recently selected by the 3rd Generation Partnership Project 2 (3GPP2) for the cdma2000® system in Rate-Set II are presented. The paper gives an overview of the codec and the methodologies that enable high quality wideband coding at average data rates ranging from TIA/EIA/IS-733 ADR (average data rate) to that of TIA/EIA/IS-127. The codec has three modes of operation at different average data rates and a fourth mode that is interoperable with 3GPP/AMR-WB (ITU-T G.722.2). Despite the interoperability constraint, the codec is capable of meeting the aggressive performance requirements through the use of novel techniques such as noise suppression, efficient signal classification, new coding types optimized for stable voiced and unvoiced frames, novel post-processing technique for periodicity enhancement in the lower frequency band, and improved frame erasure concealment mechanisms.
IEEE Communications Magazine | 2009
Milan Jelinek; Tommy Vaillancourt; Jon Gibbs
This article is an overview of the standardization, architecture, and performance of the new ITU-T Recommendation G.718. G.718 is an embedded variable bit rate codec providing a scalable solution for compression of 8 and 16 kHz sampled speech and audio signals at rates between 8 kb/s and 32 kb/s. It comprises five layers where higher-layer bitstreams can be discarded without affecting the lower layers' decoding. The codec also has an optional core layer interoperable with ITU-T G.722.2 (3GPP AMR-WB) at 12.65 kb/s. G.718 was designed to provide high speech quality at low bit rates and to be robust to significant rates of frame erasures or packet losses. It is also targeting good quality for generic audio at higher rates.
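The embedded-layer property described in this abstract can be illustrated with a minimal sketch. It assumes cumulative layer rates of 8, 12, 16, 24, and 32 kb/s and a 20 ms frame; the abstract itself states only the 8 to 32 kb/s range and the number of layers, so these values, along with the function names, are illustrative assumptions rather than the codec's actual bitstream format.

```python
# Cumulative bit rate after each layer, in kb/s (assumed for illustration).
LAYER_RATES_KBPS = [8, 12, 16, 24, 32]
FRAME_MS = 20  # assumed frame duration in milliseconds


def layer_sizes_bits(rates=LAYER_RATES_KBPS, frame_ms=FRAME_MS):
    """Return the number of bits each layer contributes to one frame."""
    cumulative = [rate * frame_ms for rate in rates]  # kb/s * ms = bits
    previous = [0] + cumulative[:-1]
    return [c - p for c, p in zip(cumulative, previous)]


def truncate_frame(frame_bits, target_kbps, frame_ms=FRAME_MS):
    """Keep only the whole layers that fit within the target rate.

    Because the layers are stored in order, truncation is a prefix
    operation: the bits of the retained lower layers are untouched.
    """
    budget = target_kbps * frame_ms
    kept = 0
    for size in layer_sizes_bits(frame_ms=frame_ms):
        if kept + size > budget:
            break
        kept += size
    return frame_bits[:kept]


if __name__ == "__main__":
    frame = list(range(640))  # stand-in for the 640 bits of a 32 kb/s frame
    print(layer_sizes_bits())
    print(len(truncate_frame(frame, 16)))
```

Truncating in two steps (for example to 24 kb/s at one network node and then to 12 kb/s at another) yields the same bits as truncating once, which is what allows any node in the path to drop upper layers without re-encoding.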
International Conference on Acoustics, Speech, and Signal Processing | 1999
Milan Jelinek; Jean-Pierre Adoul
Estimation of the spectral envelope in the frequency domain avoids some problems of linear prediction (LP) algorithms for voiced speech. We present a low complexity method of spectral envelope estimation from harmonics for low rate coding. The method consists of computing the harmonic amplitude spectrum using a pitch-synchronous DFT with length depending on voicing, modifying this spectrum outside the telephone bandwidth to simplify modeling of the useful bandwidth, and interpolating it with a frequency-domain low-pass filter. An all-pole model is then fitted to this modified, smoothed version of the harmonic spectrum. The method was implemented on the harmonic-stochastic excitation (HSX) vocoder and its performance was compared with an LP algorithm similar to that used in the G.729 speech coding standard. A-B comparative tests show a significant increase in perceptual quality.
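The last step of this abstract, fitting an all-pole model to a smoothed harmonic spectrum, can be sketched as follows. This is not the paper's method: plain linear interpolation between harmonic peaks stands in for the frequency-domain low-pass filter, the out-of-band spectrum modification is omitted, and the all-pole fit uses the standard route of converting the power envelope to autocorrelation values and running the Levinson-Durbin recursion. All function names and default parameters are hypothetical.

```python
import math


def interpolate_harmonics(amps, f0, freq):
    """Linearly interpolate between harmonic peaks, holding the end values.

    amps[k - 1] is the amplitude of harmonic k located at k * f0.
    (Assumption: this replaces the low-pass interpolation of the paper.)
    """
    pos = freq / f0  # position measured in harmonic numbers
    if pos <= 1.0:
        return amps[0]
    if pos >= len(amps):
        return amps[-1]
    lower = int(pos)
    frac = pos - lower
    return (1.0 - frac) * amps[lower - 1] + frac * amps[lower]


def autocorrelation_from_envelope(amps, f0, fs, order, grid=256):
    """Approximate r[m] = (1/pi) * integral of P(w) cos(m w) over [0, pi]."""
    r = []
    for m in range(order + 1):
        total = 0.0
        for i in range(grid):
            omega = math.pi * (i + 0.5) / grid  # midpoint rule on (0, pi)
            freq = omega * fs / (2.0 * math.pi)
            power = interpolate_harmonics(amps, f0, freq) ** 2
            total += power * math.cos(m * omega)
        r.append(total / grid)
    return r


def levinson_durbin(r, order):
    """Solve for A(z) = 1 + a1 z^-1 + ... and the prediction error energy."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        err *= 1.0 - k * k
    return a, err


def fit_all_pole(amps, f0, fs=8000.0, order=10):
    """Fit an all-pole envelope to harmonic amplitudes (illustrative only)."""
    r = autocorrelation_from_envelope(amps, f0, fs, order)
    return levinson_durbin(r, order)
```

With the returned coefficients `a` and error energy `err`, the modeled envelope at angular frequency ω is sqrt(err) / |A(e^{jω})|. A flat set of harmonic amplitudes yields coefficients that are numerically zero apart from the leading 1, which is a quick sanity check on the fit.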
IEEE Transactions on Audio, Speech, and Language Processing | 2010
Vaclav Eksler; Milan Jelinek
This paper presents a new technique for the class of code-excited linear prediction speech codecs designed to reduce error propagation after lost frames. Its principle consists in replacing the interframe long-term prediction with a glottal-shape codebook in the subframe containing the first glottal impulse in a given frame. This technique, independent of previous frames, is of particular interest in voiced speech frames following transitions, as these frames are the most sensitive to frame erasures. It forms the basis of a structured coding scheme called transition coding (TC). TC greatly improves codec performance in noisy channels while maintaining clean channel performance. It is part of the new embedded speech and audio codec recently standardized as Recommendation G.718 by ITU-T.
International Conference on Acoustics, Speech, and Signal Processing | 2008
Vaclav Eksler; Milan Jelinek
CELP-based codecs typically rely on prediction to achieve their high coding efficiency. On the other hand, the prediction makes these codecs sensitive to frame erasures, as errors propagate beyond the erased frame. We present a technique that significantly limits the error propagation by replacing inter-frame long-term prediction with a non-predictive glottal-shape codebook. The technique was implemented in the winning candidate of the EV-VBR baseline codec selection by ITU-T in March 2007. To maintain the performance in clean channels, this transition mode coding technique was used only in frames following voiced onset frames, i.e. the frames most sensitive to frame errors.
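The error-propagation mechanism described here can be illustrated with a toy decoder. This is a deliberately simplified sketch, not the EV-VBR or G.718 algorithm: each sample is reconstructed as a gain times the excitation one pitch lag earlier plus an innovation, a lost frame is concealed with silence, and a frame with zero gain plays the role of a non-predictive (transition-coded) frame. The function names, frame length, lag, and gain are all hypothetical.

```python
def decode(frames, pitch_lag, lost=()):
    """Toy long-term-prediction decoder.

    frames: list of (gain, innovation) pairs, one per frame.
    A gain of 0.0 means the frame ignores the past excitation, which is
    a crude stand-in for a non-predictive codebook.
    lost: indices of erased frames, concealed here with silence.
    """
    history = [0.0] * pitch_lag  # past excitation (adaptive codebook memory)
    output = []
    for index, (gain, innovation) in enumerate(frames):
        for sample in innovation:
            if index in lost:
                value = 0.0
            else:
                value = gain * history[-pitch_lag] + sample
            history.append(value)
            output.append(value)
    return output


def frame_error(reference, degraded, index, frame_len):
    """Sum of absolute differences within one frame."""
    start = index * frame_len
    stop = start + frame_len
    return sum(abs(x - y) for x, y in zip(reference[start:stop],
                                          degraded[start:stop]))


FRAME_LEN = 8
PITCH_LAG = 4
PULSE = [1.0] + [0.0] * (FRAME_LEN - 1)

# Every frame predicts from the past excitation.
predictive = [(0.9, PULSE)] * 4
# Same stream, but frame 2 does not use the past excitation.
with_transition = [(0.9, PULSE), (0.9, PULSE), (0.0, PULSE), (0.9, PULSE)]

if __name__ == "__main__":
    for name, stream in (("predictive", predictive),
                         ("with transition frame", with_transition)):
        clean = decode(stream, PITCH_LAG)
        lossy = decode(stream, PITCH_LAG, lost={1})
        errors = [frame_error(clean, lossy, i, FRAME_LEN) for i in (2, 3)]
        print(name, errors)
```

In the fully predictive stream, the mismatch caused by losing frame 1 persists into frames 2 and 3 because each sample depends on the corrupted history. When frame 2 does not rely on the past excitation, the decoder resynchronizes immediately and frames 2 and 3 match the clean decoding.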
IEEE Transactions on Speech and Audio Processing | 2005
Mikko Tammi; Milan Jelinek; Vesa Ruoppila
This paper introduces a novel signal modification method for wide-band code-excited linear prediction (CELP) speech codecs to improve pitch prediction at low bit rates. The method is enabled only in stable voiced speech frames, and preserves the original time scale at the end of each frame. This feature helps to avoid artifacts and simplifies an encoder implementation. The signal modification includes a classification algorithm as an integral part. The classification algorithm detects the frames most suitable for signal modification and low bit rate coding, and can be employed in a rate selection module of variable bit rate (VBR) codecs. In this paper, the signal modification method is applied in an experimental VBR wide-band speech codec derived from the 3GPP adaptive multirate wideband (AMR-WB) standard (ITU-T Recommendation G.722.2). The codec fulfills the system requirements of IS-95/CDMA2000 Rate Set II, operating at source coding bit rates 12.65, 6.2, and 1.0 kb/s. The signal modification is used in the 6.2 kb/s mode dedicated for voiced speech frames. Listening test results demonstrate the good performance of the proposed method. The signal modification method is used in the Nokia/VoiceAge codec that was declared in April 2003 as the winner of the selection phase in the 3GPP2 CDMA2000 wide-band speech codec standardization.
International Conference on Acoustics, Speech, and Signal Processing | 2015
Vaclav Eksler; Milan Jelinek; Redwan Salami
The recently standardized codec for Enhanced Voice Services (EVS) consists of a number of modes to achieve its high coding flexibility. In this paper we focus on techniques that enable a seamless switching between two linear prediction based modes running at different sampling rates within this codec. The first one deals with an efficient conversion of the linear prediction filter coefficients. The other one is based on a constrained-memory ACELP called transition coding (TC) that significantly limits the inter-frame long-term dependency. We show that the use of TC can be successfully extended to improve quality also in coding other transitions, e.g. strong onsets of voiced speech.
IEEE Global Conference on Signal and Information Processing | 2015
Vladimir Malenovsky; Milan Jelinek
The recent standard on Enhanced Voice Services (EVS) contains two memory-less gain coding mechanisms achieving better performance than the prediction-based techniques used in the 3GPP AMR-WB and ITU-T G.729 codecs. The EVS gain encoder uses joint vector quantization without the need for information from previous frames. Inter-frame prediction is replaced by alternative schemes based on sub-frame prediction or estimated average target signal energy. This eliminates the propagation of error inside the adaptive codebook and reduces the risk of artifacts in the recovery stage after frame error concealment. The results show that the EVS codec outperforms AMR-WB at all bitrates while keeping the same number of bits required for gain quantization.