Jacek Stachurski
Texas Instruments
Publications
Featured research published by Jacek Stachurski.
international conference on acoustics, speech, and signal processing | 2008
Milan Jelinek; Tommy Vaillancourt; Ali Erdem Ertan; Jacek Stachurski; Anssi Rämö; Lasse Laaksonen; Jon Gibbs; Stefan Bruhn
We present the G.EV-VBR winning candidate codec recently selected by Question 9 of Study Group 16 (Q9/16) of ITU-T as a baseline for the development of a scalable solution for wideband speech and audio compression at rates between 8 kb/s and 32 kb/s. The Q9/16 codec is an embedded codec comprising 5 layers, where higher-layer bitstreams can be discarded without affecting the decoding of the lower layers. The two lower layers are based on CELP technology, where the core layer takes advantage of signal-classification-based encoding. The higher layers encode the weighted error signal from the lower layers using overlap-add transform coding. The codec has been designed with the primary objective of high-performance wideband speech coding for error-prone telecommunications channels, without compromising the quality for narrowband/wideband speech or wideband music signals. The codec performance is demonstrated with selected test results.
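The embedded-layer property described above — higher layers can be dropped without breaking lower-layer decoding — can be pictured with a minimal Python sketch. The layer sizes, payloads, and function names here are illustrative only, not the actual G.EV-VBR bitstream format:

```python
# Sketch of an embedded (layered) bitstream: each layer's payload is
# appended after the previous one, so truncating at a layer boundary
# leaves all lower layers intact. Sizes are hypothetical.

LAYER_SIZES = [16, 8, 8, 16, 16]  # bytes per layer for one frame

def pack_frame(layers):
    """Concatenate layer payloads (lowest layer first) into one frame."""
    assert [len(p) for p in layers] == LAYER_SIZES[:len(layers)]
    return b"".join(layers)

def truncate_frame(frame, keep_layers):
    """Drop the highest layers, keeping only the first `keep_layers`."""
    return frame[:sum(LAYER_SIZES[:keep_layers])]

def unpack_frame(frame):
    """Split a (possibly truncated) frame back into layer payloads."""
    layers, pos = [], 0
    for size in LAYER_SIZES:
        if pos + size > len(frame):
            break  # this layer was discarded; lower layers still decode
        layers.append(frame[pos:pos + size])
        pos += size
    return layers
```

A network node can thus shed bit rate by cutting each frame at any layer boundary, with no signaling back to the encoder.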
international conference on acoustics, speech, and signal processing | 1999
Jacek Stachurski; Alan V. McCree; Vishu R. Viswanathan
A number of coding techniques have been reported to achieve near toll quality synthesized speech at bit-rates around 4 kb/s. These include variants of code excited linear prediction (CELP), sinusoidal transform coding (STC) and multi-band excitation (MBE). While CELP has been an effective technique for bit-rates above 6 kb/s, STC, MBE, waveform interpolation (WI) and mixed excitation linear prediction (MELP) models seem to be attractive at bit-rates below 3 kb/s. We present a system to encode speech with high quality using MELP, a technique previously demonstrated to be effective at bit-rates of 1.6-2.4 kb/s. We have enhanced the MELP model producing significantly higher speech quality at bit-rates above 2.4 kb/s. We describe the development and testing of a high quality 4 kb/s MELP coder.
international conference on acoustics, speech, and signal processing | 2008
Anssi Rämö; Henri Toukomaa; S. Craig Greer; Lasse Laaksonen; Jacek Stachurski; A. Erdem Ertan; Jonas Svedberg; Jon Gibbs; Tommy Vaillancourt
ITU-T has selected the candidate submitted by Ericsson, Nokia, Motorola, VoiceAge, and Texas Instruments as the baseline for the G.EV-VBR coding standard. G.EV-VBR is an embedded scalable speech codec that uses state-of-the-art technology to provide the most efficient encoded speech available for various real-time applications. EV-VBR encodes both narrowband (NB) and wideband (WB) speech signals starting at 8 kbps. Near-perfect wideband representation is achieved at 32 kbps for all signal types. The bit stream is divided into five robust layers, providing sufficient granularity, in particular for VoIP applications. In addition, an extension to the codec will provide super-wideband and stereo capability by adding layers to the codec. Extensive listening tests were conducted during the ITU-T selection phase to support selection of the best-performing candidate. The selected EV-VBR candidate passed 69 of 70 required and 25 of 28 objective terms of reference.
international conference on acoustics, speech, and signal processing | 2000
Jacek Stachurski; Alan V. McCree
The paper describes a hybrid multi-modal codec with MELP and CELP coders used for different speech regions. Three modes are used: strongly voiced, weakly voiced, and unvoiced. The weakly voiced mode includes transitions and plosives; it is used when neither a strongly voiced nor an unvoiced region is clearly identified. In the strongly voiced mode the MELP coder is used, while in the weakly voiced and unvoiced modes the CELP coder is employed. To limit switching artifacts between the coders, an alignment phase is estimated and transmitted in the MELP mode, making the original and MELP-synthesized speech time-synchronous. Additionally, in zero-phase equalization, the phase component of the CELP target signal is removed, making the target waveform more similar to the MELP-synthesized speech. These two techniques, alignment-phase encoding and zero-phase equalization, greatly reduce switching artifacts in MELP/CELP transition regions. Formal listening test results for the 4 kb/s hybrid coder show that it can achieve speech quality equivalent to 32 kb/s ADPCM.
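One way to picture zero-phase equalization is to discard a frame's phase spectrum while keeping its magnitude spectrum, which yields a waveform closer to phase-free parametric synthesis. The sketch below does exactly that with a naive O(N²) DFT; the actual coder's equalization (windowing, frame handling, and the equalization filter itself) is more involved, and all names here are illustrative:

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform (O(N^2), fine for a sketch)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]

def idft(X):
    """Inverse DFT matching dft() above."""
    n = len(X)
    return [sum(X[f] * cmath.exp(2j * cmath.pi * f * t / n) for f in range(n)) / n
            for t in range(n)]

def zero_phase(frame):
    """Keep each bin's magnitude, discard its phase. For a real input the
    result is a real, even (zero-phase) waveform with the same magnitude
    spectrum as the original frame."""
    mags = [abs(X) for X in dft(frame)]
    return [y.real for y in idft(mags)]
```

Since only the phase is altered, the perceptually dominant magnitude spectrum of the target is preserved exactly.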
international conference on acoustics, speech, and signal processing | 2010
Noboru Harada; Yutaka Kamamoto; Takehiro Moriya; Yusuke Hiwasaki; Michael A. Ramalho; Lorin Netsch; Jacek Stachurski; Lei Miao; Herve Marcel Taddei; Fengyan Qi
The ITU-T Recommendation G.711 is the benchmark standard for narrowband telephony. It has been successful for many decades because of its proven voice quality, ubiquity and utility. A new ITU-T recommendation, denoted G.711.0, has been recently established defining a lossless compression for G.711 packet payloads typically found in IP networks. This paper presents a brief overview of technologies employed within the G.711.0 standard and summarizes the compression and complexity results. It is shown that G.711.0 provides greater than 50% average compression in typical service provider environments while keeping low computational complexity for the encoder/decoder pair (1.0 WMOPS average, <1.7 WMOPS worst case) and a low memory footprint (about 5k octets of RAM, 5.7k octets of ROM, and 3.6k of program memory measured in number of basic operators).
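Among the entropy-coding building blocks used by the G.711.0 tools is Rice coding (mentioned with the value-location tool in the companion paper in this list). A minimal sketch of a Rice coder over bit strings — parameter selection and bit-I/O are simplified, and these function names are illustrative, not the standard's API:

```python
def rice_encode(n, k):
    """Rice code for a non-negative integer n with parameter k:
    quotient n >> k in unary, then the low k bits of n verbatim."""
    q, r = n >> k, n & ((1 << k) - 1)
    bits = "1" * q + "0"               # quotient in unary, "0"-terminated
    if k:
        bits += format(r, "0%db" % k)  # remainder in exactly k bits
    return bits

def rice_decode(bits, k, pos=0):
    """Decode one Rice-coded value starting at `pos`; return (value, new_pos)."""
    q = 0
    while bits[pos] == "1":
        q += 1
        pos += 1
    pos += 1                           # skip the terminating "0"
    r = int(bits[pos:pos + k], 2) if k else 0
    return (q << k) | r, pos + k
```

Rice codes are cheap to encode and decode (shifts and masks only), which matches the standard's low-complexity target.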
international conference on acoustics, speech, and signal processing | 2002
Alan V. McCree; Jacek Stachurski; Takahiro Unno; Erdem Ertan; Erdal Paksoy; Vishu R. Viswanathan; Ari Heikkinen; Anssi Rämö; Sakari Himanen; Peter Blöcher; Oliver Dressler
This paper presents an improved 4 kb/s hybrid MELP/CELP speech coder submitted as a candidate for ITU standardization. The coder uses three modes: a high-quality MELP coder for strongly voiced speech frames, an ACELP coder with pitch prediction for weakly voiced frames, and a stochastic CELP coder for unvoiced frames. We present recent enhancements to this coder, both to improve speech quality and to reduce coder complexity. Previous ITU Selection Testing results on an earlier version of this coder showed that it met nearly all requirements for toll-quality speech, more than any other candidate. Our internal testing shows that the current reduced-complexity fixed-point coder maintains this high performance.
international conference on acoustics, speech, and signal processing | 2010
Jacek Stachurski; Lorin Netsch
The paper describes two lossless coding tools employed in the new ITU-T G.711.0 Recommendation: fractional-bit and value-location encoding. Instead of encoding each sample individually as done in G.711, the fractional-bit coding tool identifies the total number of signal levels present within an input frame and then combines several samples for joint encoding with a fractional number of bits per sample. The value-location tool encodes the positions of all values within an input frame that differ from a reference value. The method efficiently represents an input frame as a sum of value-location code vectors that are sequentially encoded using Rice, binary, or explicit location encoding. The presented results illustrate how the described coding techniques were adapted for use within the new ITU-T G.711.0 standard.
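The fractional-bit idea can be illustrated by base-m packing (a sketch only; G.711.0's actual grouping and bit allocation are more elaborate, and these names are illustrative). If every sample in a frame takes one of m levels, a group of s samples has m^s combinations and needs only ceil(s·log2(m))/s bits per sample. For example, five 3-level samples fit in one 8-bit code, since 3^5 = 243 ≤ 256, i.e. 1.6 bits per sample instead of 2:

```python
import math

def pack_levels(samples, m):
    """Pack samples, each in range(m), into one integer (a base-m number)."""
    assert all(0 <= s < m for s in samples)
    code = 0
    for s in samples:
        code = code * m + s
    return code

def unpack_levels(code, m, count):
    """Invert pack_levels: recover `count` base-m digits."""
    out = []
    for _ in range(count):
        code, s = divmod(code, m)
        out.append(s)
    return out[::-1]

def bits_per_sample(m, group):
    """Fractional bits/sample when `group` m-level samples share one code."""
    return math.ceil(group * math.log2(m)) / group
```

The gain over per-sample coding grows as the group size makes m^group approach the next power of two from below.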
advanced video and signal based surveillance | 2013
Jacek Stachurski; Lorin Netsch; Randy Cole
While video analytics used in surveillance applications performs well in normal conditions, it may not work as accurately under adverse circumstances. Taking advantage of the complementary aspects of video and audio can lead to a more effective analytics framework resulting in increased system robustness. For example, sound scene analysis may indicate potential security risks outside field-of-view, pointing the camera in that direction. This paper presents a robust low-complexity method for two-microphone estimation of sound direction. While the source localization problem has been studied extensively, a reliable low-complexity solution remains elusive. The proposed direction estimation is based on the Generalized Cross-Correlation with Phase Transform (GCC-PHAT) method. The novel aspects of our approach include band-selective processing and inter-frame filtering of the GCC-PHAT objective function prior to peak detection. The audio bandwidth, microphone spacing, angle resolution, processing delay and complexity can all be adjusted depending on the application requirements. The described algorithm can be used in a multi-microphone configuration for spatial sound localization by combining estimates from microphone pairs. It has been implemented as a real-time demo on a modified TI DM8127 IP camera. The default 16 kHz audio sampling frequency requires about 5 MIPS processing power in our fixed-point implementation. The test results show robust sound direction estimation under a variety of background noise conditions.
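A bare-bones version of GCC-PHAT delay estimation can be sketched as follows. This omits the paper's novel parts (band-selective processing and inter-frame filtering of the objective function) and the fixed-point details; it uses a naive O(N²) DFT, and all names and parameters are illustrative:

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (O(N^2), fine for a sketch)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]

def gcc_phat_delay(x1, x2):
    """Estimate the integer-sample (circular) delay of x2 relative to x1."""
    n = len(x1)
    X1, X2 = dft(x1), dft(x2)
    # Phase transform: keep only the phase of the cross-power spectrum,
    # whitening the spectrum so the peak sharpens under reverberation.
    g = [X2[f] * X1[f].conjugate() for f in range(n)]
    g = [v / (abs(v) + 1e-12) for v in g]
    # Inverse DFT of the whitened cross-spectrum = GCC-PHAT function.
    cc = [sum(g[f] * cmath.exp(2j * cmath.pi * f * t / n) for f in range(n)).real
          for t in range(n)]
    peak = max(range(n), key=lambda t: cc[t])
    return peak if peak <= n // 2 else peak - n   # map to a signed lag

def delay_to_angle(delay, fs, mic_spacing, c=343.0):
    """Convert a TDOA in samples to an arrival angle (radians) for a mic
    pair with the far-field assumption; the argument is clipped to [-1, 1]."""
    s = max(-1.0, min(1.0, delay * c / (fs * mic_spacing)))
    return math.asin(s)
```

In a multi-microphone setup, such pairwise angle estimates can then be combined for spatial localization, as the paper suggests.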
international conference on acoustics, speech, and signal processing | 2009
Jacek Stachurski
The paper describes an embedded CELP coder in which an adaptive codebook is included in every enhancement layer and the lower-layer codebook gains are re-optimized in the higher layers to further improve speech quality. Each layer maintains its own filter memories to generate required target vectors, adds adaptive and fixed codebook contributions, and re-optimizes all codebook gains to improve coder performance (multi-layer gain optimization). The common elements across the embedded layers include the lower-layer adaptive and fixed codebook entries. The pitch-lag used in the core layer is also re-used in the enhancement layers to maintain time synchronization between layers. Estimation and encoding of selected lower-layer parameters may take into account their estimated impact on the higher layers. The described Embedded CELP coder has been implemented in the Embedded Variable Bit-Rate (EV-VBR) codec standardized by ITU-T as Recommendation G.718. The characterization test results of the G.718 Embedded CELP are summarized.
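The multi-layer gain re-optimization described above is, per layer, a small least-squares problem: find the gains that minimize the energy of the residual between the target vector and the weighted sum of codebook contributions. A simplified two-contribution version (adaptive plus fixed codebook, closed-form 2x2 normal equations; the function name and degenerate-case handling are illustrative, not the G.718 procedure):

```python
def optimal_gains(target, c1, c2):
    """Jointly optimal gains (g1, g2) minimizing
    ||target - g1*c1 - g2*c2||^2, via the 2x2 normal equations."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    a11, a12, a22 = dot(c1, c1), dot(c1, c2), dot(c2, c2)
    b1, b2 = dot(target, c1), dot(target, c2)
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:              # contributions (nearly) collinear
        g1 = b1 / a11 if a11 else 0.0
        return g1, 0.0
    g1 = (b1 * a22 - b2 * a12) / det
    g2 = (b2 * a11 - b1 * a12) / det
    return g1, g2
```

Re-solving this in every enhancement layer, with that layer's own target, is what lets the lower-layer gains be refined as more bit rate becomes available.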
international conference on acoustics, speech, and signal processing | 2003
Jacek Stachurski; Alan V. McCree; Vishu R. Viswanathan; Ari Heikkinen; Anssi Rämö; Sakari Himanen; Peter Blöcher
This paper describes extensions of the 4 kb/s hybrid MELP/CELP coder, up to 6.4 kb/s and down to 2.4 kb/s. The baseline 4 kb/s coder uses three coding modes: MELP in strongly voiced speech frames, CELP with pitch prediction in weakly voiced frames, and CELP with stochastic excitation in unvoiced frames. To minimize switching artifacts between parametric MELP and waveform CELP coding, an alignment phase is encoded in MELP and zero-phase equalization is applied to the CELP target signal. The 6.4 kb/s extension uses the same three modes as the 4 kb/s coder, with improved MELP and CELP coders. The 2.4 kb/s extension uses only two modes: MELP for voiced frames and CELP synthesis with random excitation for unvoiced frames. The alignment phase is encoded in MELP frames at all bit rates so that time synchrony with the input speech is always maintained. Alignment phase and zero-phase equalization enable smooth switching between coders at different bit rates. The hybrid MELP/CELP coding structure leads to coders that perform better at a given bit rate than MELP or CELP separately, and better than or equivalent to higher bit-rate ITU standards. Formal subjective tests show that for all but one tested condition, the 6.4 kb/s hybrid coder is better than 8 kb/s G.729 and the 2.4 kb/s coder is equivalent to, or better than, 6.4 kb/s G.729 Annex D.